Maxwell : Conda/Mamba Python

The use of anaconda® is deprecated. Please use mamba instead. Do NOT use the anaconda or miniconda installer; use miniforge or mambaforge instead.

For existing installations or environments, please see the Python page for instructions on how to disable the non-free anaconda channel.

We strongly recommend replacing all anaconda/miniconda installations with miniforge or mambaforge.


Summary

Source: mamba https://github.com/mamba-org/mamba

Source: miniforge/mambaforge https://github.com/conda-forge/miniforge 

License: 3-clause BSD

Path: /software/mamba

Documentation: https://docs.conda.io/projects/conda/en/latest/


conda/mamba is a package manager that offers an easy way to perform Python/R data science and machine learning.

Working with conda/mamba on Maxwell

We offer basic conda installations together with a number of environments, kernels and labextensions. The conda installation serves as the base environment for the jupyter-hub, which might limit options to upgrade to the newest versions. You can, however, easily install your own conda version, and your group or institute (Eu.XFEL, CSSB, etc.) most likely has separate conda installations tailored to their specific applications. When working with conda, be aware that

The simplest option to set up conda python is

module load maxwell mamba  # or use module load maxwell conda which does exactly the same thing
. mamba-init
# afterwards you can use mamba or conda. for clarity we recommend using mamba.
  • module load maxwell mamba initializes python=3.9

  • . mamba-init does the same as the "mamba init" block in your login environment, but without side-effects

  • if you encounter permission problems, run the following once (you might want to choose a different location if space in your home directory gets tight):
cat <<eof >> ~/.condarc
pkgs_dirs:
    - /home/$USER/.conda/pkgs
eof

Note: the latest version of the conda/3.9 module automatically generates .condarc if none is present!
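
The "create .condarc only if none is present" behaviour can be sketched as a small shell function. This is an illustration under assumptions, not the code the module actually runs: the function name ensure_condarc and its default arguments are hypothetical.

```shell
# Hypothetical helper (not part of the Maxwell modules): write a minimal
# .condarc with a private package cache, but never overwrite an existing file.
ensure_condarc() {
    local rc="${1:-$HOME/.condarc}"        # target file (assumed default)
    local pkgs="${2:-$HOME/.conda/pkgs}"   # package cache directory
    if [ -e "$rc" ]; then
        echo "keeping existing $rc"
        return 0
    fi
    mkdir -p "$pkgs"
    printf 'pkgs_dirs:\n    - %s\n' "$pkgs" > "$rc"
    echo "created $rc"
}
```

Calling ensure_condarc without arguments would act on ~/.condarc; passing two paths allows trying it out on a scratch file first.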

Available environments

# list available environments
mamba env list  # conda env list will work as well
# conda environments:
#
base                  *  /software/mamba/2022.06
rapids-22.04             /software/mamba/2022.06/envs/rapids-22.04

# activate an environment
mamba activate rapids-22.04
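
In scripts it can be handy to check whether an environment exists before activating it. The following sketch parses the listing format shown above; the function names list_env_names and env_exists are hypothetical, and the parsing assumes that every environment line starts with its name (environments created with --prefix outside the configured envs_dirs are listed by path only and would not match by name).

```shell
# Hypothetical helpers that read "mamba env list" output from stdin.

# Print one environment name per line, skipping comments and blank lines.
list_env_names() {
    awk '!/^#/ && NF { print $1 }'
}

# Succeed if the environment name given as $1 appears in the listing.
env_exists() {
    list_env_names | grep -qxF -- "$1"
}

# Usage (requires a loaded mamba module):
#   mamba env list | env_exists rapids-22.04 && mamba activate rapids-22.04
```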

Using conda environments

conda environments allow you to install python versions and packages in a self-contained way. For example

module load maxwell conda/3.9
# you can replace mamba by conda in the following steps .... 
. mamba-init
mamba create -n hexrd python=3.8
mamba activate hexrd
mamba install -c hexrd -c conda-forge hexrd

That will produce an environment (located in ~/.conda/envs) containing hexrd and all dependencies.

You can create a kernel to be used in the jupyterhub from an environment:

module load maxwell conda/3.9
. mamba-init
mamba activate hexrd
mamba install ipykernel -c conda-forge
python -m ipykernel install --user --name=hexrd

Kernel definitions are very simple JSON files. If the creation of the kernel using ipykernel fails for any reason, you can create one manually. For example

mkdir -p ~/.local/share/jupyter/kernels/hexrd

cat <<eof > ~/.local/share/jupyter/kernels/hexrd/kernel.json
{
 "argv": [
  "/home/<username>/.conda/envs/hexrd/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "hexrd",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}
eof
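
Since a kernel definition only needs the interpreter path and a display name, the manual step can be wrapped in a function. This is a minimal sketch: write_kernel_json is a hypothetical name, the example interpreter path is a placeholder, and the optional third argument only exists so the function can be tried in a scratch directory.

```shell
# Hypothetical helper: write a minimal kernel.json for a given environment.
#   $1 = kernel name, $2 = full path to the environment's python,
#   $3 = kernels directory (assumed default: the per-user Jupyter location)
write_kernel_json() {
    local name="$1"
    local python="$2"
    local dir="${3:-$HOME/.local/share/jupyter/kernels}/$name"
    mkdir -p "$dir"
    cat > "$dir/kernel.json" <<EOF
{
 "argv": ["$python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
 "display_name": "$name",
 "language": "python"
}
EOF
}

# Usage:
#   write_kernel_json hexrd "$HOME/.conda/envs/hexrd/bin/python"
```

The environment still needs ipykernel installed, otherwise the kernel will fail to start.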

Conda environments in BeeGFS

As mentioned, conda can quickly consume the 30GB quota of your home directory. You can install conda environments in other locations like BeeGFS, but that is not entirely free of problems. The BeeGFS setup simply lacks the hardware to cope with large numbers of metadata requests (anything doing a "stat" on a file or directory, like ls -lR, is very expensive). Unfortunately, conda touches a huge number of files, and spawning multiple processes magnifies the problem, so conda environments on BeeGFS might become quite slow. Preparing singularity images with conda environments might be a good alternative.
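
To get a feeling for how much space and how many files a conda directory holds, something like the following could be used. It is a sketch only: conda_footprint is a hypothetical name, and ~/.conda is assumed as the default location. Note that the scan itself issues one metadata request per file, so run it sparingly on BeeGFS.

```shell
# Hypothetical helper: report total size and number of regular files
# below a directory (assumed default: ~/.conda).
conda_footprint() {
    local dir="${1:-$HOME/.conda}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    local files size
    files=$(find "$dir" -type f | wc -l | tr -d ' ')   # number of files
    size=$(du -sh "$dir" | cut -f1)                    # human-readable size
    echo "$dir: $size in $files files"
}
```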

Be aware: if you install conda-environments in BeeGFS the conda pkgs also have to reside in BeeGFS! Mixing GPFS and BeeGFS will inevitably result in broken environments. There is however a simple way around the problem. For example

  • create ~/.condarc to use your home-directory for environment installations
  • create ~/.condarc.beegfs to use BeeGFS for environment installations
.condarc
auto_activate_base: false
channels:
  - conda-forge
channel_priority: disabled
pkgs_dirs:
  - ~/.conda/pkgs

envs_dirs:
  - ~/.conda/envs

.condarc.beegfs
auto_activate_base: false
channels:
  - conda-forge
channel_priority: disabled
pkgs_dirs:
  - /beegfs/desy/user/<username>/.conda/pkgs

envs_dirs:
  - /beegfs/desy/user/<username>/.conda/envs

You can then switch between environments in $HOME and in BeeGFS:

# install environments in BeeGFS
export CONDARC=~/.condarc.beegfs
mamba create -n env-in-beegfs python=3.10 
mamba activate env-in-beegfs
[...]

# install environments in $HOME
unset CONDARC
mamba create -n env-in-home python=3.10 
mamba activate env-in-home
[...]
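
The switch between the two configurations can be wrapped in a small function for your ~/.bashrc. This is a sketch assuming the two files are named as above; the function name use_conda_location is hypothetical.

```shell
# Hypothetical helper: select which .condarc conda/mamba will read.
#   beegfs -> point CONDARC at ~/.condarc.beegfs
#   home   -> unset CONDARC so the default ~/.condarc applies again
use_conda_location() {
    case "$1" in
        beegfs) export CONDARC="$HOME/.condarc.beegfs" ;;
        home)   unset CONDARC ;;
        *)      echo "usage: use_conda_location beegfs|home" >&2
                return 1 ;;
    esac
}
```

The function only changes the current shell, so a new login session starts again with the configuration in $HOME.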


module load maxwell mamba/3.9
. mamba-init

mk-beegfs # create your beegfs folder if you don't have one yet
mkdir -p /tmp/$USER/spack-stage/ # the mamba module will usually do this for you 

export CONDARC=~/.condarc.beegfs 
mamba create -n hexrd python=3.8
mamba activate hexrd
mamba install -c hexrd -c conda-forge hexrd
# create a jupyter kernel
mamba install ipykernel -c conda-forge
python -m ipykernel install --user --name=hexrd

# The environment now resides in /beegfs/desy/user/<username>/.conda/envs/hexrd
# In most cases you don't need to activate the environment to run the code, but just set the PATH:
export PATH=/beegfs/desy/user/<username>/.conda/envs/hexrd/bin:$PATH
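
When the PATH is set from a login script or a batch job prologue, repeated sourcing keeps adding the same directory. A minimal sketch that prepends a directory only once (path_prepend is a hypothetical name):

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is
# already contained in PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH untouched
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Usage:
#   path_prepend /beegfs/desy/user/$USER/.conda/envs/hexrd/bin
```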

Moving existing conda environments to BeeGFS

It's not possible to simply move a conda environment from GPFS (e.g. $HOME) to BeeGFS. It is, however, relatively easy to clone a conda environment. Let's assume you've created ~/.condarc.beegfs as outlined above:

export CONDARC=~/.condarc.beegfs
module load maxwell mamba/3.9
. mamba-init

mamba create --prefix /beegfs/desy/user/$USER/.conda/envs/my-new-env --clone /home/$USER/.conda/envs/my-old-env

# activate the environment
mamba activate my-new-env  # should work as long as CONDARC is set. If it fails:
mamba activate  /beegfs/desy/user/$USER/.conda/envs/my-new-env 

# alternatively just set the PATH, works in most cases:
export PATH=/beegfs/desy/user/$USER/.conda/envs/my-new-env/bin:$PATH


Adding packages globally (for your account)

pip is usually the easiest way to install packages for all your python environments, but be aware that this can quickly lead to inconsistencies:

module load maxwell mamba/3.9
. conda-init
python3 -m pip install --user --upgrade numpy
# it works exactly the same way when working with the system python 3.6. 

Note:

  • packages will install in ~/.local/bin and ~/.local/lib/python3.9/site-packages
  • you will need to add ~/.local/bin to your PATH or use a full path to execute commands installed there
  • ~/.local takes precedence over packages installed in any environment. It hence can easily break dependencies. conda or virtual environments are the better choice.
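
If packages in ~/.local are suspected of breaking an environment, Python's PYTHONNOUSERSITE environment variable can be used to run a command with the user site-packages directory ignored. The wrapper below is a sketch; the name without_user_site is hypothetical.

```shell
# Hypothetical wrapper: run a command with Python's per-user site-packages
# (~/.local/lib/pythonX.Y/site-packages) disabled for that command only.
without_user_site() {
    env PYTHONNOUSERSITE=1 "$@"
}

# Usage:
#   python3 -m site --user-site                  # where pip --user installs
#   without_user_site python3 -c 'import numpy; print(numpy.__file__)'
```

If the import then resolves to the environment's own copy, the ~/.local installation was shadowing it.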

Making your own mamba installation

https://github.com/conda-forge/miniforge offers a lightweight installer which exclusively uses the conda-forge channel. A simple installation could look like this

# fetch installer
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh

# silent install, use -h for help
PREFIX=/beegfs/desy/user/$USER/minitest
/bin/bash Mambaforge-Linux-x86_64.sh -b -s -p $PREFIX

# setup PATH
export PATH=$PREFIX/bin:$PATH




# create mamba-init
cat <<eof > $PREFIX/bin/mamba-init
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="\$('$PREFIX/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ \$? -eq 0 ]; then
    eval "\$__conda_setup"
else
    if [ -f "$PREFIX/etc/profile.d/conda.sh" ]; then
        . "$PREFIX/etc/profile.d/conda.sh"
    else
        export PATH="$PREFIX/bin:\$PATH"
    fi
fi
unset __conda_setup

if [ -f "$PREFIX/etc/profile.d/mamba.sh" ]; then
    . "$PREFIX/etc/profile.d/mamba.sh"
fi
# <<< conda initialize <<<
eof

# Note: if you just use an editor to create the file, replace \$ by $!
# Note: if you don't have write permission in - or don't want to modify - the mamba installation, you need to define the pkgs-directory (also see above), e.g.
cat <<eof >> ~/.condarc
auto_activate_base: false
pkgs_dirs:
    - /home/$USER/.conda/pkgs
eof
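
The escaping rule in the notes above follows from how here-documents work: with an unquoted delimiter, $VAR is expanded at the moment the file is written, while \$ survives as a literal $ to be evaluated later, when the file is sourced. A small demonstration with made-up names (DEMO_PREFIX and the temporary file are only for illustration):

```shell
# Illustration of here-document expansion with an unquoted delimiter.
DEMO_PREFIX=/opt/example          # stands in for $PREFIX
demo=$(mktemp)                    # scratch file instead of mamba-init

cat <<eof > "$demo"
expanded now: $DEMO_PREFIX
kept for later: \$PATH
eof

cat "$demo"
# expanded now: /opt/example
# kept for later: $PATH
```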

Now initialize mamba and continue with package installations and environments:

. mamba-init

mamba install -y numpy scipy matplotlib  

# create a python 3.7 test environment:
mkdir -p /tmp/$USER/spack-stage/ 
mamba create -n py37 python=3.7 

# use the environment:
mamba env list
# conda environments:
#
base                  *  /beegfs/desy/user/schluenz/minitest
py37                     /beegfs/desy/user/schluenz/minitest/envs/py37

mamba activate py37
mamba list # installed packages
mamba install numpy ... # add packages to py37 environment


Note: when creating environments in beegfs, the packages also have to be in beegfs! There are basically two ways to achieve that:

# Option 1. 
cd 
rm -rf ~/.conda                                  # removes all your conda stuff! 
mkdir -p /beegfs/desy/user/$USER/.conda       
ln -s /beegfs/desy/user/$USER/.conda .

mamba create -n my-conda-env python=3.8

# This way .conda, pkgs-dir and environments will all reside in beegfs. No need to use prefixes. 

# ---------------------------------------------------------------------------------------------- #

# Option 2. 
mkdir -p /beegfs/desy/user/$USER/.conda/pkgs
conda config --add pkgs_dirs /beegfs/desy/user/$USER/.conda/pkgs

mamba create --prefix=/beegfs/desy/user/$USER/my-conda-env python=3.8

# This will place packages and environment into beegfs, but will leave for example environments.txt in ~/.conda/.
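
Before creating an environment it can be worth checking that the package cache and the environment directory really sit on the same filesystem. The sketch below compares the mount points reported by df; fs_of and same_filesystem are hypothetical names, both paths must already exist, and mount points containing spaces are not handled.

```shell
# Hypothetical helpers based on POSIX "df -P".

# Print the mount point of the filesystem that holds the given path.
fs_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed if both paths live on the same mounted filesystem.
same_filesystem() {
    [ "$(fs_of "$1")" = "$(fs_of "$2")" ]
}

# Usage:
#   same_filesystem /beegfs/desy/user/$USER/.conda/pkgs \
#                   /beegfs/desy/user/$USER/my-conda-env \
#       || echo "pkgs and environment are on different filesystems"
```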