Maxwell : conda environment in singularity/apptainer

A conda python installation touches a large number of files. Such operations are quite expensive on cluster filesystems and can lead to significant performance issues when conda environments are located on BeeGFS. The same problem also occurs with GPFS, but the effect is usually much smaller. Using singularity can mitigate the problem, since from the perspective of the filesystem a singularity image is just a single file. This page provides a build template.
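
To get a feeling for how many files an existing environment consists of, a small helper like the following can be used. This is only a sketch: the function name count_files and the example path are made up for illustration, and the count covers regular files only (symlinks and directories are not included).

```shell
# Count the regular files below the directory given as first argument.
# Symbolic links and directories themselves are not counted.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example call (hypothetical path, adjust to your own environment):
#   count_files "$HOME/.conda/envs/myenv"
```

Each of these files is a separate object for BeeGFS or GPFS, whereas the image built below is a single one.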

Building a containerized conda environment from scratch

You first need a file with the build instructions, let's say test.def:

Bootstrap: shub
From: singularityhub/centos

%help
Singularity Container for a new conda environment

%apprun python
   exec /opt/micromamba/bin/python


%labels
  org.label-schema.version  0.0.1
  org.label-schema.url      none
  org.label-schema.name     my-new-
  org.label-schema.vendor   myself
  Version  0.0.1
  Author   myself

%environment
export PATH=/opt/micromamba/bin:$PATH  

%post 
yum install tar curl bzip2 developer-tools -y
mkdir -p /opt/micromamba/bin && cd /opt/micromamba
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
export MAMBA_ROOT_PREFIX=/opt/micromamba
eval "$(./bin/micromamba shell hook -s posix)"
 
micromamba activate
micromamba install -y python=3.10 -c conda-forge
micromamba install -y numpy scipy scikit-learn ipykernel -c conda-forge

With the build instructions in place, you can build a singularity image:

singularity build --fakeroot test.simg test.def

Note: this will not work on Maxwell right away. You need to send an e-mail to maxwell.service@desy.de so we know that you intend to build singularity images. The fakeroot option requires subordinate ID (subuid/subgid) settings for your account (see for example https://freeipa.readthedocs.io/en/latest/designs/subordinate-ids.html for more details on this topic).
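
On a machine that keeps subordinate IDs in the local files /etc/subuid and /etc/subgid, a quick check for an entry could look like the sketch below. The helper name has_subid_entry is hypothetical. It also rests on an assumption that may not hold on Maxwell: when the ranges are served by a directory service such as FreeIPA, the local files may be empty even though a mapping exists.

```shell
# Return success if user $1 has an entry in the subordinate-ID file $2.
# Expected file format, one entry per line: name:first_id:count
has_subid_entry() {
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "$2"
}

# Example call (assumes local files are used for the mapping):
#   has_subid_entry "$USER" /etc/subuid && has_subid_entry "$USER" /etc/subgid
```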

Running the image

Once you have the image you can execute the embedded python:

singularity run --bind /asap3:/asap3,/beegfs:/beegfs --app python /beegfs/desy/user/$USER/Singularity/test.simg

That will bring up the interactive python prompt. To run python non-interactively, for example to list the installed packages:

singularity exec --bind /asap3:/asap3,/beegfs:/beegfs /beegfs/desy/user/$USER/Singularity/test.simg /opt/micromamba/bin/python -m pip list
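
If more filesystems need to be visible inside the image, the --bind argument grows quickly. The following sketch assembles it from a list of paths. The helper name bind_arg is made up for illustration, and it assumes every path should appear under the same name inside the container, as in the commands above.

```shell
# Print a comma-separated src:dest list for --bind, mapping every
# path given as argument onto the same path inside the container.
bind_arg() {
    out=""
    for p in "$@"; do
        out="${out:+$out,}$p:$p"
    done
    printf '%s\n' "$out"
}

# Example call:
#   singularity exec --bind "$(bind_arg /asap3 /beegfs)" test.simg /opt/micromamba/bin/python -m pip list
```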

Running the image as a jupyter kernel

Using the image in jupyter requires a slightly modified kernel file. For example, create a folder ~/.local/share/jupyter/kernels/singsing/ containing the following files:

ls -1 ~/.local/share/jupyter/kernels/singsing/
kernel.json
logo-32x32.png
logo-64x64.png
logo-svg.svg

The logos appear in the jupyterlab dashboard. If you don't care about them, just copy them from some other python kernel. The crucial bit is the kernel.json file, which tells jupyter which application to launch:

{
 "argv": [
  "/usr/bin/singularity", "exec", "--bind", "/asap3:/asap3,/beegfs:/beegfs", "/beegfs/desy/user/schluenz/Singularity/test.simg", 
  "/opt/micromamba/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "SingSing",
 "language": "python",
 "metadata": {
  "debugger": false
 }
}

Once you have done so, you should see a kernel named SingSing (or whatever name you choose) in the jupyter dashboard. The bind instruction tells singularity which filesystems should be available inside the image. The home directory and some other system directories are always mounted.
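
The argv entries in kernel.json are not processed by a shell, so a variable like $USER would not be expanded there and the image path has to be written out in full. One way to avoid typing it by hand is to generate the file from the shell, as in the sketch below. The function name write_kernel_json and its argument order are assumptions made for this example. The generated content mirrors the kernel.json shown above.

```shell
# write_kernel_json KERNEL_DIR IMAGE DISPLAY_NAME
# Creates KERNEL_DIR if needed and writes a kernel.json that starts
# the python inside IMAGE through singularity.
write_kernel_json() {
    mkdir -p "$1"
    cat > "$1/kernel.json" <<EOF
{
 "argv": [
  "/usr/bin/singularity", "exec", "--bind", "/asap3:/asap3,/beegfs:/beegfs", "$2",
  "/opt/micromamba/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "$3",
 "language": "python",
 "metadata": {
  "debugger": false
 }
}
EOF
}

# Example call (the shell expands $HOME and $USER before the file is written):
#   write_kernel_json "$HOME/.local/share/jupyter/kernels/singsing" "/beegfs/desy/user/$USER/Singularity/test.simg" "SingSing"
```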

Cloning an existing environment into a singularity image

The setup is almost identical to the one above. However, you need to create a yml file from the existing environment. For example, to clone the tomopy environment:

conda activate tomopy          
conda env export | grep -v '^name:' > tomopy.yml
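
Besides the name: line, the exported file usually also ends with a prefix: line that points to the location of the original environment. Whether it interferes with the build has not been checked here, but it carries no information that is useful inside the container. A sketch for a filter that drops both lines (the function name strip_env_identity is hypothetical):

```shell
# Print the environment file $1 without its top-level
# "name:" and "prefix:" lines.
strip_env_identity() {
    grep -v -e '^name:' -e '^prefix:' "$1"
}

# Example call (full.yml is a hypothetical file name):
#   conda env export > full.yml
#   strip_env_identity full.yml > tomopy.yml
```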

The build instructions need to be modified very slightly so that the build process uses the yml file:

Bootstrap: shub
From: singularityhub/centos
 
%help
Singularity Container for a new conda environment
 
%apprun python
   exec /opt/micromamba/bin/python

%labels
  org.label-schema.version  0.0.1
  org.label-schema.url      none
  org.label-schema.name     my-new-
  org.label-schema.vendor   myself
  Version  0.0.1
  Author   myself
 
%environment
export PATH=/opt/micromamba/bin:$PATH 

%files 
  tomopy.yml /
 
%post
yum install tar curl bzip2 developer-tools -y
mkdir -p /opt/micromamba/bin && cd /opt/micromamba
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
export MAMBA_ROOT_PREFIX=/opt/micromamba
eval "$(./bin/micromamba shell hook -s posix)"
 
micromamba activate
micromamba install -y -f /tomopy.yml -c conda-forge

From here on everything is exactly as outlined above.