Maxwell : Installing python modules and kernels

You have essentially 4 options to install python modules. Which python distribution (conda, plain vanilla, ...) or environment type (venv, conda, ...) you use doesn't really matter.

1: Install python modules in system path

python3 -m pip install something

This requires privileges to install modules into the python tree, so it's not an option for any of the "central" python installations available on maxwell. It is however no problem if you have your own python installation (for example a mambaforge installation) in /<where-ever>/my-python, since you're the owner of the installation.

/<where-ever>/my-python/bin/python3 -m pip install something

will usually install into /<where-ever>/my-python/lib/python3.<x>/site-packages. 

Executing /<where-ever>/my-python/bin/python3 will ignore any of the python modules available in central python installations.
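To verify which python binary you are actually running and where pip would install modules for it, you can ask the interpreter itself. A minimal check that works with any python3:

```python
import sys
import sysconfig

# "purelib" is the directory pip installs pure-python modules into
# for this interpreter, e.g. /<where-ever>/my-python/lib/python3.<x>/site-packages
site_packages = sysconfig.get_paths()["purelib"]
print(sys.executable)   # which python binary is running
print(site_packages)    # where "pip install" would place modules
```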

2: Using a conda or virtual environment.

Let's assume you create a venv, e.g.

module load maxwell conda 
which python3
   /software/mamba/2022.06/bin/python3
python3 -m venv test-venv
source test-venv/bin/activate
which python3
   ~/test-venv/bin/python3

From here on it works like option 1: python modules from other python installations are largely "ignored". For example

python3 -m pip list
   Package    Version
   ---------- -------
   pip        22.0.4
   setuptools 58.1.0

The "base modules" are however inherited from the python installation used to create the venv (as sys.path will show).

python3 -m pip install numpy

will now install into ~/test-venv/lib/python3.<x>/site-packages. Conda/mamba environments work essentially the same way.

This is the best option to install and maintain your own python environments and kernels. To use the environments as kernels from jupyter, you need to have ipykernel installed (in the environment) and to supply a kernel definition (see below), but that's pretty much it.

The environments are in a sense "global": you can use them from any node in the cluster. They are also "local": only you, and anyone with read permission on the installation folder, can use them.
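Whether the currently running python belongs to a virtual environment can be checked from within python itself; in a venv, sys.prefix points at the environment while sys.base_prefix still points at the python the venv was created from. A small sketch:

```python
import sys

# In a venv, sys.prefix differs from sys.base_prefix;
# outside any venv the two are identical.
in_venv = sys.prefix != sys.base_prefix
print("running inside a virtual environment:", in_venv)
print("environment prefix:", sys.prefix)
```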

3: Installing in user space

python3 -m pip install something --user --upgrade

installs into ~/.local/lib/python3.<x>/site-packages. This location is actually "global": it applies to any python installation or environment using the same python version, and it always takes precedence (except over PYTHONPATH, see below). If you have for example a python3.9 environment with numpy=1.17.1 (e.g. as a requirement) and numpy=1.24.3 installed in ~/.local/lib/python3.9/site-packages, python3.9 will use the latter regardless of the requirements of the environment.

I would try to avoid --user module installations altogether. You can however disable the use of python modules in ~/.local/ by setting

export PYTHONNOUSERSITE=1
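To see whether a module is being shadowed by a user-site installation, you can ask python where it would import the module from, and whether the user site is currently enabled. A minimal sketch (using the stdlib json module as a stand-in; try "numpy" or whatever module you suspect):

```python
import importlib.util
import site

# Show the file python would import a module from -- useful to spot
# a module shadowed by ~/.local/lib/python3.<x>/site-packages.
spec = importlib.util.find_spec("json")
print("json is imported from:", spec.origin)

# site.ENABLE_USER_SITE is False when PYTHONNOUSERSITE is set
print("user site-packages enabled:", site.ENABLE_USER_SITE)
```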

4: Install python modules in arbitrary locations

python3 -m pip install something --prefix=somewhere

allows you to install individual python modules in arbitrary locations. python won't be able to find such a module unless you set PYTHONPATH or add the path to sys.path. PYTHONPATH is usually not a good idea, since it applies to every python version and takes precedence over (almost) everything else. For example, module load mpi/openmpi-x86_64 adds /usr/lib64/python2.7/site-packages/openmpi to PYTHONPATH, which fails for any python3.
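A safer alternative to PYTHONPATH is to extend sys.path from within the script that actually needs the extra location. The path below is a hypothetical example for a --prefix installation; the actual subdirectory depends on your python version:

```python
import sys

# Hypothetical location used with "pip install something --prefix=somewhere";
# adjust lib/python3.<x> to your interpreter's version.
extra = "/somewhere/lib/python3.11/site-packages"

# Appending (rather than prepending, or using PYTHONPATH) means the
# extra location is searched last and cannot shadow environment modules.
if extra not in sys.path:
    sys.path.append(extra)
print(extra in sys.path)
```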

Python kernel

Let's assume you have a python environment with ipykernel installed. Activating the environment "activates" /somewhere/env/bin/python3. You can then create a python kernel using ipykernel:

/somewhere/env/bin/python3 -m ipykernel install --name superapp --display-name "SuperApp (py3.11)" --user
   Installed kernelspec superapp in ~/.local/share/jupyter/kernels/superapp

That creates a kernel folder ~/.local/share/jupyter/kernels/superapp/ containing kernel.json and some default icons, which you can easily replace with icons of your choice.

The kernel file looks like this:

~/.local/share/jupyter/kernels/superapp/kernel.json:
{
 "argv": [
  "/somewhere/env/bin/python3",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "SuperApp (py3.11)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

So it basically just tells the jupyter server which python to use and to execute /somewhere/env/bin/python3 -m ipykernel_launcher. Everything else is sorted out by jupyter at runtime. 
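Since kernel.json is plain JSON, you can also generate or adapt such a file programmatically instead of editing it by hand. A sketch that reproduces the structure above (the interpreter path and display name are placeholders for your environment):

```python
import json
import sys

# Sketch of the kernelspec that "python3 -m ipykernel install" writes;
# sys.executable stands in for /somewhere/env/bin/python3.
kernel = {
    "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "SuperApp (py3.11)",
    "language": "python",
    "metadata": {"debugger": True},
}
text = json.dumps(kernel, indent=1)
print(text)  # write this to ~/.local/share/jupyter/kernels/<name>/kernel.json
```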

Some kernels need additional information; tensorflow for example needs

~/.local/share/jupyter/kernels/mytf/kernel.json:
{
 "argv": [
  "/somewhere/env/bin/python3",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "MY TF",
 "language": "python",
 "metadata": {
  "debugger": true
 },
 "env": {"LD_LIBRARY_PATH":"/software/cuda/cuda-11.2/lib64/","TF_CPP_MIN_LOG_LEVEL":"3"}
}

Or you could set PYTHONNOUSERSITE to ensure that a user's own python modules won't interfere:

~/.local/share/jupyter/kernels/safeapp/kernel.json:
{
 "argv": [
  "/somewhere/env/bin/python3",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "My Safe App",
 "language": "python",
 "metadata": {
  "debugger": true
 },
 "env": {"PYTHONNOUSERSITE":"1"}
}

Avoiding version mismatches

python virtual or conda environments are largely self-consistent. PYTHONPATH and user-site packages can break that consistency, so avoiding PYTHONPATH and setting PYTHONNOUSERSITE are very helpful to keep environments consistent.

When working with custom kernels in jupyterhub you need to keep some packages consistent between your own virtual environment and the setup used by jupyter.

The conda setup of the jupyterhub uses, for example:

@max-wgse002:~$ conda list | egrep 'ipywidgets|traitlets'
ipywidgets                7.7.0              pyhd8ed1ab_0    conda-forge
traitlets                 5.2.2.post1        pyhd8ed1ab_0    conda-forge

If your environment contains for example ipywidgets=8.0.0 and/or traitlets=5.8.0, all widgets will be broken, either failing silently or with errors like "Error displaying widget: model not found". In venv environments you can try to keep packages consistent by using

python3 -m venv test-venv --system-site-packages

This way all "system packages" are known in the virtual environment, and e.g. pip install ipywidgets will not alter the package version. The good thing is that updates in the "base" environment will automatically be part of the virtual environment. Be aware however that pip install -U ipywidgets will still alter versions, and possibly break things; the same might happen if a python package requires a different ipywidgets version.

In conda environments you can simply pin the versions, e.g. ipywidgets=7.7.0 traitlets=5.2.2.post1. That won't help if the packages get updated in the base environment, but that's something we do only if absolutely unavoidable.
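You can also check from within your environment whether the critical packages match what the hub uses. A small sketch; the version pins below are the example values quoted above and should be adjusted to whatever "conda list" on the jupyterhub actually shows:

```python
# Compare installed package versions against the versions the hub pins.
from importlib import metadata

# Example pins from this page -- adjust to the hub's current versions.
hub_versions = {"ipywidgets": "7.7.0", "traitlets": "5.2.2.post1"}

def mismatches(expected):
    """Return {name: (wanted, installed)} for packages that differ."""
    bad = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # not installed in this environment at all
        if have != want:
            bad[name] = (want, have)
    return bad

print(mismatches(hub_versions))  # empty dict means the environment is consistent
```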



Additional information