
Software & Applications

The Maxwell cluster is a resource dedicated to parallel and multi-threaded applications that can exploit at least some of its specific characteristics. In addition to serving as a medium-scale high-performance cluster, Maxwell incorporates resources for Photon Science data analysis as well as resources of CFEL, CSSB, Petra4, the European XFEL...

If you find the resource useful for your work, we would greatly appreciate learning about publications that have substantially benefited from the Maxwell cluster; drop us a mail. Acknowledgement of the Maxwell resource would also be greatly appreciated, as it helps to foster the cluster. For example: "This research was supported in part through the Maxwell computational resources operated at Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany"


The max-display3 nodes (max-display004,5) are currently being upgraded to CentOS 8, serving as a testbed for future upgrades of the cluster.

  • They are currently not available for FastX sessions; we are working on that.
  • Expect many (even basic) things NOT to work.

The changes in favor of Petra4-computing tasks announced Aug. 17th have been reverted, and new nodes and partitions have been added accordingly:

  • The maxwell partition has been restored with all nodes and a maximum job runtime of 7 days.
  • The petra4 partition has been extended by 40 new AMD EPYC-7402 nodes.
  • A new short partition has been created.
    • The short partition also contains the 40 new AMD EPYC-7402 nodes.
    • The maximum job runtime is 4 hours.
    • Jobs in the petra4 partition are prioritized. Jobs in the short partition might be delayed but will never be terminated (preempted).
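A minimal job for the new short partition could be sketched like this (the script name short_job.sh and the srun payload are our own examples; the partition name and 4-hour limit are taken from the announcement above):

```shell
# Write a minimal batch script for the short partition; the time limit
# matches the 4-hour partition maximum stated above.
cat > short_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=short
#SBATCH --time=04:00:00
#SBATCH --nodes=1
srun hostname
EOF
# On a Maxwell login node this would be submitted with: sbatch short_job.sh
```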

A short overview of the relevant changes:


Partition    # of nodes    Nodes/Job    Max # of Jobs    Default Time    Maximum Time    Allowed Groups
petra4       66            no limit     no limit         1:00:00         14-00:00:00     max-petra4-sim-users

For details of the available hardware, consult the hardware pages.

Some useful commands to get more information about  the current setup:

/usr/local/bin/max-limits -a                       # show partitions and the limits applying 
/usr/local/bin/max-limits                          # show only partitions allowed
/usr/local/bin/my-partitions                       # list partitions indicating which ones can be used and which ones not

/usr/bin/sinfo                                     # show available nodes and partitions
/usr/bin/sinfo -p short -o '%20n %20f %10t %c %m'  # show nodes in the short partition, with state, features, cores...

/software/tools/bin/savail -p maxgpu               # show detailed information about available nodes taking into account preemptable jobs 


You will find a new version of Octave on Maxwell:

[sternber@max-wgs001]~% module load maxwell octave/5.2.0
[sternber@max-wgs001]~% octave --version
GNU Octave, version 5.2.0

For further information about Octave and the new version:

Dear colleagues,

the DESY directorate has decided to temporarily shift compute priorities on the Maxwell cluster in favor of urgent Petra4 computations.
As a consequence we have to make temporary adjustments to the maxwell partition in the following way:

- Starting Wednesday August 19th the maximum time-limit of jobs in the maxwell partition will be reduced to _4_ HOURS.
- Nodes in the maxwell partition will also become part of the petra4 partition, and will be prioritized in the petra4 partition.

What happens to your jobs after the change?
- Jobs already running in the maxwell partition (or any other partition) will not be affected.
- Jobs with a runtime of more than 4 hours that are still waiting in the maxwell partition will have to be removed after the configuration changes are deployed next Wednesday. These jobs would never execute.
- Jobs submitted to the maxwell partition with a proper time-limit of 4h or less run unaffected. Due to the prioritization of petra4 you might however experience long queuing times. Please consider using the all-partition as well as other resources possibly available to you and your group.

How temporary is temporary?
- We have already purchased 40 new compute nodes which will arrive mid- to end-September. Once installed, the 40 nodes will become part of the petra4 partition.
- At this point, the maxwell partition will return to a normal schedule.
- In addition, the 40 nodes will also be made available for short-running jobs (2-4 hours) for users of the maxwell partition.
- After the petra4 compute campaign, the nodes will be fully integrated into the maxwell partition and more than double the core-count.

So we expect that at the beginning of October (depending, of course, on timely delivery by our vendor) the maxwell partition will be fully available again, augmented by additional resources.
Rest assured that we treat this matter with the highest urgency and are trying to minimize the temporary regression.

We understand that the temporary adjustment will affect some users in a rather harsh way, but we hope for your understanding.

Please contact us for any questions or comments, and in case you have really urgent computational requests. Despite limited options we will do our best to mitigate the effects.

Between 15:34 and 16:47, some Maxwell storage was disrupted, notably the Maxwell home directories.
The problem is resolved.

We recently updated the following software:

julia 1.5
singularity 3.6

Julia on Maxwell

We've added the Julia programming language to the Maxwell software repository.

Julia is a high-level, high-performance, dynamic programming language. While it is a general purpose language and can be used to write any application, many of its features are well-suited for numerical analysis and computational science. (Wikipedia)

To use it, run "module load maxwell julia" or append "/software/julia/default/bin" to your PATH.
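A minimal sketch of both options (the guard around module keeps the snippet harmless on machines without environment modules; the module and path names are taken from the text above):

```shell
# Option 1: environment modules (on Maxwell login/compute nodes)
command -v module >/dev/null 2>&1 && module load maxwell julia
# Option 2: prepend the shared installation to PATH directly
export PATH="/software/julia/default/bin:$PATH"
# Either way, "julia" should now resolve to the cluster installation.
```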

Further documentation:

Julia in the Jupyter Notebook:

In order to use Julia as a kernel in the Jupyter notebook,
first load Julia (see above) and then start the Julia interpreter
by entering the command "julia" on the command line.

Then enter the following commands to install IJulia and make
the kernel available for Jupyter:

julia> using Pkg
julia> Pkg.add("IJulia")

Then simply refresh the main Jupyter notebook page, and
Julia should become available as a kernel choice in the notebook,
alongside the various Anaconda Python variants installed on Maxwell.
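To verify the registration, Jupyter's standard CLI can list the installed kernel specs (the guard makes the snippet safe on machines without Jupyter; the exact kernel name, e.g. julia-1.5, depends on the installed Julia version):

```shell
# List installed Jupyter kernels; after the IJulia installation above a
# julia-* entry should appear alongside the Python kernels.
if command -v jupyter >/dev/null 2>&1; then
    jupyter kernelspec list
else
    echo "jupyter not on PATH"
fi
```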

OpenMPI problems

With the new OS version comes a small problem with the standard OpenMPI
implementation. With "module load mpi/openmpi-x86_64" you use the standard OpenMPI
shipped with CentOS. If your job crashes with a segmentation fault (SIGSEGV), you have to add a
parameter to your mpirun call. If possible use "ucx", as this is the new standard protocol in MPI.

There are two ways to achieve  this:

  1. You can create a file in your home directory containing a single line, "pml=ob1"
    or "pml=ucx":

    mkdir -p ~/.openmpi
    echo "pml=ucx" > ~/.openmpi/mca-params.conf
    cat ~/.openmpi/mca-params.conf
  2. You can add the parameter to your mpirun command, for example "mpirun --mca pml ob1 foo"
    or "mpirun --mca pml ucx foo"
    (foo := name of your program)

There is a local root exploit for GPFS commands like mmlsquota, see

As a temporary workaround the setuid flag has been removed from GPFS commands, preventing regular users from running e.g. mmlsquota. Running mmlsquota now throws a rather misleading error:

mmlsquota -u $USER --block-size auto max-home
Failed to connect to file system daemon: No such process
mmlsquota: GPFS is down on this node.
mmlsquota: Command failed. Examine previous error messages to determine cause.

That just means that $USER is not privileged to run the command, not that there is something wrong with GPFS. It will be fixed with the next GPFS update on Maxwell.

Anaconda3 updates

A number of conda packages have been updated or added:

cudatoolkit               10.0.130                      0    upgraded from cuda 9.0 to 10.0
cudnn                     7.6.5                cuda10.0_0    new
cupy                      7.4.0            py36h273e724_1    new
dask                      2.15.0                     py_0    upgraded
dask-core                 2.15.0                     py_0    upgraded
dask-glm                  0.2.0                      py_1    upgraded
dask-jobqueue             0.7.1                      py_0    upgraded
dask-labextension         1.0.3                      py_0    upgraded
dask-ml                   1.4.0                      py_0    new
dask-mpi                  1.0.3                    py36_0    new
distributed               2.15.2           py36h9f0ad1d_0    upgraded
extra-data                1.1.0                    pypi_0    upgraded
extra-geom                0.9.0                    pypi_0    upgraded
ipyslurm                  1.5.0                      py_0    new
libblas                   3.8.0                    14_mkl    upgraded
libcblas                  3.8.0                    14_mkl    upgraded
liblapack                 3.8.0                    14_mkl    upgraded
nccl                          hd6f8bf8_0    new
pyfai                     0.19.0           py36hb3f55d8_0    new. used to live in a conda env
pytorch                   1.4.0             cuda100py36_0    upgraded from 1.0 cuda 9.0
torchvision               0.2.1                    py36_0    new
xarray                    0.11.2                   pypi_0    upgraded

Due to conflicting dependencies, SuRVoS has been moved to a conda environment (survos):

@max-wgs:~$ conda env list | grep survos
survos                   /software/anaconda3/5.2/envs/survos

@max-wgs:~$ module load maxwell survos

@max-wgs:~$ which SuRVoS

The problem was resolved at ~10:30.

We are currently experiencing a serious problem in our GPFS infrastructure.

We are in the process of analyzing which parts are involved and therefore cannot give any further details at the moment.

At the moment the Maxwell home directories are not available, so you cannot log in to the Maxwell login nodes. The software folder is also not accessible.

BeeGFS is not affected, so the problem is not in the InfiniBand network.

As soon as we have new information we will update this post.

Docker had to be removed from the Maxwell login nodes due to severe security concerns. Running Docker on batch nodes is not affected. To build Docker images, please use your personal machines or one of the batch nodes.

Software Updates

During the last weeks we've updated all workgroup servers and all compute nodes to CentOS 7.7. For details you may look at the release notes.

Additionally we've updated Singularity to the latest version, 3.5.

On 08.10.2019, from 9:00 until 14:00, we had severe problems with the InfiniBand network on the Maxwell cluster. The home directories and several other GPFS storage systems were not available.
So login to Maxwell was not possible and running jobs could be disturbed.

The Maxwell-Cluster is composed of a core partition (maxwell) and group specific partitions. All compute nodes are however available for everyone!

The Maxwell cluster is primarily intended for parallel computation making best use of the multi-core architectures, the InfiniBand low-latency network, fast storage and available memory. The cluster is hence not suited for single-core computations or embarrassingly parallel jobs like Monte Carlo productions. Use BIRD, Grid, or your group's workgroup server (WGS) for these kinds of tasks.

The entire cluster is managed by the SLURM scheduler (with some notable exceptions). The SLURM scheduler essentially works on a first-come, first-served basis. The group-specific partitions however have slightly different rules: though everyone can run jobs on group-specific nodes, members of the group have a higher priority and will preempt non-group jobs off the partition. See Groups and Partitions on Maxwell for details.

  • To get started, please have a look at the Getting Started page!
  • The Maxwell Hardware page provides a list of currently available nodes & configurations.
  • The Maxwell Partitions page provides a quick overview of the nodes, capacities, features and limits of the individual partitions.

  • Read the documentation! It should cover at least the essentials. If you come across incorrect or outdated information: please let us know!



For any questions, problems, suggestions please contact:

All announcements are sent via the announcements mailing list; users with the maxwell resource are automatically subscribed.

We strongly recommend that all Maxwell users without the maxwell resource subscribe themselves, even if you use group-specific resources exclusively.

