
Software & Applications

The Maxwell-Cluster is a resource dedicated to parallel and multi-threaded applications which can exploit at least some of its specific characteristics. In addition to serving as a medium-scale High-Performance Cluster, Maxwell incorporates resources for Photon Science data analysis as well as resources of CFEL, CSSB, Petra4, the European XFEL...

If you find the resource useful for your work, we would greatly appreciate learning about publications which have substantially benefited from the Maxwell-Cluster. Drop us a mail at maxwell.service@desy.de. Acknowledging the Maxwell resource would also be greatly appreciated; it helps to foster the cluster. For example: "This research was supported in part through the Maxwell computational resources operated at Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany"


Docker had to be removed from the Maxwell login nodes due to severe security concerns. Running Docker on batch nodes is not affected. To build Docker images, please use your personal machine or one of the batch nodes.
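If you want to build an image on a batch node, one possible way (a minimal sketch; the partition name, time limit and image tag are just examples) is to allocate an interactive shell there first:

% srun --partition=maxwell --time=01:00:00 --pty bash   # interactive shell on a batch node
% docker build -t my-image:latest .                     # run inside the allocated node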

Software Updates

During the last weeks we've updated all workgroup servers and all compute nodes to CentOS 7.7. For details, have a look at the release notes.

Additionally, we've updated Singularity to the latest version, 3.5.
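As a quick check (a minimal sketch; the image name is just an example, and singularity is assumed to be in the default PATH), you can verify the version and convert a Docker image into a Singularity container:

% singularity --version
% singularity pull docker://python:3.6                  # creates python_3.6.sif in the current directory
% singularity exec python_3.6.sif python3 --version     # run a command inside the container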

From 08.10.2019, 9:00 till 14:00, we had severe problems with the InfiniBand network on the Maxwell cluster. The home directories and several other GPFS storage systems were not available, so logging in to Maxwell was not possible and running jobs may have been disturbed.


Jupyterhub interruption

For another bugfix, minor configuration changes, and the addition of a few extensions, we need to restart the JupyterHub (https://max-jhub.desy.de/) today at 19:00.
That will take only a few seconds, but it will most likely disconnect running kernels. In that case, you'd need to use the control panel to "Start My Server" and relaunch your notebook. The session (i.e. the SLURM job) will persist.
Apologies for the inconvenience.

On 19th Sep, from roughly 5:00 to 9:00, the home filesystem was not available on several nodes in the Maxwell cluster.
The problem has been solved. For further questions, send an email to maxwell.service@desy.de

Python3 update

With the update last week (2.9.2019) we removed all remaining python34 packages, because Python 3.4 has reached end of life (https://www.python.org/downloads/release/python-3410/) and the third-party repositories therefore no longer offer it.

DESY also provided some packages for 3.4, and not all of them have been rebuilt for 3.6 yet. If you are missing a package, send us an email at maxwell.service@desy.de
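A quick way to check whether a package you relied on is already available for the new Python (a minimal sketch; numpy is just an example package):

% python3 --version                                      # should now report a 3.6.x release
% python3 -c "import numpy; print(numpy.__version__)"    # raises ImportError if the package has not been rebuilt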

With the update last week to Slurm 19.05, the syntax in sbatch command files has changed: the parameter "--workdir" has been renamed to "--chdir", as in all other commands.

For details:
https://slurm.schedmd.com/sbatch.html
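A minimal sbatch file using the renamed option might look like this (the partition, time limit and directory are just examples):

#!/bin/bash
#SBATCH --partition=maxwell
#SBATCH --time=00:10:00
#SBATCH --chdir=/home/myuser/myproject    # Slurm releases before 19.05 called this --workdir
hostname                                  # the job runs in the directory given by --chdir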

Updated GIT

We provide an updated Git client in the software section:

% git --version
git version 1.8.3.1
% module load maxwell
% module load git    
% git --version  
git version 2.23.0

The problems regarding the all and allgpu partitions are solved. If you still have issues regarding the scheduling of your batch jobs, please send us a mail at maxwell.service@desy.de


Original Message:
After the Slurm update last week we saw some problems regarding the "all" and "allgpu" partitions. Jobs from "privileged" partitions (exfl, cssb, upex ...) preempt (kill) jobs which were submitted to the all* partitions, even if the privileged jobs can't use the preempted nodes afterwards due to constraints in the job definition (see https://confluence.desy.de/display/IS/Running+Jobs+on+Maxwell). The privileged job will "kill" a job in the all* partition every 3 minutes until a matching node is found and the "privileged" job starts. As this bug is only triggered by pending jobs in the privileged partitions with extra constraints, not all jobs in the all* queues will fail; for example, in the last 10 hours no job was preempted in the all* queue. We have filed a bug report with SchedMD (the company we have a SLURM support contract with) and are looking forward to a solution.
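If you suspect one of your all* jobs was affected, the accounting database shows whether it was preempted (a minimal sketch; 1234567 is a placeholder job id):

% sacct -j 1234567 --format=JobID,Partition,State,Elapsed

A State of PREEMPTED means the job was killed in favour of a higher-priority job.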


The Maxwell-Cluster is composed of a core partition (maxwell) and group-specific partitions. All compute nodes are, however, available to everyone!

The Maxwell-Cluster is primarily intended for parallel computation making best use of the multi-core architectures, the InfiniBand low-latency network, fast storage and the available memory. The cluster is hence not suited for single-core computations or embarrassingly parallel jobs like Monte Carlo productions; use BIRD, Grid, or your group's workgroup server (WGS) for those kinds of tasks.

The entire cluster is managed by the SLURM scheduler (with some notable exceptions). The SLURM scheduler essentially works on a first-come, first-served basis. The group-specific partitions however have slightly different rules: though everyone can run jobs on group-specific nodes, members of the group have a higher priority and will preempt non-group jobs off the partition. See Groups and Partitions on Maxwell for details.
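For a quick look at the partitions and your own jobs (standard SLURM commands; myjob.sh is a placeholder for your batch script):

% sinfo -o "%P %a %l %D"              # partitions with availability, time limit and node count
% sbatch --partition=maxwell myjob.sh # submit to the core partition
% squeue -u $USER                     # state of your own jobs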

  • To get started, please have a look at the Getting Started page!
  • The Maxwell Hardware page provides a list of currently available nodes & configurations.
  • The Maxwell Partitions page provides a quick overview of the nodes, capacities, features and limits of the individual partitions.

  • Read the documentation! It should cover at least the essentials. If you come across incorrect or outdated information: please let us know!

[Image: maxwell-layout]

Contact

For any questions, problems, suggestions please contact: maxwell.service@desy.de

All Announcements will be sent via maxwell-user@desy.de. Users with the maxwell-resource are automatically subscribed.

We strongly recommend that all Maxwell users without the maxwell-resource subscribe themselves, even if they are exclusively using group-specific resources.

Subscribe: https://lists.desy.de/sympa/info/maxwell-user



