Compute resources & background information

The Batch Infrastruktur Resource at DESY (BIRD) is a multi-purpose batch cluster.

The HTCondor-based BIRD is the successor of BIRD (SGE), which was shut down in June 2018.

We finished the migration to the new batch and scheduling software in April 2018.

BIRD (HTCondor) is in production. We use HTC as an abbreviation for HTCondor.





Quick start & hints for best throughput

  • Use an interactive batch session for a quick look around and for testing; see: Interactive batchsession
  • Submit 'standard jobs', which request 1 core, 2 GB of memory and 3 h of runtime, to benefit from oversubscription into everybody else's quota. Only these standard jobs can use all slots available in the pool, regardless of your group's quota limits!
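
As an illustration of the 'standard job' limits above, here is a minimal Python sketch that generates a submit description requesting 1 core, 2 GB of memory and 3 h of runtime. The commands request_cpus, request_memory, executable, output, error, log and queue are standard HTCondor submit commands. The custom attribute +RequestRuntime (in seconds) is an assumption about how the runtime is requested at this site, and the function name and file names are hypothetical; please check the User Guide before relying on them.

```python
def standard_job_submit(executable, runtime_seconds=3 * 60 * 60):
    """Return the text of a submit description for a 'standard job'.

    The runtime attribute name (+RequestRuntime) is an assumption
    about the site configuration, not taken from this page.
    """
    lines = [
        f"executable      = {executable}",
        # 1 core, as required for a standard job
        "request_cpus    = 1",
        # request_memory is given in MiB by default: 2048 MiB = 2 GB
        "request_memory  = 2048",
        # assumed site-specific attribute, value in seconds (3 h = 10800 s)
        f"+RequestRuntime = {runtime_seconds}",
        # one output/error file per job, one shared log per cluster
        "output          = job_$(Cluster)_$(Process).out",
        "error           = job_$(Cluster)_$(Process).err",
        "log             = job_$(Cluster).log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the submit description; redirect it into a file to use it.
    print(standard_job_submit("mypayload.sh"), end="")
```

The printed text could be saved to a file (for example myjob.submit) and handed to condor_submit. Requesting more than these values would mean the job no longer counts as a standard job in the sense described above.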

Questions and Problems

  • If you encounter a problem while running on the NAF, you can always contact the IT staff for help.
  • To learn how to write a ticket that helps the IT staff help you, see Creating a IT ticket best practices
  • See Getting support and FAQ to find out which support mail address is best suited to your problem.

If you are new to HTCondor, we strongly recommend this talk/tutorial by Todd Tannenbaum. It gives a general overview of HTCondor and a good introduction to using it: 20 minutes well spent!


Short overview

The HTC system features

  • ~8000 CPU cores (more to be added)
  • fair share load distribution and quota handling
  • integration of DESY wide batch resources
  • sophisticated resource handling in single and multi core environments
  • AFS and Kerberos support for authentication and resource access
  • AFS and DUST mounts on all pool nodes
  • runs on HTCondor (HTC)


Short example: for a walk-through, see the separate page on the left


Job Environment Variables:

Since the end of April 2018 we take care of setting up the shell environment:
- As was already done on Grid Engine, we set up the standard user environment.
- At a minimum, PATH and USER are always set to a standard value.
- If you use the ClassAd "getenv = True" switch (which is not recommended), you
  might also want to set the ClassAd "setENV = False", if you believe that the full submit host
  environment is usable and valid on your selected batch worker nodes.


HTCondor itself sets only a very limited shell environment:


BATCH_SYSTEM=HTCondor
KRB5CCNAME=FILE:/var/lib/condor/execute/dir_<SomeID>/<User>.cc
OMP_NUM_THREADS=1
PWD=/afs/desy.de/user/.../<JobDir>
SHLVL=3
TEMP=/var/lib/condor/execute/dir_<SomeID>
TMP=/var/lib/condor/execute/dir_<SomeID>
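
A payload can read these variables instead of hard-coding paths or thread counts. The following Python sketch shows one way to do that. The function name job_settings and the fallback values used outside a batch job are hypothetical choices for illustration; only the variable names BATCH_SYSTEM, TMP, TEMP and OMP_NUM_THREADS come from the listing above.

```python
import os
import tempfile


def job_settings(env=None):
    """Collect batch-related settings from the job environment.

    Falls back to local defaults when a variable is missing, so the
    same payload can also be tested outside a batch job.
    """
    env = os.environ if env is None else env

    # Per-job scratch directory; TMP and TEMP point to the same place
    # in the listing above.
    scratch_dir = env.get("TMP") or env.get("TEMP") or tempfile.gettempdir()

    # Number of threads the payload should use; default to 1 if the
    # variable is missing or not a number.
    try:
        threads = int(env.get("OMP_NUM_THREADS", "1"))
    except ValueError:
        threads = 1

    return {
        "in_batch": env.get("BATCH_SYSTEM") == "HTCondor",
        "scratch_dir": scratch_dir,
        "threads": threads,
    }


if __name__ == "__main__":
    print(job_settings())
```

Passing the environment as an argument keeps the function easy to test with a plain dictionary, without having to modify os.environ.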


Simple overview of BIRD resources:



Resources:

Quickstart(web): Guide

User Guide(web): Manual

FAQ(web): FAQ

Slides of Migration Talk: HEPiX-Talk


Online statistics: 

We are still working on a final version. In the meantime, use Day Statistics.


Contact:

naf (dash) helpdesk (at) desy (dot) de : NAF request tracker

bird (dot) service (at) desy (dot) de : Operational issues


Comments:

The link under 'Manual'

  http://research.cs.wisc.edu/htcondor/manual/latest/ref.html

is dead.

Posted by gellrich at 28. Feb. 2020 10:35

Done

This was eventually switched over to 'read-the-docs' ...

Posted by chbeyer at 28. Feb. 2020 11:28