On 1 July 2018, the SGE/BIRD job submission system at DESY was decommissioned. Jobs must now be submitted to the batch system through HTCondor/BIRD. Although it is still possible to write your submission scripts in the SGE style, the native HTCondor format is recommended because it makes your jobs easier to manage.

To submit jobs to HTCondor/BIRD, you need two files: a submit description file that sets up the HTCondor work environment, and an executable script that runs your job.

Setting up HTCondor work environment:

As an example:

######################################################
# HTCondor Submit Description File. COMMON TEMPLATE
# Next commands should be added to all your submit files
######################################################

#
# Infos from your job:
Output = $(Cluster).$(Process).out
#
# Contains only last error:
Error = $(Cluster).$(Process).err
#
# Infos from the scheduler:
Log = $(Cluster).$(Process).log
#
# Default Universe for normal jobs
Universe = vanilla
#
# Until now we use shared file system:
Should_Transfer_Files = NO
#
# Defaults to submit dir:
InitialDir = $ENV(PWD)
#
# Normally set by submit host:
#+MyProject = "MyProject"
#
# Test it:
#Arguments = "sleep 600"
#
# Mailing requests:
#notification = <Always | Complete | Error | Never>
notification = Always
#
# Your mail address:
#notify_user = huong.lan.tran@desy.de
#
# Defaults to 1 day:
#+RequestRuntime = 3600 * 12
#
# Defaults to 3G
RequestDisk = 2048 * 3
#
# Defaults to 1500M:
RequestMemory = 1024 * 2
#
# Default is 1 job per submit
# Job Id is $(Cluster).$(Process) e.g. 20202.0
# request_cpus=10
#
# Operating system request:
#Requirements = OpSysAndVer=="SL6"

######################################
## Execute jobs
## Here is an example
## to run simulation of 10 GeV e- in 100 tasks
######################################

N = 100
Executable = $ENV(PWD)/run.sh
Arguments = "e- 10 $(Process)"
Queue $(N)

htc_jobSettings.sub

It is important to set the requested runtime, disk, memory and number of CPUs as efficiently as possible. You can run a single test job and use the resources it actually consumed to adjust these parameters. On BIRD machines the default operating system is now SL6, but you can also request it explicitly to be sure.
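
As a rough sketch of how the numbers from a test job could be read back, the helper below wraps condor_history. The function name job_usage is made up for this example, and it assumes that the test job has already left the queue and that the scheduler still keeps its history.

```shell
#!/bin/bash
# Print what a finished job requested and what it actually used.
# Usage: job_usage <ClusterId>.<Process>     e.g. job_usage 20202.0
# (job_usage is a hypothetical helper name, not part of HTCondor.)
job_usage() {
    local job_id="$1"
    # -af (autoformat) prints the listed job ClassAd attributes in order:
    #   RequestMemory, MemoryUsage   in MiB
    #   RequestDisk,   DiskUsage     in KiB
    #   RemoteWallClockTime          in seconds
    condor_history "$job_id" -limit 1 \
        -af RequestMemory MemoryUsage RequestDisk DiskUsage RemoteWallClockTime
}
```

Comparing each request with the corresponding usage shows whether RequestMemory, RequestDisk and the requested runtime can be lowered or need to be raised.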

It is also strongly recommended that you submit your jobs as an array job, as shown in the example. Array jobs have several advantages:

  • You only have to write one shell script.
  • You don’t have to worry about deleting thousands of shell scripts, etc.
  • If you submit an array job, and realize you’ve made a mistake, you only have one job id to remove, instead of figuring out how to remove 100s of them.
  • You put less of a burden on the head node.
  • Much easier for book-keeping. 
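
To illustrate the point about having a single job id to remove, here is a minimal sketch that picks the cluster id out of the confirmation line printed by condor_submit. The helper name cluster_id_from_submit is an assumption made for this example.

```shell
#!/bin/bash
# Extract the cluster id from condor_submit's confirmation line,
# e.g. "100 job(s) submitted to cluster 20202."
# (cluster_id_from_submit is a hypothetical helper name.)
cluster_id_from_submit() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\).*/\1/p'
}

# Possible use on a submit host (commented out so the sketch has no side effects):
# CLUSTER=$(condor_submit htc_jobSettings.sub | cluster_id_from_submit)
# condor_rm "$CLUSTER"    # removes every process of that array job at once
```

Because all processes of an array job share one cluster id, a single condor_rm call with that id is enough to remove them all.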

Array job submission in HTCondor is done through the Queue command:

N = 100
Executable = $ENV(PWD)/run.sh
Arguments = "e- 10 $(Process)"
Queue $(N)

Your executable script (described in the next section) will be queued and executed 100 times. The processes are numbered from 0 to 99. You can pass this process id, $(Process), as an argument to the executable and use it later, for example to name the slcio files produced by your job.

The particle type and the energy are passed as arguments as well.
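
The following sketch shows one way an executable script could pick up these three arguments. The variable names, the make_output_name helper and the file naming pattern are assumptions made for illustration and are not taken from the attached run.sh.

```shell
#!/bin/bash
# Sketch of argument handling for a script started with
#   Arguments = "e- 10 $(Process)"
# All names below are illustrative.

# Build an output file name from particle, energy (GeV) and process id.
make_output_name() {
    local particle="$1" energy="$2" process="$3"
    # Zero-pad the process id so that files sort naturally (7 -> 007).
    printf 'sim_%s_%sGeV_%03d.slcio\n' "$particle" "$energy" "$process"
}

# Positional arguments as passed by HTCondor, with defaults for a manual test.
PARTICLE="${1:-e-}"
ENERGY="${2:-10}"
PROCESS="${3:-0}"

OUTPUT_FILE="$(make_output_name "$PARTICLE" "$ENERGY" "$PROCESS")"
echo "Output file: ${OUTPUT_FILE}"
```

Since every process receives a different $(Process) value, each of the 100 jobs writes to its own file and none of them overwrites another.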

Executable script:

run.sh

Here it is important to set DEST_PATH correctly. In the attached example, the executable script is designed to run a simulation with Geant4, but you can also write a very simple script for reconstruction, as follows:

#!/bin/bash

source /cvmfs/ilc.desy.de/sw/x86_64_gcc48_sl6/v01-17-10/init_ilcsoft.sh

./myMarlin steering.xml 

Submit jobs in interactive mode

HTCondor lets you run jobs in interactive mode. Before sending hundreds or thousands of jobs to BIRD, it is better to test one job interactively to check whether the submission scripts contain any mistakes that need to be corrected.

condor_submit -i htc_jobSettings.sub
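
For a test with a single job it can be handy to work on a one-job copy of the submit file instead of editing the original. The sketch below rewrites the Queue line with sed. The function name make_test_submit and the file name htc_test.sub are assumptions made for this example.

```shell
#!/bin/bash
# Write a one-job copy of a submit file by rewriting its Queue line.
# Usage: make_test_submit <input.sub> <output.sub>
# (make_test_submit is a hypothetical helper name.)
make_test_submit() {
    local input="$1" output="$2"
    # Replace any line starting with "Queue" (e.g. "Queue $(N)") by "Queue 1".
    sed 's/^[Qq]ueue.*/Queue 1/' "$input" > "$output"
}

# Possible use (commented out so the sketch has no side effects):
# make_test_submit htc_jobSettings.sub htc_test.sub
```

The copy can then be given to condor_submit in place of the full file, so that only one process is started while the settings are still being checked.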

