Computing : How to use directed acyclic graphs (DAGs) in BIRD

A directed acyclic graph (DAG) can be used to represent a set of computations where the input, output, or execution of one or more computations is dependent on one or more other computations. The computations are nodes (vertices) in the graph, and the edges (arcs) identify the dependencies. HTCondor finds machines for the execution of programs, but it does not schedule programs based on dependencies. The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler for the execution of programs (computations). DAGMan submits the programs to HTCondor in an order represented by a DAG and processes the results. A DAG input file describes the DAG.

DAGMan is itself executed as a scheduler universe job within HTCondor. It submits the HTCondor jobs within nodes in such a way as to enforce the DAG’s dependencies. DAGMan also handles recovery and reporting on the HTCondor jobs. (from the manual http://research.cs.wisc.edu/htcondor/manual/v8.7/2_10DAGMan_Applications.html)


The submit file for the DAG describes the dependencies of the jobs to execute, e.g.:

[chbeyer@htc-cms01]~/htcondor/testjobs% cat sleep_dag.submit
# File name: sleep.dag
JOB A sleep_job.condor
JOB B sleep_job.condor
JOB C sleep_job.condor
JOB D sleep_job.condor
PARENT A CHILD B C
PARENT B C CHILD D
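
To see which order the PARENT ... CHILD lines imply, the dependency graph can be worked out with a few lines of Python. This is only an illustrative sketch, not how DAGMan is implemented: `parse_dag` and `waves` are hypothetical helper names, the parser understands only the JOB and PARENT ... CHILD keywords used in the example, and it assumes Python 3.9 or newer for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

DAG_TEXT = """\
JOB A sleep_job.condor
JOB B sleep_job.condor
JOB C sleep_job.condor
JOB D sleep_job.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text):
    """Return {node: set of parent nodes} from JOB and PARENT ... CHILD lines."""
    parents = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].upper()
        if keyword == "JOB":
            parents.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            parent_names = words[1:split]
            for name in parent_names:
                parents.setdefault(name, set())
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(parent_names)
    return parents


def waves(parents):
    """Group nodes into waves; a wave is runnable once all earlier waves are done."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


print(waves(parse_dag(DAG_TEXT)))  # [['A'], ['B', 'C'], ['D']]
```

For the example file this gives three waves: A first, then B and C (which do not depend on each other), then D.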


The actual worker jobs are regular HTCondor jobs, and all the usual rules apply.

Because these jobs are submitted from the scheduler universe on the scheduler itself, they cannot inherit your project, so you must set an appropriate project in the job submit file!

[chbeyer@htc-it02]~/htcondor/testjobs% cat sleep_job.condor
# Unix submit description file
# sleep.sub -- simple sleep job

executable              = /afs/desy.de/user/c/chbeyer/htcondor_exec/sleep.sh
log                     = /afs/desy.de/user/c/chbeyer/log_$(Cluster)_$(Process).log
output                  = /afs/desy.de/user/c/chbeyer/out_$(Cluster)_$(Process).txt
error                   = /afs/desy.de/user/c/chbeyer/error_$(Cluster)_$(Process).txt
+MyProject = "support"
queue
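
Since a missing project line is easy to overlook, a small check before submitting can help. The following is a sketch of one way to do this, not a BIRD tool: `find_project` is a hypothetical helper that scans a submit description for a `+MyProject` line, and it only handles the simple quoted form used in the example above.

```python
import re

# Matches lines such as: +MyProject = "support"
_PROJECT_RE = re.compile(r'^\+MyProject\s*=\s*"([^"]*)"\s*$', re.IGNORECASE)


def find_project(submit_text):
    """Return the project named by a +MyProject line, or None if there is none."""
    for raw in submit_text.splitlines():
        match = _PROJECT_RE.match(raw.strip())
        if match:
            return match.group(1)
    return None


SUBMIT_TEXT = '''\
executable = sleep.sh
+MyProject = "support"
queue
'''

print(find_project(SUBMIT_TEXT))                        # support
print(find_project("executable = sleep.sh\nqueue\n"))   # None
```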



When you submit the DAG you can add 'MyProject' if you need to. Omitting 'MyProject' results in a job with the default accounting group of the host that you use as the submit node.

[chbeyer@htc-cms01]~/htcondor/testjobs% condor_submit_dag -append '+MyProject = "support"' sleep_dag.submit


Running rescue DAG 7
-----------------------------------------------------------------------
File for submitting this DAG to HTCondor   : sleep_dag.submit.condor.sub
Log of DAGMan debugging messages           : sleep_dag.submit.dagman.out
Log of HTCondor library output             : sleep_dag.submit.lib.out
Log of HTCondor library error messages     : sleep_dag.submit.lib.err
Log of the life of condor_dagman itself    : sleep_dag.submit.dagman.log

Submitting job(s).
1 job(s) submitted to cluster 771718.
-----------------------------------------------------------------------
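
When the submission is scripted, the quoting of the `-append` argument is easy to get wrong, because the value contains both spaces and double quotes. A minimal Python sketch, assuming the script would run on a submit node where `condor_submit_dag` is on the PATH; `dag_submit_command` is a hypothetical helper name, and the command is only built and printed here, not executed.

```python
import shlex


def dag_submit_command(dag_file, project):
    """Build the argument list for condor_submit_dag with a +MyProject line appended."""
    return ["condor_submit_dag", "-append", f'+MyProject = "{project}"', dag_file]


argv = dag_submit_command("sleep_dag.submit", "support")
print(shlex.join(argv))
# condor_submit_dag -append '+MyProject = "support"' sleep_dag.submit
```

Passing the list to `subprocess.run(argv, check=True)` would hand the `+MyProject` line to `condor_submit_dag` as a single argument, with no shell quoting involved.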


[chbeyer@htc-cms01]~/htcondor/testjobs% condor_q -nobatch

-- Schedd: bird-htc-sched01.desy.de : <131.169.56.32:9618?... @ 02/07/18 13:48:01
 ID        OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
771718.0   chbeyer         2/7  13:47   0+00:00:28 R  0    0.3 condor_dagman -p 0 -f -l . -Lockfile sleep_dag.submit.lock -AutoRescue 1 -DoR
771719.0   chbeyer         2/7  13:47   0+00:00:00 I  0    0.0 sleep.sh


HINT:

Do not submit the same DAG, with the same DAG input file, from within the same directory, such that more than one instance of this DAG is running at the same time. It will fail in an unpredictable manner, as each instance will attempt to use the same files to enforce dependencies. To work around this, you can either change the directory and use the same file, or rename the file for each submission without altering its content!
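
The renaming workaround can be automated. The sketch below is one possible approach under stated assumptions: `unique_dag_copy` is a hypothetical helper that copies the DAG input file to a new, unique name in the same directory, so that relative paths in the JOB lines keep working. The tag format is arbitrary.

```python
import shutil
import uuid
from pathlib import Path


def unique_dag_copy(dag_path):
    """Copy a DAG input file to a uniquely named sibling and return the new path."""
    src = Path(dag_path)
    tag = uuid.uuid4().hex[:8]  # short random tag; any unique string would do
    dst = src.with_name(f"{src.stem}.{tag}{src.suffix}")
    shutil.copyfile(src, dst)   # the content is copied unchanged
    return dst
```

Each call returns a different file name of the form `sleep_dag.<tag>.submit`, which can then be passed to `condor_submit_dag` in place of the original file.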