Computing: A Short Walk-Through

Short example: A walk-through

* Log into a workgroup server (see above)


* create a job description to tell Condor what to do - for example, see
the attached submit file
myjob.submit
where we tell Condor that we want to run a script called mypayload.sh
and to read/write files in your current directory [details]

* to submit the job, run
condor_submit myjob.submit
which will return the ID of the job - let's say '2660'
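
A typical call and the kind of confirmation to expect (the cluster number will of course differ) looks roughly like this:

condor_submit myjob.submit
Submitting job(s).
1 job(s) submitted to cluster 2660.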

* while the job is queued or running, check its status with
condor_q 2660
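
If a job seems stuck in the queue, condor_q can also be asked for all of your jobs or for an analysis of why a particular job is not starting, for example

condor_q
condor_q -better-analyze 2660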

* while the job is running, a file in your submission directory should
be updated every now and then by Condor, telling you about its status.

In the submit file we told Condor that its name should be
mypayload.log

Attach to it with
tail -F mypayload.log
for updated info (not much here, since the toy job mainly sleeps)
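
If you would rather just wait until the job is done than watch the log, condor_wait can follow the same log file for you (assuming the log is indeed called mypayload.log as above):

condor_wait mypayload.log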

* after the job has finished, Condor will drop the job's output files
(log, stdout and stderr) into your submission directory as well
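
Once a job has left the queue it no longer shows up in condor_q; to look it up afterwards, condor_history can be used, e.g.

condor_history 2660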

Those are the basic steps for a Condor job - since Condor is highly
flexible, complex workflows can be realized, e.g., dynamic arrays of jobs
with conditions linking them and so on (a small sketch of a simple job array follows below).
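
As a small sketch of such an array (the per-process input files are just an illustration and would have to exist, of course): replacing the single 'queue' statement with 'queue N' creates N jobs at once, with $(Process) running from 0 to N-1, so that each job can get its own input and output files

input   = /nfs/dust/my/path/to/data/mypayload_$(Process).data
output  = /nfs/dust/my/path/to/a/dir/mypayload_$(Cluster)_$(Process).out
queue 10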

And if Condor has won you over and you want to exploit some readily
available extra resources, consider becoming a pilot user. Please let us
know so that we can add you to a dedicated mailing list for extra support.

[details]
see the attached file mypayload.sh -- remember to make it executable
with 'chmod u+x mypayload.sh'
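
the attached script is not reproduced here, but a minimal sketch of what such a toy payload could look like (the echo lines and the sleep duration are just placeholders) is

#!/bin/bash
# toy payload: report where we run, pretend to work for a while, exit cleanly
echo "payload starting on $(hostname) in $(pwd)"
sleep 300    # the actual 'work' - mainly sleeping, as mentioned above
echo "payload finished"
exit 0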

additionally, in the submit file we tell Condor the file names for
errors and the normal terminal output as well as for its log (and how/when
to handle these files)

since we are old fashioned, we want our node to run with Linux 'SL6'

with the 'queue' statement we tell Condor to actually put the job into the queue
(annoying if one forgets it...)

chbeyer@dkw:~$ cat myjob.submit


### let's run the program from the shared file system "DUST"
### the advantage is that the program is readily available on all batch nodes
### but do not touch the program while your jobs are still running or waiting - so that your whole set of jobs sees a consistent state
executable          = /nfs/dust/my/path/to/mypayload.sh

###  you can also upload the program into each job and skip the shared file system
### the advantage is that the program is consistent, as it is staged at the beginning
### the disadvantage can be that for large binaries (meaning: not a small script) copying can slow down everything severely
###                         especially when staging from DUST via Condor into the job's home directory again on DUST
#  transfer_executable = True   ### uncomment to stage the program into each job

universe            = vanilla

input               = /nfs/dust/my/path/to/data/mypayload.data

output              = /nfs/dust/my/path/to/a/dir/mypayload_$(Cluster)_$(Process).out

error               = /nfs/dust/my/path/to/some/other/dir/mypayload_$(Cluster)_$(Process).error

log                 = /nfs/dust/my/path/to/some/more/dir/mypayload_$(Cluster)_$(Process).log

# _$(Cluster)_$(Process) gets substituted by the cluster and process ID; putting it into the output file names leads to individual files

# for each job. Remember that the regular file system rules about the maximum number of files in a directory and maximum file sizes apply (warning)

# HTCondor will (as any other batch system) not create any directories for you, hence these need to exist.

##########################

#apart from 'queue' at the bottom these are optional feature requests that you might consider but do not need to set

#for a simple test job.

# job requirements       #

# special requirements, such as only nodes with specific Linux flavours

# e.g., requesting a node that runs either Scientific Linux 6 or CentOS 7


#requirements            = (OpSysAndVer == "SL6" || OpSysAndVer == "CentOS7")

#

# maximum memory in MB; a job gets killed by the system when it exceeds the request and the node has no spare memory

# the default is 1536 MB and jobs requesting > 2048 MB take a bigger hit in the fairshare calculation

#

#RequestMemory = 1024

#

# maximum run time in seconds for a job; after that, it gets killed by the system

# if not set, the default is 3 hours

# longer requested job run times take a bigger hit in the fairshare calculation

#

#+RequestRuntime     = 7200

#

#

##########################

queue




Attached files: mypayload.sh, mypayload.data