Computing : Command Quick Reference

Interactive Job

To run an interactive job, i.e., to get an interactive shell on a batch node, submit with the '-i' flag and wait for Condor to assign you a node:

condor_submit -i myjob.submit

Running job information

A job's ID consists of the ID of its cluster (ClusterID, assigned by the scheduler where the job was submitted), separated by a dot from the job's index within that cluster (ProcID),

e.g. 98765.2

which is the third job (ProcIDs 0, 1, 2) of cluster 98765. A job can also be set up so that it gets restarted after an error; the number of starts is tracked separately in the job's ClassAds (JobRunCount) and is not part of the ID. Do not expect restarts to be possible everywhere, though: admins may choose to allow only a single run per job, to avoid broken jobs running several times and wasting resources on every run.
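For scripts that handle such IDs, here is a minimal Python sketch that splits an ID like 98765.2 at the dot into its two integer parts (Condor's ClassAds call them ClusterID and ProcID). The function name split_job_id is made up for this example and is not part of Condor.

```python
def split_job_id(job_id: str) -> tuple[int, int]:
    """Split an ID like '98765.2' into its (ClusterID, ProcID) parts."""
    cluster, sep, proc = job_id.partition(".")
    if not sep:
        # No dot found: this is not a full CLUSTER.PROC job ID.
        raise ValueError(f"expected CLUSTER.PROC, got {job_id!r}")
    return int(cluster), int(proc)


print(split_job_id("98765.2"))  # (98765, 2)
```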

Why is my job not starting?

For a detailed analysis of a job, run

condor_q -global -better-analyze JOBID.#

which shows the job's status and which of its requirements are matched or cannot be matched at the moment.

Searching for jobs

To search for specific jobs and select which of their attributes to print, run something like

condor_q -global -constraint 'jobstatus!=4 && RequestCpus==8 && regexp("atlas.*", OWNER)' -autoformat:th owner jobstatus QDate RequestCpus RequestMemory ClusterID ProcID

where

  • -constraint 'conditions' takes the selection criteria; the syntax resembles that of most programming languages, including boolean operators
    • here we ask for all jobs that are not completed (jobstatus 4 means completed), that have requested 8 cores, and where the value of 'owner' matches anything starting with "atlas"
      (see the HTCondor manual for more details on ClassAd attributes and matching)
    • the HTCondor manual also has the default list of ClassAd attributes a job can get, depending on where/how/when it was run (you will probably never need or encounter 95% of these)
  • -autoformat is a quick switch to print specific ClassAd attributes; without it, Condor prints its default attributes (you can also format the output quite flexibly yourself, but at that point the Python bindings might be the better way to go)
    • here we print for each job its ClassAd attributes: owner, job status, the date it was queued (in epoch seconds), etc.
    • ClusterID and ProcID identify each job: if you submit a single job, the JobID is ClusterID.ProcID
  • add the '-allusers' flag to get all users' jobs and not just yours
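If you build such queries from a script, the pieces described above can be assembled programmatically. The following is only a sketch: build_condor_q is a hypothetical helper name, and it merely constructs the argument list for the condor_q call shown above without running it.

```python
import shlex


def build_condor_q(constraints, attributes, all_users=False):
    """Assemble a condor_q argument list (hypothetical helper, does not run Condor)."""
    argv = ["condor_q", "-global"]
    if all_users:
        argv.append("-allusers")
    # All conditions must hold, so they are joined with the boolean AND operator.
    argv += ["-constraint", " && ".join(constraints)]
    # -autoformat:th prints the listed attributes tab-separated with a header line.
    argv += ["-autoformat:th", *attributes]
    return argv


argv = build_condor_q(
    ["jobstatus!=4", "RequestCpus==8", 'regexp("atlas.*", OWNER)'],
    ["owner", "jobstatus", "QDate", "RequestCpus", "RequestMemory", "ClusterID", "ProcID"],
)
# Print the command with shell quoting, ready to paste into a terminal.
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) instead of pasting the printed line avoids shell quoting problems with the constraint string.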

For a summary of the other Condor job states, see: Condor Job States
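As a quick lookup for the numeric codes used in the constraints on this page (2, 4 and 5 appear here; the remaining entries follow the JobStatus codes listed in the HTCondor manual), a small Python mapping might look like this. The names JOB_STATUS and status_name are only chosen for this sketch.

```python
# JobStatus codes as documented in the HTCondor manual.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def status_name(code: int) -> str:
    """Translate a numeric JobStatus into a readable name."""
    return JOB_STATUS.get(code, f"Unknown ({code})")


print(status_name(4))  # Completed
```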

Finished job information

For finished jobs you can check their status with

condor_history [-long]  JOBID.#

which will give you the ClassAds (the job information) of the job as they were when it finished.
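With '-long', the output consists of one "Attribute = value" line per ClassAd attribute, with a blank line between jobs. If you want to post-process that text, a minimal parser could look like the sketch below. The function name parse_long_classads and the sample text are made up for illustration; values are kept as raw strings (including any quotes) rather than being interpreted.

```python
def parse_long_classads(text):
    """Parse 'Attr = value' blocks (one blank-line-separated block per job) into dicts."""
    jobs, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current job's block.
            if current:
                jobs.append(current)
                current = {}
            continue
        # Split at the first ' = ' only, so values may contain '=' themselves.
        name, sep, value = line.partition(" = ")
        if sep:
            current[name] = value
    if current:
        jobs.append(current)
    return jobs


# Hypothetical sample text, not real output from any pool.
sample = 'ClusterId = 98765\nProcId = 2\nOwner = "someuser"\n'
for job in parse_long_classads(sample):
    print(job["ClusterId"], job["ProcId"], job["Owner"])
```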

One interesting ClassAd might be

RemoveReason = "The system macro SYSTEM_PERIODIC_REMOVE expression '((JobStatus == 5 && (CurrentTime - EnteredCurrentStatus) > 14 * 24 * 3600)) || (JobRunCount > 10) || ((JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > MaxJobRetirementTime))' evaluated to TRUE"

which gives the reason for automatically removed jobs. Here, the job was removed because at least one of the following held: it had been on hold (JobStatus == 5) for more than two weeks ((CurrentTime - EnteredCurrentStatus) > 14 * 24 * 3600), it had been started more than 10 times (JobRunCount > 10), or it was still running (JobStatus == 2) and its runtime exceeded the allowed runtime ((CurrentTime - EnteredCurrentStatus) > MaxJobRetirementTime).
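To make the logic of that expression easier to follow, here is a sketch that re-implements it in Python. It is an illustration of the three conditions only, not how Condor evaluates ClassAd expressions; the function name should_remove and its parameter names are assumptions made for this example.

```python
TWO_WEEKS = 14 * 24 * 3600  # seconds, as in the expression above


def should_remove(job_status, entered_current_status, job_run_count,
                  now, max_job_retirement_time):
    """Mirror the three OR-ed conditions of the SYSTEM_PERIODIC_REMOVE example."""
    time_in_status = now - entered_current_status
    # Held (5) for more than two weeks.
    held_too_long = job_status == 5 and time_in_status > TWO_WEEKS
    # Started more than 10 times.
    too_many_runs = job_run_count > 10
    # Running (2) for longer than the allowed runtime.
    ran_too_long = job_status == 2 and time_in_status > max_job_retirement_time
    return held_too_long or too_many_runs or ran_too_long


# A job held for 15 days is removed; one held for 1 day is not.
print(should_remove(5, 0, 1, 15 * 24 * 3600, 3600))  # True
print(should_remove(5, 0, 1, 1 * 24 * 3600, 3600))   # False
```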

Deleting jobs

To delete a single job, run

condor_rm  JOBID

Use the '-constraint' option (short: '-const') to select and delete multiple jobs by more complex conditions.

First print the list of jobs matching your condition; in this example, all running jobs (jobstatus==2) that were queued before time 1504864800 (Unix epoch):

condor_q -global -const 'jobstatus==2 && QDate < 1504864800'

If the list of jobs is what you expect, run the same constraint with condor_rm:

condor_rm -const 'jobstatus==2 && QDate < 1504864800'
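QDate is compared against seconds since the Unix epoch, so you usually need to convert a calendar date first. A minimal Python sketch for the conversion in both directions, using UTC (the helper names to_epoch and from_epoch are made up for this example):

```python
from datetime import datetime, timezone


def to_epoch(year, month, day, hour=0, minute=0):
    """Convert a UTC calendar date to Unix epoch seconds."""
    moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(moment.timestamp())


def from_epoch(seconds):
    """Convert Unix epoch seconds to an ISO-formatted UTC date string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(from_epoch(1504864800))        # 2017-09-08T10:00:00+00:00
print(to_epoch(2017, 9, 8, 10, 0))   # 1504864800
```

The resulting number can then be used in a constraint such as 'QDate < 1504864800'.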

Submit with old GE syntax

Condor provides a wrapper, 'condor_qsub', for submitting jobs with GridEngine-style syntax:

http://research.cs.wisc.edu/htcondor/manual/current/condor_qsub.html

condor_qsub provides only a subset of qsub's options, so your mileage may vary if you try 'exotic' things.
For Condor's full functionality and feature set, it is better to migrate to the native Condor tools as soon as possible.