Grid : Minimal Submit Node


Minimal submit node

Install the usual GSI tools (e.g. voms-proxy-init), for example via CVMFS, and the HTCondor packages. The htcondor-ce-client package is not strictly required, but it provides debugging tools such as condor_ce_trace. Submission to a Condor CE goes through the local scheduler (schedd), so most existing Condor submit nodes should already be able to submit to a remote Condor CE. Remote submit hosts, i.e. nodes that run no schedd daemon themselves and forward jobs to a schedd host, probably do not work, because the Grid proxy would have to be forwarded as well.

Minimal HTCondor configuration in /etc/condor/config.d/ce-submit.conf:

use ROLE: Submit
AUTH_SSL_CLIENT_CADIR = /cvmfs/grid.cern.ch/etc/grid-security/certificates
GSI_DAEMON_TRUSTED_CA_DIR = /cvmfs/grid.cern.ch/etc/grid-security/certificates
Without these settings, the Condor client expects the Grid certificates under /etc/grid-security/certificates.
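
Whether the configured CA directory is actually populated can be checked before starting the daemons. The following is a minimal sketch in Python and not part of HTCondor; the helper name ca_dir_looks_usable is hypothetical, and it assumes that a usable directory contains at least one hashed CA certificate file (*.0) or *.pem file, as is usual for a Grid certificates directory.

```python
from pathlib import Path


def ca_dir_looks_usable(path):
    """Return True if `path` is a directory holding at least one CA file.

    Assumption: CA certificates are stored as `<hash>.0` or `*.pem` files,
    as is usual for /etc/grid-security/certificates style directories.
    """
    directory = Path(path)
    if not directory.is_dir():
        return False
    # any() stops at the first match, so large directories stay cheap to check
    return any(
        entry.is_file() and entry.suffix in (".0", ".pem")
        for entry in directory.iterdir()
    )
```

On a node where CVMFS is mounted, ca_dir_looks_usable("/cvmfs/grid.cern.ch/etc/grid-security/certificates") should return True.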

Start the condor unit and enable it so that it also starts after future reboots:

[submitnode] /root # systemctl start condor.service
[submitnode] /root # systemctl enable condor.service

If you need the Condor CE debugging tools, install the htcondor-ce-client package. Minimal configuration in /etc/condor-ce/config.d/ce-client.conf:

GSI_DAEMON_TRUSTED_CA_DIR = /cvmfs/grid.cern.ch/etc/grid-security/certificates

Testing the submission to the CE

Request a proxy. Check that you are using a tool version that works: the HTCondor CE client package explicitly pulls in the C++ flavour of the VOMS tools, so you might want to source an environment that provides the Java version from the Grid instead (this needs a JVM installed locally).

[submitnode] ~ % voms-proxy-init -voms dteam
Enter GRID pass phrase:
Your identity: /C=DE/O=GermanGrid/OU=DESY/CN=Andreas Haupt
Creating temporary proxy ............................................................................................... Done
Contacting  voms2.hellasgrid.gr:15004 [/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr] "dteam" Done
Creating proxy ................................... Done
Your proxy is valid until Wed Aug 12 03:17:07 2020
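
A script that wraps the proxy request may want to know when the proxy expires. The following Python sketch extracts the expiry time from the voms-proxy-init output shown above. The function name proxy_expiry is hypothetical, and the timestamp format is an assumption taken from this transcript; other tool versions may print a time zone or a different layout, in which case the function returns None.

```python
import re
from datetime import datetime

# Matches the final line printed by voms-proxy-init in the transcript above.
_VALID_UNTIL = re.compile(r"^Your proxy is valid until (.+)$", re.MULTILINE)


def proxy_expiry(output):
    """Return the proxy expiry as a naive datetime, or None if not found.

    Assumption: the timestamp has the form 'Wed Aug 12 03:17:07 2020'
    (no time zone field); other tool versions may print a different format.
    """
    match = _VALID_UNTIL.search(output)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        # Unknown timestamp layout: report "not found" instead of guessing
        return None
```

The returned value is a naive datetime in whatever time zone the tool printed, so it should only be compared against local time on the same host.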

Testing the DN mapping and authorization

Make sure that a mapping for your proxy DN exists on the CE, i.e. ping the Condor CE and test the authorization:

[submitnode] ~ % condor_ce_ping -verbose -name grid-htcondorce0.desy.de -pool grid-htcondorce0.desy.de:9619 WRITE
Remote Version: $CondorVersion: 8.9.7 May 19 2020 BuildID: 504263 PackageID: 8.9.7-1 $
Local Version: $CondorVersion: 8.9.7 May 19 2020 BuildID: 504263 PackageID: 8.9.7-1 $
Session ID: grid-htcondorce0:1834216:1597156581:9167
Instruction: WRITE
Command: 60021
Encryption: none
Integrity: MD5
Authenticated using: GSI
All authentication methods: FS,TOKEN,SCITOKENS,GSI
Remote Mapping: SOMELOCALMAPPEDUSERHERE@users.htcondor.org
Authorized: TRUE

Information about authentication methods that were attempted but failed:
AUTHENTICATE:1004:Failed to authenticate using SCITOKENS
AUTHENTICATE:1004:Failed to authenticate using IDTOKENS
AUTHENTICATE:1004:Failed to authenticate using FS
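
The two lines that matter in this output are "Authenticated using" and "Authorized". As a minimal sketch, the verbose output can be checked in Python as follows. The function names parse_ping_output and is_authorized are hypothetical, and the parsing assumes the "Key: value" layout of the transcript above.

```python
def parse_ping_output(output):
    """Collect the 'Key: value' lines of condor_ce_ping -verbose into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        # Lines without ': ' (e.g. the AUTHENTICATE:1004:... lines) are skipped
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_authorized(output, method="GSI"):
    """True if the CE authorized the request and `method` was used to authenticate."""
    fields = parse_ping_output(output)
    return (
        fields.get("Authorized") == "TRUE"
        and fields.get("Authenticated using") == method
    )
```

The text passed in would be the captured standard output of the condor_ce_ping call shown above.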

Sending a trace job for debugging

Send a trace job, a predefined debugging job, to the CE:

[submitnode] ~ % condor_ce_trace grid-htcondorce0.desy.de
Testing HTCondor-CE authorization...
Verified READ access for collector daemon at <131.169.223.129:9619?addrs=131.169.223.129-9619+[2001-638-700-10df--1-81]-9619&alias=grid-htcondorce0.desy.de&noUDP&sock=collector>
Verified WRITE access for scheduler daemon at <131.169.223.129:9619?addrs=131.169.223.129-9619+[2001-638-700-10df--1-81]-9619&alias=grid-htcondorce0.desy.de&noUDP&sock=schedd_1834144_00e5>
Submitting job to schedd <131.169.223.129:9619?addrs=131.169.223.129-9619+[2001-638-700-10df--1-81]-9619&alias=grid-htcondorce0.desy.de&noUDP&sock=schedd_1834144_00e5>
- Successful submission; cluster ID 3263
Resulting job ad: 
    [
        ClusterId = 3263; 
[...]
        CommittedSuspensionTime = 0
    ]
Spooling cluster 3263 files to schedd <131.169.223.129:9619?addrs=131.169.223.129-9619+[2001-638-700-10df--1-81]-9619&alias=grid-htcondorce0.desy.de&noUDP&sock=schedd_1834144_00e5>
- Successful spooling
Job status: Held
Job transitioned from Held to Idle
Job transitioned from Idle to Completed
- Job was successful
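
When the trace is run from a monitoring script, the state transitions and the final verdict can be pulled out of the output. The Python sketch below does this for the line formats seen in the transcript above; the function names job_transitions and trace_succeeded are hypothetical, and other condor_ce_trace versions may word these lines differently.

```python
import re

# Line format as printed by condor_ce_trace in the transcript above.
_TRANSITION = re.compile(r"^Job transitioned from (\w+) to (\w+)\s*$", re.MULTILINE)


def job_transitions(trace_output):
    """Return the (old_state, new_state) pairs in the order they were printed."""
    return _TRANSITION.findall(trace_output)


def trace_succeeded(trace_output):
    """True if the trace output contains the final success line."""
    return any(
        line.strip() == "- Job was successful"
        for line in trace_output.splitlines()
    )
```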

If everything works, you should be able to submit a real job to the Condor CE by sending it as a grid-universe job to the local schedd, which forwards it to the CE.

Submitting a real job

The job description file and an executable payload to run:

[submitnode] > cat HTCondorCE.submit 
universe = grid
use_x509userproxy = true
#+Owner = undefined
grid_resource = condor grid-htcondorce0.desy.de grid-htcondorce0.desy.de:9619
# Files
executable = mypayload.sh
output = stdout
error = stderr
log = logs
# File transfer behavior
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
# Optional resource requests
#+xcount = 4            # Request 4 cores
#+maxMemory = 4000      # Request 4GB of RAM
#+maxWallTime = 120     # Request 2 hrs of wall clock time
#+remote_queue = "osg"  # Request the OSG queue
# Run job once
queue


[submitnode] > cat mypayload.sh 
#!/bin/sh
DATE=$(date +%s)
...
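
If the same description is needed for several CEs, it can be generated instead of edited by hand. The following Python sketch renders the non-optional lines of the submit file above for a given CE host; the function name render_submit and its parameters are hypothetical, and the default port 9619 is taken from the example.

```python
def render_submit(ce_host, executable, port=9619):
    """Return a grid-universe submit description for the given Condor CE.

    Only the active (uncommented) lines of the example file are emitted;
    the optional resource requests are left out.
    """
    lines = [
        "universe = grid",
        "use_x509userproxy = true",
        f"grid_resource = condor {ce_host} {ce_host}:{port}",
        f"executable = {executable}",
        "output = stdout",
        "error = stderr",
        "log = logs",
        "ShouldTransferFiles = YES",
        "WhenToTransferOutput = ON_EXIT",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of render_submit("grid-htcondorce0.desy.de", "mypayload.sh") to a file should reproduce the active lines of HTCondorCE.submit.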

Submit the job to the local scheduler, which should evaluate the 'grid_resource' attribute and contact the CE:

[submitnode] > condor_submit -debug HTCondorCE.submit
Submitting job(s)08/11/20 16:43:17 Can't open directory "/etc/condor/passwords.d" as PRIV_UNKNOWN, errno: 13 (Permission denied)
08/11/20 16:43:17 Can't open directory "/home/hartmath/.condor/tokens.d" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
.
1 job(s) submitted to cluster 4.

You may see warnings about missing password/token directories. These should not be critical, since you are not submitting locally but with the proxy to the remote CE.
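
To follow the job afterwards, the cluster ID has to be picked out of the condor_submit output, which may be interleaved with such warnings. A minimal Python sketch, assuming the "N job(s) submitted to cluster M." line format shown above; the function name submitted_cluster is hypothetical.

```python
import re

# Final status line printed by condor_submit in the transcript above.
_SUBMITTED = re.compile(r"(\d+) job\(s\) submitted to cluster (\d+)\.")


def submitted_cluster(output):
    """Return (number_of_jobs, cluster_id) from condor_submit output, or None."""
    match = _SUBMITTED.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```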