Grid : APEL Accounting

Accounting of jobs in the local HTCondor LRMS towards EGI/WLCG is handled through APEL. The CondorCE package ships scripts that export/convert Condor statistics into the APEL blah and batch records. After some trial and error and adapting other documentation, the APEL accounting setup at DESY-HH looks as follows.

CondorCE

The setup has been tested with HTCondorCE 4.4 and HTCondor 8.9.

Prerequisites

Database

You need an SQL database to which each CondorCE sends its accounting data; from this database, the SSM then sends the accounting data on to the EGI endpoint.
We use a central Galera SQL cluster, on which a database 'condor' has been created for this purpose.

Please set up a (My)SQL database and ensure that a (my)sql client is installed on the CondorCE, so that you can connect to it with something like

> mysql -u DB_USER_NAME -p -h MYSQLDB.desy.de -D condor

To load the APEL schema into your SQL database, run

mysql -u DB_USER_NAME -p -h MYSQLDB.desy.de -D condor < client.sql

where we took the schema file `client.sql` from `/usr/share/apel/client.sql`, which ships with the `apel-client` package (the schemas at https://github.com/apel/apel/tree/dev/schemas will probably work as well, but the authoritative source for the correct schema is not 100% clear).

GOC DB

For the node from which you will publish the records (in our case the CondorCE itself), you need to create an entry in the GOCDB as an APEL service. For our CE, it looks like

https://goc.egi.eu/portal/index.php?Page_Type=Service&id=12417

Please do not forget to add the `Host DN` string, as it is used to identify and authorize the node at APEL.

Scripts

All APEL-related scripts are called from the script

> cat /usr/share/condor-ce/condor_ce_apel.sh

/usr/share/condor-ce/condor_blah.sh # Make the blah file (CE/Security data)
/usr/share/condor-ce/condor_batch.sh # Make the batch file (batch system job run times)
/usr/bin/apelparser # Read the blah and batch files in
/usr/bin/apelclient # Join blah and batch records to make job records
/usr/bin/ssmsend # Send job records into APEL system

where the APEL-specific blah and batch record files are created by bash scripts that query the `condor_history` command. Ensure, therefore, that your Condor history covers a period longer than the APEL run interval, i.e., if APEL runs once a day, the Condor history should cover at least one day and must not be rotated away before then.
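To check how much history your pool keeps, you can query the standard HTCondor retention knobs (these are stock HTCondor configuration macros; your values will differ from ours):

```shell
# Show the retention settings of the Condor history file:
condor_config_val MAX_HISTORY_LOG        # maximum size of one history file, in bytes
condor_config_val MAX_HISTORY_ROTATIONS  # number of rotated history files that are kept
```

If completed jobs fall out of this window faster than APEL runs, they will be missing from the blah/batch records.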

Configurations

The APEL configuration files can be found under

/etc/apel/client.cfg
/etc/apel/sender.cfg
/etc/apel/receiver.cfg
/etc/apel/parser.cfg

Since the BDII is being phased out, we use the SSM endpoint directly, i.e., `host = mq.cro-ngi.hr`. Please confirm with the APEL admins that this endpoint is the correct one for you.

In the `[messaging]` config block, change the test queue `destination: /queue/ssm2test` to a production queue once all your tests have succeeded and you have gotten a green light from APEL.
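For illustration, the relevant part of a sender config might look roughly like this (a sketch only - the host is the endpoint mentioned above, the spool path is the usual SSM default; confirm the actual values with the APEL team):

```
[broker]
# direct endpoint instead of a BDII lookup (the BDII is being phased out)
host: mq.cro-ngi.hr

[messaging]
# test queue - switch to a production queue once APEL gives the green light
destination: /queue/ssm2test
path: /var/spool/apel/outgoing
```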

The log and record paths we use in the configs might need to be created first.

We send the APEL logs to `/var/log/apel` and write the blah/batch accounting records to `/var/lib/condor-ce/apel` (which might need pruning every now and then).
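Since the record files accumulate, a small helper like the following can take care of the pruning (the function name and the 30-day retention are our own choices for illustration, not part of the APEL tooling):

```shell
# Delete APEL record files older than a given number of days.
# Usage: prune_apel_records <directory> <days>
prune_apel_records() {
    find "$1" -type f -mtime +"$2" -delete
}

# Example invocation for our record directory, e.g. from a daily cron job:
# prune_apel_records /var/lib/condor-ce/apel 30
```

Keep the retention comfortably longer than your publishing interval, so records are never pruned before they have been sent.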

Service/Timer Units

By default, the APEL script runs every 24 hours through the timer unit `condor-ce-apel.timer`, which triggers the service unit `condor-ce-apel.service`, which in turn executes `/usr/share/condor-ce/condor_ce_apel.sh` as root.

See

> systemctl status condor-ce-apel.timer
> systemctl status condor-ce-apel.service

for details on which unit files are actually loaded, and adapt these if necessary (do not forget to reload the systemd daemon!).
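If the default schedule does not fit your site, a systemd drop-in override survives package updates; a sketch (the 06:00 run time is just an example):

```
# systemctl edit condor-ce-apel.timer   -- opens a drop-in override file
[Timer]
# clear the shipped value, then set the new schedule
OnCalendar=
OnCalendar=*-*-* 06:00:00
```

Check the result afterwards with `systemctl list-timers condor-ce-apel.timer`.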

The service only accounts for the job statistics of the previous day - if for some reason the service has not run for more than a day, you will need to publish the missing records yourself.
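For such a back-fill, the same chain that `condor_ce_apel.sh` runs can be executed by hand (as root on the CE) - provided the Condor history still covers the missing period:

```shell
# Re-run the full APEL chain manually (as root on the CE):
/usr/share/condor-ce/condor_blah.sh    # regenerate the blah file (CE/security data)
/usr/share/condor-ce/condor_batch.sh   # regenerate the batch file (job run times)
/usr/bin/apelparser                    # read the blah and batch files into the DB
/usr/bin/apelclient                    # join blah and batch records into job records
/usr/bin/ssmsend                       # send the job records to APEL
```

If the history no longer covers the gap, the missing records have to be reconstructed and published through other means.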

Result

If everything goes well, your published job statistics should appear summarized on the EGI accounting portal - for example, the DESY-HH and DESY-ZN statistics for the final months of 2020:

https://accounting.egi.eu/egi/site/DESY-HH/elap_processors/SubmitHost/DATE/2020/8/2021/1/egi/onlyinfrajobs/
https://accounting.egi.eu/egi/site/DESY-ZN/elap_processors/SubmitHost/DATE/2020/8/2021/1/egi/onlyinfrajobs/

Caveats

  • as of writing, the APEL scripts are still partly based on Python 2 - with Python 2 past its end of life, the accounting might have problems on Python 3-only OS releases (the above setup has been tested only on CentOS 7)
  • HTCondor will consolidate its myriad of specific tools (condor_q, condor_rm, ...) into a `condor subcmd options` interface with the next major release branch (~10), so classic bash scripts that do not use the Python bindings might need to be adapted
  • enabling `use_ssl` might collide with a bug in SSM: https://github.com/apel/ssm/issues/111


Beware that some of the existing documentation is several years old and might not apply 1:1, for example

https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceApel


Puppet Files

Our Puppet manifests/classes to deploy/configure our CondorCEs (please note that they rely heavily on parameters kept in Hiera and on other DESY-specific classes).

Attachments:

parser.cfg (application/octet-stream)
receiver.cfg (application/octet-stream)
sender.cfg (application/octet-stream)
client.cfg (application/octet-stream)