Grid : Condor Event Logs to Kafka

While the regular Condor log files are not well suited for machine parsing, the event log can be configured to use a more easily parsable format.

Since Condor 9.0.x, the event log output can be JSON (before, only the Condor-native format or XML were possible, both somewhat unpleasant to parse).

CondorCE 5.y does not yet offer JSON as an output format.

For the moment, we parse event logs only on the schedulers. Once the system has proven stable, other nodes/daemons can be added.

Workflow

  • Condor writes its event log JSON-formatted
  • Logstash parses the event log file
    • and sends the individual events to Kafka
  • a Kafka consumer inserts the events into Elasticsearch
    • optionally, a trigger can fire on certain events
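The consumer step can be sketched roughly as follows. This is an illustration only, assuming the kafka-python and elasticsearch client libraries; the broker and topic names are taken from the configs below, while the Elasticsearch endpoint, index name, and helper function are invented for the sketch.

```python
import json

def event_to_document(raw: str) -> dict:
    """Turn one JSON event line (as sent by Logstash) into an Elasticsearch document.
    Hypothetical helper; field selection is just an example."""
    event = json.loads(raw)
    return {
        "event_type": event.get("MyType"),
        "cluster": event.get("Cluster"),
        "proc": event.get("Proc"),
        "event": event,  # keep the full raw event as well
    }

def run_consumer():  # needs a reachable broker, so not executed here
    from kafka import KafkaConsumer          # pip install kafka-python
    from elasticsearch import Elasticsearch  # pip install elasticsearch

    consumer = KafkaConsumer(
        "batch-eventlogs",
        bootstrap_servers="it-kafka-broker01.desy.de:9092",
        value_deserializer=lambda v: v.decode("utf-8"),
    )
    es = Elasticsearch("http://localhost:9200")  # assumed ES endpoint
    for msg in consumer:
        doc = event_to_document(msg.value)
        es.index(index="batch-eventlogs", document=doc)
        # optional: trigger on certain events, e.g. doc["event_type"] == "JobHeldEvent"

# example with a (made-up) submit event:
sample = '{"MyType": "SubmitEvent", "Cluster": 1234, "Proc": 0}'
print(event_to_document(sample)["event_type"])
```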

Condor Config

/etc/condor/config.d/90_10-logging.conf
EVENT_LOG_MAX_SIZE =  500000000
EVENT_LOG_MAX_ROTATIONS = 4
EVENT_LOG_JOB_AD_INFORMATION_ATTRS= x509UserProxyEmail, x509UserProxyFQAN, x509UserProxyVOName, User, Owner, ClusterId, ProcId, RequestCpus, GlobalJobId, queue, RoutedToJobId, SubmitterGlobalJobId, SubmitterId

EVENT_LOG_USE_XML=False
EVENT_LOG_FORMAT_OPTIONS=JSON, ISO_DATE
EVENT_LOG_COUNT_EVENTS=true
EVENT_LOG = /var/log/condor/EventLog.json
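With these options, each event is written as a pretty-printed JSON object starting with a bare "{" line (which is what the Logstash multiline pattern below matches on). A submit event then looks roughly like this; the field names follow the Condor JSON event format, the values are invented:

```json
{
    "EventTypeNumber": 0,
    "MyType": "SubmitEvent",
    "EventTime": "2021-05-10T12:00:00",
    "Cluster": 1234,
    "Proc": 0,
    "Subproc": 0
}
```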


Logstash Config

/etc/logstash/conf.d/condor_eventjson_batch-eventlogs.conf
input {
  file {
    path => "/var/log/condor/EventLog.json"
    start_position => "beginning"
    sincedb_path => "/var/spool/logstash//var/log/condor/EventLog.json.sincedb"
    exclude => "*.gz"
    type => "json"
    codec => multiline {
      pattern => "^{$"
      negate => "true"
      what => "previous"
    }
    tags => ["/var/log/condor/EventLog.json","batch-eventlogs","grid","grid-lrms","condor-scheduler","condor-master" ]
  }
}

filter {
# strip the line breaks so that each multiline event becomes a single JSON line
# (this needs a literal newline in the pattern; no, the escape sequence "\n" alone does not work)
mutate { gsub => [ "message", "
", "" ] }
mutate { gsub => [ "message", "\n", "" ] }
# replacing the double quotes is not an option, as string values in JSON require them
# mutate { gsub => [ "message", "\"", "'" ] }
}
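What the multiline codec plus the gsub filter achieve can be shown with a short sketch (not part of the pipeline, just an illustration): a pretty-printed JSON event spanning several lines is collapsed into a single line that a downstream JSON codec can parse.

```python
import json

# a pretty-printed event as Condor writes it, spanning several lines
multiline_event = '{\n    "MyType": "SubmitEvent",\n    "Cluster": 1234\n}'

single_line = multiline_event.replace("\n", "")  # what the gsub filter does
event = json.loads(single_line)                  # now parsable as one JSON message
print(event["MyType"])  # SubmitEvent
```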


output {
  # debug outputs:
  # stdout {
  #   codec => "json"
  # }
  # file {
  #   path => "/tmp/logstash.eventjson.debug"
  #   codec => "json_lines"
  # }
  kafka {
    codec => json
    topic_id => "batch-eventlogs"
    bootstrap_servers => "it-kafka-broker01.desy.de:9092,it-kafka-broker02.desy.de:9092,it-kafka-broker03.desy.de:9092"
    client_id => "DESY_Grid_Condor_schedd_grid-htcondorce-dev"
  }
}


Testing

Scheduler
# run as the logstash user so that files do not end up owned by another user
root@grid-htcondorce-dev: [~] sudo -u logstash /usr/share/logstash/bin/logstash --path.settings  /etc/logstash --log.level debug
Consumer
<hartmath@naf-it01:~> /usr/local/bin/kafkacat -C -b it-kafka-broker01.desy.de:9092 -t  batch-eventlogs

Documentation