Contact: comp-dc-operations @ belle2.org




Production Plans

  • MC10
  • MC11
  • SKIM9x2, 40 TB
  • GCR2c
  • prod6

Production Status

Official MC9 production started at ~21:00 JST on July 5, 2017.

Starting with BGx0 generic samples (0.2 ab-1)

Submitted second batch of BGx0 generic jobs (July 7, ~04:00 JST)

Third and fourth batches of BGx0 generic jobs (July 10)

Submitted a few BGx0 signal samples (July 12 ~04:30 JST)

Submitted the phase 2 generic samples with BGx0 (July 14 ~04:00 JST)

Submitted the rest of the BGx0 signal samples (July 16 ~00:00 JST)

New requests for BGx0 signal samples submitted (July 19 ~01:30 JST)

MC9 restarted with BGx1 phase 2 samples - 50 fb-1 generic and signal samples (July 30 ~10:00 JST)

Submitted first batch of phase 3 samples with background - mixed and charged BBbar - about 140k jobs (August 12 ~08:00 JST)

Added uubar: ~180k jobs (August 13 ~10:30 JST)

Added ddbar: ~53k jobs (August 29 ~22:00 JST)

Added ssbar: ~51k jobs (Sept. 2 ~11:00 JST)

Submitted phase 3 low-multiplicity samples: ~43.5k jobs (Sept. 2 ~13:00 JST) → includes generator level skim so number of jobs is inflated compared to run time

Added ccbar and taupair: ~317k jobs (Sept 3 ~09:30 JST)

Submitted new signal MC samples: ~57.6k jobs (Sept 11 ~23:00 JST)

Submitted new phase 2 signal MC samples: ~21.2k jobs (Sept 12 ~05:00 JST) → short jobs < 3 hrs each

Submitted new phase 3 signal MC samples: ~600k jobs (Sept 24 ~22:30 JST)

Submitted new phase 3 signal MC samples (almost all submitted now) (Sept 28 ~02:00 JST)

Submitted phase 3 Y(5S) bsbs and non-bsbs samples: ~54.4k jobs (Oct 6 ~04:30 JST)

Submitted phase 3 Y(5S) uubar samples: ~242k jobs (Oct 9 ~09:30 JST)

Submitted phase 3 Y(5S) ddbar samples: ~60k jobs (Oct 16 ~10:30 JST)

Submitted phase 3 Y(5S) ssbar and ccbar samples: ~300k jobs (Oct 18 ~02:00 JST)

Submitted a few last signal samples and the phase 3 Y(5S) taupair samples: ~200k jobs (Oct 24 ~03:00 JST)
     → The taupair samples should run as shorter jobs (~5-6 hours at KEKCC)

Submitted Y(6S) continuum samples: ~30k jobs (Oct 30 ~23:00 JST)

Submitted Y(3S) generic samples: ~260k 8h jobs (Nov 3 ~04:30 JST)

Submitted Y(3S) continuum samples (uubar): ~170k 5h jobs (Nov 5 ~08:00 JST)

Submitted Y(3S) continuum samples (ddbar, ssbar, ccbar): ~70k 5h jobs + ~80k 8h jobs (Nov 13 ~23:00 JST)

Submitted Y(3S) taupair samples: ~70k 5h jobs (Nov 17 ~01:00 JST)

Submitted remaining Y(5S) generic: ~4.5k jobs (Nov 20 ~00:00 JST)

Submitted next batch of Y(4S) generic (mixed, charged): ~100k 8h jobs, ~150k 5h jobs (Nov 20 ~03:00 JST)

Submitted Y(4S) uubar continuum: ~250k 8h jobs (Nov 25 ~22:00 JST)

Submitted Y(4S) ddbar continuum: ~100k 5h jobs (Nov 28 ~21:00 JST)

Submitted Y(4S) ssbar continuum: ~100k 5h jobs (Dec 6 ~21:00 JST)

Submitted Y(4S) ccbar continuum: ~200k 8h jobs (Dec 9 ~05:00 JST)

Submitted Y(4S) taupair: ~180k 5h jobs (Jan 2 ~08:30 JST)

Submitted Y(4S) mixed sample (batch 3): ~100k 8h jobs (Jan 8 ~23:00 JST)

Submitted a few low multiplicity samples (Jan 18)

Submitted Y(4S) bbbar samples for data challenge (phase 3, BGx1): ~220k 9hr jobs (Jan 18 ~02:00 JST)

Submitted Y(4S) ddbar samples for data challenge (phase 3, BGx1): ~130k 5 hr jobs (Jan 22 ~22:00 JST)

Submitted phase 3 BG overlay production scripts (Jan 27 ~00:50 JST)

Submitted Y(4S) uubar samples for data challenge (phase 3, BGx1): ~290k 9 hr jobs (Feb 2 ~00:30 JST)

Submitted Y(4S) ssbar samples for data challenge (phase 3, BGx1): ~95k 6 hr jobs (Feb 3 ~21:45 JST)

Submitted Y(4S) ccbar samples for data challenge (phase 3, BGx1): ~320k 7.5 hr jobs (Feb 13 ~10:30 JST)

Submitted Y(4S) taupair samples for data challenge (phase 3, BGx1): ~160k 8 hr jobs (Feb 19 ~23:00 JST)

Submitted Y(4S) generic charged (not for data challenge): ~150k 5h jobs (Mar 1 ~23:00 JST)

Submitted Y(4S) uubar samples (phase 3, BGx1): ~250k 8h jobs (Mar 2 ~21:00 JST)

Submitted phase 3, BGx1 low multiplicity samples as requested by the bottomonium group (Mar 8 ~00:00 JST)

Submitted MC10 analysis validation samples, both BGx0 and BGx1 (Mar 9, 01:40 JST)

More MC9 skim jobs approved (Mar 12, ~23:00 JST)

Submitted Y(4S) ddbar samples (phase 3, BGx1): ~100k 5h jobs (Mar 13 ~00:00 JST)

Submitted Y(4S) ssbar samples (phase 3, BGx1): ~95k 5h jobs (Apr 3 ~04:00 JST)

Submitted Y(4S) ccbar samples (phase 3, BGx1): ~200k 8h jobs (Apr 4 ~02:30 JST)

Submitted Y(4S) taupair samples (phase 3, BGx1): ~180k 5h jobs (Apr 5 ~01:30 JST)

All scheduled MC9 fabrication jobs have finished (as of April 16, 2018)

First official batch of MC10 jobs submitted - phase 3 Y(4S) mixed samples: BGx1 ~100k 8h jobs and BGx0 ~36k 5h jobs (April 14 ~00:00 JST)

MC10 - phase 3 Y(4S) charged samples: BGx1 ~110k 8h jobs and BGx0 ~38k 5h jobs (April 17 ~01:00 JST)

First (small) batch of MC10 signal jobs submitted (April 20 ~03:30 JST)

MC10 - phase 3 Y(4S) uubar samples: BGx1 ~180k 8h jobs and BGx0 ~64k 5h jobs (April 23 ~05:45 JST)

Second batch of MC10 signal samples submitted (April 23 ~23:00 JST)

Third batch of MC10 signal samples submitted (April 27)

Submitted Phase 2 Y(4S) generic samples (May 1 ~23:30 JST)

Fourth batch of MC10 signal samples (mostly non-4S and/or phase 2) (May 2 ~02:00 JST)

MC10 - phase 3 Y(4S) ddbar samples: BGx1 ~45k 8h jobs and BGx0 ~15k 5h jobs (May 5 ~03:00 JST)

MC10 - phase 3 Y(4S) ssbar samples: BGx1 ~45k 8h jobs and BGx0 ~15k 5h jobs (May 9 ~02:00 JST)

Fifth batch of MC10 signal samples (mostly non-4S and/or phase 2) (May 9 ~03:30 JST)

MC10 - phase 3 Y(4S) ccbar samples: BGx1 ~200k 8h jobs and BGx0 ~50k 5h jobs (May 9 ~02:00 JST)

Submitted a large batch of phase 2 signal samples

MC10 - phase 3 Y(4S) taupair samples: BGx1 ~150k 8h jobs and BGx0 ~37k 5h jobs (May 17 ~02:00 JST)

Small signal MC batch (May 17 ~02:00 JST)

Submitted first batch of BGx2 productions: mixed phase 3 ~75k jobs and phase 2 ~2k jobs (May 25 ~20:30 JST)

Submitted many BGx1 signal MC samples (May 26 ~23:00 JST)

Submitted additional BGx1 signal MC samples (May 27 ~22:00 JST)

Submitted first small batch of BGx2 signal samples (June 9 ~00:00 JST)

MC10 - phase 3 Y(4S) uubar samples: BGx1 ~180k jobs (June 9 ~00:00 JST)

Submitted second batch of BGx2 signal samples (June 9 ~15:00 JST)

Submitted DR3 samples (~10 fb-1): 7k jobs (June 9 ~16:00 JST)

MC10 - phase 3 Y(4S) ddbar samples: BGx1 ~45k jobs (June 10 ~11:00 JST)

Submitted BGx2 signal samples (June 12 starting around ~23:00 JST - will stagger the approval to avoid over-stressing the system)

Submitted ~100 fb-1 phase 3 dress rehearsal jobs (June 15)

MC10 - phase 3 Y(4S) ssbar samples: BGx1 ~45k 8h jobs (June 15 ~3:30 JST)

MC10 - phase 3 Y(4S) ccbar samples: BGx1 ~200k 8h jobs (June 17 ~8:00 JST)

MC10 - phase 3 Y(4S) taupair samples: BGx1 ~150k 8h jobs (June 17 ~8:00 JST)

Submitted a few additional MC10 signal samples before upgrade (July 24 ~22:00 JST)

Submitted a few small MC10 productions (Aug 14 ~23:30 JST) - will submit more if these proceed smoothly

Large generic MC11 productions (release-02-00-01) submitted (Aug 16 - current)

A few MC10 signal productions (Sept 13 ~00:00 JST)

Submitting MC11 BGx0 generic samples (Sept 23 ~ 02:00 JST)

Some MC10 signal productions (Sept 26 ~04:00 JST)

Submitting MC11 BGx0 generic samples with early phase 3 geometry (Oct 3 ~07:30 JST)

Submitting new MC11 signal samples (Oct 9 ~13:00 JST)

Submitting MC11 Y(5S) samples (Oct 9 ~21:30 JST)

Many MC11 signal samples submitted (Oct 12 ~00:00 JST)


Most productions have finished. New productions will be started when updated beam background files are ready.

Some small signal MC samples were submitted on Nov 11 ~06:00 JST, Nov 13, Nov 14.

Phase 2 reprocessing with the distributed computing system (prod6b) started Nov 15 ~00:00 JST.

Large phase 3 BGx1 generic MC samples were submitted recently (~Nov 20). These should be relatively short jobs (though there are many of them). They should run for the next few weeks.
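The production entries above are timestamped in JST, while the issue entries in the service, SE and site sections use UTC. When correlating a submission with an issue, a small conversion helper can avoid off-by-nine-hours mistakes. The following is only a sketch: the function name `jst_to_utc` is hypothetical, and the year has to be supplied by the reader because most log entries omit it.

```python
from datetime import datetime, timedelta, timezone

# JST is a fixed UTC+9 offset (Japan does not observe daylight saving time).
JST = timezone(timedelta(hours=9), name="JST")


def jst_to_utc(year, month, day, hour, minute=0):
    """Convert a JST wall-clock time from the production log to UTC.

    Hypothetical helper: the year must be given explicitly because most
    log entries only state month, day and time.
    """
    local = datetime(year, month, day, hour, minute, tzinfo=JST)
    return local.astimezone(timezone.utc)


# Start of the official MC9 production: ~21:00 JST on July 5, 2017.
print(jst_to_utc(2017, 7, 5, 21, 0).isoformat())  # 2017-07-05T12:00:00+00:00
```

Times close to midnight JST move to the previous UTC day, so the date should be converted together with the time.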


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • Date, Issue, Tickets...
  • Memory usage has rapidly increased on b2dchsv04.
  • Network downtime
  • 1-min CPU load has rapidly increased and gone over the redline.
  • b2dchsv01-b2dchsv06.cc.kek.jp down. 

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)


DDM (bldirac01.sdcc.bnl.gov)

  • DDM ReplicateAndRegister increasing Queued tasks since 2018-07-17
  • DDM is stalled
  • 2018-03-01 DDM deletion task seems stuck

Conditions DB ()

Monitor

  • Issue in access to DIRAC Web Portal

LFC

File Transfers and Replication Status

See also DDM for related issues

FTS

Any problems with the FTS service or FTS monitoring are to be recorded here. Site/SE-specific issues are to be recorded under each Site/SE.

Note that the FTS dashboard we use is an "old" instance and is not well maintained. Belle II members in general do not have access to the "new" monitoring. When the dashboard is down, shifters only need to notify the expert and skip the corresponding part of their work. The expert should then check the new monitoring, since access to that page is limited.

  • No matrix image for transfer monitoring
  • FTS transfer stuck since 2018-11-25 around 7:00 UTC
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=138490 was submitted on 2018-11-26
  • 18/11/08 No activity in "Throughput" and "Successful" plots. 
  • 18/10/12 01:30 UTC Low activity in "Throughput" and "Successful" plots.

Replication Status

  • Almost zero 'done' in most of the plots, with increasing 'scheduled', since 03:00:00 UTC on 2019/01/24

  • 2019-01-19 Almost zero 'done', with an increasing number of scheduled jobs for more than 5 SEs and for more than 5 hours.
  • Replication and DDM plots have not been updated since 2018-11-25 7:00 UTC
    related to
  • 2018-10-11 Sharp drop in 'done' jobs and increase in 'waiting' 
  • 2018-09-29  The numbers have been almost zero. 
  • 2018-09-22  The number of "Done Jobs" is lower than the number of "Scheduled Jobs" during the last 6 hours or more
  • 2018-07-02 No 'Done' transfers, several 'Scheduled', and a rapid increase of 'Waiting' replications
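
Several of the entries above describe the same symptom: almost no 'done' transfers while 'scheduled' keeps growing, for more than 5 SEs and more than 5 hours. A minimal sketch of that check is shown below. It is not part of any existing monitoring tool. The input layout (hourly `(done, scheduled)` counts per SE) and the names `stalled_ses` and `should_report` are assumptions made for illustration, and the thresholds are taken from the 2019-01-19 entry.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each SE, hourly (done, scheduled) counts, oldest first.
Samples = Dict[str, List[Tuple[int, int]]]


def stalled_ses(samples: Samples, hours: int = 5, done_threshold: int = 0) -> List[str]:
    """Return SEs with (almost) zero 'done' and strictly growing 'scheduled'.

    An SE is only considered if it has more than `hours` hourly samples,
    i.e. the condition has held for more than `hours` hours.
    """
    stalled = []
    for se, series in samples.items():
        window = series[-(hours + 1):]
        if len(window) < hours + 1:
            continue  # not enough history to decide
        no_done = all(done <= done_threshold for done, _ in window)
        growing = all(later[1] > earlier[1] for earlier, later in zip(window, window[1:]))
        if no_done and growing:
            stalled.append(se)
    return sorted(stalled)


def should_report(samples: Samples, min_ses: int = 5, hours: int = 5) -> bool:
    """True when more than `min_ses` SEs look stalled (assumed reporting rule)."""
    return len(stalled_ses(samples, hours=hours)) > min_ses
```

Raising `done_threshold` slightly above zero would cover the "almost zero" wording; the default of exactly zero is a deliberately strict assumption.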

Job Status Plot

  • Many sites show "Job finished with errors" since 16:00 UTC on 2019/02/01. 

  • Almost all sites show "Job finished with errors" since 16:00 UTC on 2019/01/24.  issued.

  • 2019-01-27 Many sites have missing plots.  

Job Summary


SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs).

Raw data SEs

Raw data SE: KEK-RAW-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/RAW)


Raw data SE: BNL-TAPE-SE (srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/tape)


Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • SE Health check by DDM : download does not work since 2019-02-15 21:10:34 UTC.
  • SE Health check by DDM : download does not work since 2019-02-15 05:08:20 UTC.
  • -  Scheduled downtime
  • SE Health check by DDM : download does not work since 2019-02-02 02:24:56 UTC. 
    • Upload errors from OSG.BNL.us to BNL-TMP-SE persist. Connection issues could be producing the errors reported in BIIDCO-1594.
  • SE Health check by DDM : download does not work since 2019-01-16 23:43:28 UTC.
  • SE Health check by DDM : download does not work since 2019-01-10 06:59:43 UTC 
      BNL network interruption 2018-Dec-18 14:00-15:00 UTC
  •  SRM_AUTHORIZATION_FAILURE for users
  •  UNAVAILABLE files

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz) 

  • Free disk space is less than 1 TB at CESNET

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE.
  • Replication status: Increasing 'Scheduled' with zero 'done' since 2018-12-05 13:00 UTC.
  • Continuous timeout failures between NTU-CC-TMP-SE and CNAF-TMP-SE

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)  

  • SE Health check by DDM : download, upload do not work since 2019-01-28 14:30:45 UTC. 

  • SE Health check by DDM : download, upload do not work since 2019-01-27 22:03:46 UTC.
  • SE Health check by DDM : download, upload do not work since 2019-01-26 02:30:22 UTC. 
  • SE Health check by DDM : download, upload do not work since 2019-01-25 22:21:23 UTC.
  • SE Health check by DDM : download, upload do not work since 2018-12-15 21:46:02 UTC 
  • SE Health check by DDM : download, upload do not work since 2018-12-16 07:08:04 UTC.
  • Date, Issue, Tickets...

Primary SE: KEK-DISK-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/disk/belle/TMP)

  • SE Health check by DDM : download, upload do not work since 2019-01-30 14:57:12 UTC.
  • SE Health check by DDM : download, upload do not work since 2019-01-28 14:34:07 UTC.

Primary SE: KEK2-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

  • SE Health check by DDM : ls does not work since 2019-01-31 11:19:57 UTC.
  • SE Health check by DDM : download, upload do not work since 2019-01-30 14:59:58 UTC. 
  • SE Health check by DDM : download, upload do not work since 2019-01-28 14:36:50 UTC.
  • File Transfer failures : File Transfer Efficiency is too low from SIGNET-TMP-SE to KEK2-TMP-SE since about 2019-01-13 1:00 (UTC) 
  • File Transfer failures : File Transfer Efficiency is too low from KEK2-TMP-SE since about 2018-12-18 1:00 (UTC)
  • 2018-10-22 FTS transfer and upload failures have been observed since 2018-10-21 20:53:36 UTC.
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137862 was submitted on 2018-10-22 02:59 UTC
  • Firewall issue found in user activities https://ggus.eu/index.php?mode=ticket_info&ticket_id=136643
  • 2018-07-01: Scheduled jobs have been increasing since about 05:00 UTC. FTS jobs fail with a Timeout error (GGUS-135874)

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • KIT SE giving occasional timeouts 

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • Replication status: Zero 'done'  with non-zero 'queued' 2018-8-24 06:30 UTC

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • SE Health check by DDM : ls does not work since 2019-02-15 22:18:43 UTC.
  • SE Health check by DDM : remove file, remove directory, download, upload, ls do not work since 2019-01-17 12:32:16 UTC  
  • Many transfer failures from 2019-01-09 15:00 (UTC) 
  • SE Health check by DDM : upload does not work since 2018-11-14 19:37:18 UTC.
  • SE Health check by DDM : checksum, download, upload do not work since 2018-09-20 05:24:11 UTC
  • Disk is full

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • File transfer failures or efficiency too low: from KMI-TMP-SE and Pisa-TMP-SE to SIGNET-TMP-SE on 2019-01-25 (JST).
  • File Transfer failures : File Transfer Efficiency is too low from SIGNET-TMP-SE to KEK2-TMP-SE since about 2019-01-13 1:00 (UTC)
  • Frequent transfer failures from source SIGNET-TMP-SE since 2018-12-25
    GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=138999 was submitted on 2018-12-27
  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2018-10-31 16:09:44 UTC.

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

  • Date, Issue, Tickets...

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

  • Failed file transfers observed after 7:00 UTC on 22/11/2018, ticket updated
  • Low transfer efficiency observed at 21:00 UTC on 23/10/2018, added to the ticket
  • Low transfer efficiency observed again after 9:00 UTC on 10/10/18. Ticket updated.
  • The problem raised by https://agira.desy.de/browse/BIIDCO-1340 seems to have been solved.
  • Low transfer efficiency.
  • FTS authentication credential error

Frascati-TMP-SE (atlasse.lnf.infn.it)

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets...

IPHC-TMP-SE (sbgse1.in2p3.fr)

  • Lost files

LAL-TMP-SE (grid05.lal.in2p3.fr)

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

  • Transfer rate is zero

  • Melbourne-DATA-SE banned for write

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • McGill-TMP-SE will be decommissioned in early 2018.

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)

NTU-TMP-SE, NTU-CC-TMP-SE (bgrid3.phys.ntu.edu.tw, belle2grid3.cc.ntu.edu.tw)

  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE.
  • File transfer failure and cancellation to NTUCC-DATA-SE happened 2018-12-22
  • Frequent timeouts have been observed between NTU-CC-TMP-SE and CNAF-TMP-SE
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137334 was submitted on 2018-09-22 05:10 UTC
  • NTUCC-TMP-SE banned for write 

Pisa-TMP-SE (stormfe1.pi.infn.it)

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues.

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

Torino-TMP-SE (se-srm-00.to.infn.it)

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

UMiss-TMP-SE (umiss005.hep.olemiss.edu)


UVic-TMP-SE (charon01.westgrid.ca)

  • File Transfer failures : File Transfer Efficiency is too low from UVic-DATA-SE since about 2018-12-18 1:00 (UTC)

  • File transfer failures from Source UVic-TMP-SE observed on 26 Oct 2018
  • FTS connection timeout from UVic to KEK (kek2-se03.cc.kek.jp)
    Solved and verified on 2018-10-25: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137332 was submitted on 2018-09-22 04:28 UTC


Sites

Sites Common Issue

  • Several sites show "Short pilot jobs" since 18:20:00 UTC on 2019/01/24
  • DIRAC SSH sites are not being filled with jobs for MC11
  • Several sites: Pilot submission failures/short pilots (4x) and Software could not be installed (2x) - Common JIRA ticket issued:
    → See below info for individual sites referring to the same JIRA ticket (BIIDCO-1443)
  • Several sites: Short Pilot Jobs (17x).
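
Most per-site entries below are copied from the health checker and follow a fixed pattern ("Health checker info. : "<issue>" has been found since|at HH:MM:SS UTC on YYYY/MM/DD"). Counts such as "Short Pilot Jobs (17x)" could be tallied from those lines with a small parser. The sketch below assumes only that line pattern as it appears on this page. The function name `parse_entry` and the returned field names are hypothetical, and it does not use any DIRAC API.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Pattern inferred from the entries on this page; the quotes around the
# issue text are optional because some entries omit them.
PATTERN = re.compile(
    r'Health checker info\. : "?(?P<issue>[^"]+?)"? has been found '
    r'(?P<mode>since|at) (?P<time>\d{2}:\d{2}:\d{2}) UTC on '
    r'(?P<date>\d{4}/\d{2}/\d{2})'
)


def parse_entry(line):
    """Parse one health-checker bullet; return None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    when = datetime.strptime(
        match.group("date") + " " + match.group("time"), "%Y/%m/%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return {
        "issue": match.group("issue").strip(),
        "ongoing": match.group("mode") == "since",  # "since" = still observed
        "when": when,
    }


def count_issues(lines):
    """Count how often each issue type appears in a list of log lines."""
    parsed = (parse_entry(line) for line in lines)
    return Counter(entry["issue"] for entry in parsed if entry is not None)
```

Lines from the other checks ("Job submission check", "Job status check") are skipped by design, since their wording varies more from entry to entry.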

ARC.DESY.de

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/01/25.
  • Health checker info. : "Short pilot jobs" has been found at 13:20:00 UTC on 2018/12/29.
  • Health checker info. : "Short pilot jobs" has been found at 12:20:00 UTC on 2018/12/28.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/27.
  • Job status check: Input Data Resolution issues (still 100% of the jobs) on 2018/12/22 at 8:00 UTC.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2018/12/16.(details)
  • all jobs are in "Input data resolution" status since 17:00:00 UTC on 2018/12/18.
  • 100% Jobs fail at ARC.DESY.DE 
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details). 
  • Job submission check : Pilot submission failure has been found since 05:42:00 UTC on 2018/10/25. (details)
  • Health checker info. : "Aborted pilot jobs" has been found since 16:20:00 UTC on 2018/10/23.
  •  Reconfiguration for the site queues:

ARC.DESY-test.de

  • A test queue for the new CE.

ARC.KIT.de

  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2019/02/14.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/01/31.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/01/30.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/29.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/01/26
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/01/02
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/29. 
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/12/28.
  • Job status check: Application finished with errors (5% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/11/19.
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2018/10/20
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/10/06.(details)

ARC.LMU.de

  • This is a test site. There is no need to report any issues.

ARC.LMU2.de

  • 2018/08/13 Downtime: Start time: 2018-08-12 13:00 (UTC) End time: 2018-08-13 15:00 (UTC) 

  • Banned, as there are currently no resources behind the CE

ARC.Melbourne.au

  • Scheduled downtime (Start time: 2019-02-02 22:00 (UTC), End time: 2019-02-03 04:00 (UTC)), BIIDCO-1661

ARC.MPPMU.de

  • Job submission check : Pilot submission failure has been found since 00:22:00 UTC on 2019/01/26
  • Health checker info. : "Failed pilot jobs" has been found at 09:20:00 UTC on 2018/10/25.(details)
  • Job submission check : Pilot submission failure has been found since 13:26:00 UTC on 2018/10/21.

ARC.SIGNET.si

  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/01/31.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/25.
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/22. Open GGUS ticket GGUS-139280
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2019/01/17, last for more than 5 hours.  
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/01/13.
  • Health checker info. : "Belle II software could not be installed on " has been found since 09:20:00 UTC on 2018/12/29.
  • Job status check: Application finished with errors (5% of the jobs) at 11:15 UTC on 2018/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/12/20.(details)
  • Job submission check : Pilot submission failure has been found at 13:26:00 UTC on 2018/12/19.

  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03.

  • "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/29.
  • Health checker info. : "Aborted pilot jobs" has been found since 20:20:00 UTC on 2018/10/20.
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/03.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 17:20:00 UTC on 2018/09/23 
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/09/14.(details)
  • Job submission check : Pilot submission failure has been found at 05:29:00 UTC on 2018/09/15. (details)
  • Job submission check : Pilot submission failure has been found since 01:31:00 UTC on 2018/09/08. (details)

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08.
  • Job status check: "application finished with errors" (100% currently) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21. (details)
  • Many MCProduction jobs failed at file upload stage for fail-over SEs 2017-12-24
  • The number of jobs is limited.
  • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs(DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • Job status check: Application finished with errors (27% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Job submission check : Pilot submission failure has been found since 17:26:00 UTC on 2018/10/21.
  • Health checker info. : "Failed to install DIRAC on " has been found at 22:20:00 UTC on 2018/09/15

DIRAC.BINP-VM.ru

  • Job submission check : Pilot submission failure has been found since 10:23:00 UTC on 2019/01/14.
  • Job status check: Application finished with errors (99.8% over the last 12h) at 7:41 UTC on 2019/01/09
  • Job status check: Application finished with errors (34% of the jobs in last 24 hours) and Stalled (8%) on 2018/12/21 at 8:48 UTC.
  • Job status check: "Application Finished with Errors " (episodically, 10% in total), on 2018/12/20.
  • Job Status Plots "Application Finished with Errors " (100 %) on 9/10/18.
  • Job status plots, "Application Finished With Errors" (2018-02-11 but lasting for at least a month)

DIRAC.CINVESTAV.mx

  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/12/06. 
  • Job status plots, "Application Finished With Errors" & "Watchdog identified this job as Stalled" (2018-02-12)

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • Health checker info. : "Aborted pilot jobs" has been found at 04:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 21:20:00 UTC on 2018/12/22.
  • Job status check: Application finished with errors (95% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/12/22.(details)
  • Job status check: Application finished with errors (100% of the jobs) since 6:00 UTC on 2018/12/21.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/21.(details)
  • all jobs are in "Input data resolution" status (2018/12/20).
  • 100% Jobs fail at DIRAC.IITG.in
  • Health checker info. : "Aborted pilot jobs" has been found at 14:20:00 UTC on 2018/12/15.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/14.
  • Job submission check : Pilot submission failure has been found at 22:23:00 UTC on 2018/12/05.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/02.
  • Job submission check : Pilot submission failure has been found at 13:24:00 UTC on 2018/09/26. (details)
  • Health checker info. : "Aborted pilot jobs" has been found since 14:20:00 UTC on 2018/04/22 

DIRAC.IITH.in

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/01/28.
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2019/01/03. 
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/14.
  • Job status check: "input Data Resolution" issues (36%) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 11:27:00 UTC on 2018/10/06. (details)
  • Job submission check : Pilot submission failure has been found since 11:28:00 UTC on 2018/10/03. 

DIRAC.LMU.de

  • Not in use in MC production
  • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/25.
  • 100% Jobs fail
  • Health checker info. : "Belle II software could not be installed on " has been found since 08:20:00 UTC on 2018/12/14.
  • Health checker info. : "Belle II software could not be installed on " has been found at 22:20:00 UTC on 2018/12/05.
  • Health checker info. : "Belle II software could not be installed on " has been found since 01:20:00 UTC on 2018/11/24.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/23.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 21:20:00 UTC on 2018/11/21.
  • Health checker info. : "Belle II software could not be installed on " has been found at 14:20:00 UTC on 2018/11/20.
  • Job Status Plots "Application Finished with Errors " (100 %) on 9/10/18. https://agira.desy.de/browse/BIIDCO-1358
  • Health checker info. : "Aborted pilot jobs" has been found at 14:20:00 UTC on 2018/10/06.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 07:20:00 UTC on 2018/09/21.(details)
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/09/13.(details)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/07/23.(details)
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2018/07/19.
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2018/07/18.
  • Job status plots, "Application Finished With Errors" has been found at about 04:00:00 JST on 2018/07/06. (details)
  • Health checker info. : "Aborted pilot jobs" has been found at 12:20:00 UTC on 2018/02/11

DIRAC.Nagoya.jp

  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/01/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2019/01/11.
  • Job status check: Application finished with errors (5.6% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details)
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/11/15.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/09/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/08/17.(details)

DIRAC.Nara-WU.jp

  • Job submission check : Pilot submission failure has been found at 04:22:00 UTC on 2019/01/11.
  • Job submission check : Pilot submission failure has been found since 09:21:00 UTC on 2018/12/30. 
  • Job status check: Application finished with errors (11% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Under commissioning since 2018-11-13

DIRAC.NDU.jp

  • Health checker info. : Short pilot jobs has been found at 22:20:00 UTC on 2018/12/12

DIRAC.Niigata.jp

  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/31.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/10.
  • Down from 22:00 JST on Dec 21 to 08:00 JST on Dec 22
    due to the network system replacement.
  • Health checker info. : Short pilot jobs has been found since 17:20:00 UTC on 2018/12/12.
  • Job submission check : Pilot submission failure has been found since 12:36:00 UTC on 2018/10/17. 
  • Health checker info. : "Aborted pilot jobs" has been found at 14:20:00 UTC on 2018/10/06.(details)

DIRAC.Osaka-CU.jp

  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04. 
  • Pilot submission failure has been found since 18:32:00 UTC on 2018/11/24 
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Ask site admin to check the status 2018-03-17 10:00 JST. (DB access failure again from DIRAC.Osaka-CU.jp to PNNL from 2018-03-16 11:00 UTC)
  •  MCProduction = 5

DIRAC.PNNL.us

  • Site to be decommissioned

DIRAC.PNNL2.us

  • Site to be decommissioned

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • Health checker info. : "Not enough disk space on " has been found since 21:20:00 UTC on 2019/01/29. 
  • Health checker info. : "Not enough disk space on " has been found at 07:20:00 UTC on 2019/01/28.
  • Health checker info. : "Not enough disk space on " has been found at 06:20:00 UTC on 2019/01/26.
  • Health checker info. : "Not enough disk space on " has been found at 14:20:00 UTC on 2019/01/25.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/12/14.
  • Health checker info. : "Not enough disk space on " has been found since 05:20:00 UTC on 2018/10/25.
  • Job Status : "Job has exceeded wall clock time" : Pink Colour : (100%) on 9/10/18.
  • Health checker info. : "Aborted pilot jobs" has been found since 12:20:00 UTC on 2018/09/08.(details)

DIRAC.SSU.kr

  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/01.

DIRAC.TIFR.in

  • Job submission check : Pilot submission failure has been found since 07:22:00 UTC on 2019/01/01. 
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/10/22.
  • Job status plots, "Application Finished With Errors" has been found at about 00:00:00 JST on 2018/07/06. (details)
  • Health checker info. : "Short pilot jobs" -- Already reported: 
  •  RunningLimit is set for MCProduction=1
  • Job stalled at input data resolution

DIRAC.TMU.jp

  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/31.(details)
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/12/14.
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02
  • Job status check: "Application finished with errors" (60%) on 2018/10/26.
  • Health checker info. : "Belle II software could not be installed on " has been found since 18:20:00 UTC on 2018/10/17.
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2018/10/15. 
  • Job submission check : Pilot submission failure has been found since 10:24:00 UTC on 2018/09/30. (details)

DIRAC.Tokyo.jp

  • Date, Issue, Tickets..

DIRAC.UAS.mx

  • Job submission check: 100% failed with errors from 22:00 2019/01/08 till 04:00 2019/01/09 (UTC)
  • 100% of jobs fail with errors
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17. 
  • Health checker info. : "Belle II software could not be installed on " has been found since 16:20:00 UTC on 2018/11/14.
  • Job submission check : Pilot submission failure has been found since 01:26:00 UTC on 2018/09/21. (details)
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/09/17 Added to
  • Job submission check : Pilot submission failure has been found since 15:24:00 UTC on 2018/09/16 (emailed comp-dc-operations, create JIRA ticket when able)

DIRAC.UVic.ca

  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/29.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/28.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/01/28.(details)
  • GGUS ticket : "CA-VICTORIA-WESTGRID-T2 : FTS connection timeout to srm://kek2-se03.cc.kek.jp"(137332) has been submitted at 04:28:09 UTC on 2018/09/22.
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2018/10/07.(details)
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2018/08/16.(details)

DIRAC.UVic-local.ca

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/28.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/09/29.(details)

DIRAC.Yamagata.jp

  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/31.(details)
  • high ratio of jobs finished with error (from job status) 2019/01/29 Tue 03:00 UTC 
  • Job status check: Application finished with errors (13% of the jobs at 11:15 UTC, but 100% in the last hours) on 2018/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2018/12/12.(details)
  • Job submission check : Pilot submission failure has been found since 01:27:00 UTC on 2018/09/16. (details)
  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2018/05/21.(details)

DIRAC.Yonsei.kr 

  • Job submission check : Pilot submission failure has been found since 15:24:00 UTC on 2019/01/29.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/28.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08. 

LCG.CESNET.cz

  • Health checker info. : "Belle II software could not be installed on skurut27.grid.cesnet.cz" has been found since 23:20:00 UTC on 2019/02/01.

  • Job submission check : Pilot submission failure has been found at 23:27:00 UTC on 2019/02/01. 

  • Job submission check : Pilot submission failure has been found at 22:23:00 UTC on 2019/01/28. (details)
  • Health checker info. : "BLAH ERROR" has been found since 16:20:00 UTC on 2019/01/26.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/11/02.

  • Job submission check : Pilot submission failure has been found since 19:40:00 UTC on 2018/05/23. (details)
  •   Need some intervention to run Merge jobs
  • Job submission check : Pilot submission failure has been found since 17:34:00 UTC on 2018/05/16. (details)

LCG.CNAF.it

  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2019/02/02.
  • Health checker info. : "Aborted pilot jobs" has been found since 04:20:00 UTC on 2018/12/13
  • "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/21

LCG.Cosenza.it

  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/10/30.
  • Downtime 2018-10-25 13:00 (UTC) - 2018-10-26 13:00 (UTC)    and  2018-10-23 13:00 (UTC) - 2018-10-25 13:00 (UTC)
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2018/10/08.(details)  

LCG.CYFRONET.pl

  • high ratio of jobs finished with error (from job status) 2019/01/29 Tue 03:00 UTC
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/01/28
  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 22:20:00 UTC on 2019/01/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2019/01/27.

  • Health checker info. : "Failed to install DIRAC on n1063-amd" has been found at 22:20:00 UTC on 2019/01/26. 

  • Job status check: Stalled (9% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Aborted pilot jobs" has been found at 18:20:00 UTC on 2018/09/13.(details)
  • Downtime (Decommissioning cream.grid.cyf-kr.edu.pl and cream02.grid.cyf-kr.edu.pl): Start time: 2018-02-27 23:00 (UTC), End time: 2018-12-31 00:00 (UTC) 

LCG.DESY.de

  • The site to be retired   – No more jobs to be submitted.
  • Downtime, Start time: 2018-09-01 00:00 (UTC), End time: 2018-09-30 23:59 (UTC) 

LCG.Frascati.it

  • high ratio of jobs finished with error (from job status) 2019/01/29 Tue 03:00 UTC
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/07/10.(details)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/06/30.(details)

LCG.HEPHY.at

  • Health checker info. : "Failed pilot jobs" has been found at 02:20:00 UTC on 2019/01/31.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 02:20:00 UTC on 2019/01/30.(details)
  • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2018/12/27.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/12/12
  • Health checker info. : "BLAH ERROR" has been found since 13:20:00 UTC on 2018/10/09.(details)
  • Job submission check : Pilot submission failure has been found since 16:25:00 UTC on 2018/06/21. (details)

  •  MCProduction = 680

LCG.IPHC.fr

  • Downtime: 2019-01-21 08:00 - 2019-01-24 16:00
  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2018/06/18.(details)

LCG.KEK.jp

  • Health checker info. : "Belle II software could not be installed on cb046.cc.kek.jp" has been found since 19:20:00 UTC on 2019/02/14.
  • high ratio of jobs finished with error (from job status) 2019/01/29 Tue 03:00 UTC 
  • Job submission check : Pilot submission failure has been found since 05:25:00 UTC on 2018/12/20.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/12.(details)

  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/10.(details)
  • Health checker info. : "Belle II software could not be installed on cb512.cc.kek.jp" has been found since 21:20:00 UTC on 2018/10/01.
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2018/09/24.(details)
  • Performance degraded with "Input data resolution" status since 2018-07-24 around 20:00 UTC

LCG.KEK2.jp

  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2019/01/30.
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/29.(details)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/01/29.
  • high ratio of jobs finished with error (from job status) 2019/01/29 Tue 03:00 UTC 
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2019/01/28.
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2019/01/11.
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/10.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • Job submission check : Pilot submission failure has been found at 06:25:00 UTC on 2018/12/20.
  • All jobs have been in "Input data resolution" status since 12:00 UTC on 2018/12/18
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/23.(details)
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/11/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/10.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2018/09/28.(details)

LCG.KISTI.kr

  • Job slots are disabled for SE maintenance from 2018-10-19 to 2018-10-23
  • Health checker info. : "BLAH ERROR" has been found since 06:20:00 UTC on 2018/10/19.(details)

  • "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details)
  • BLAH error seems to happen if jobs exceed the allocated # of queues; not a problem (site-specific feature)
  • MCProduction= 280
  • A large number of Merge jobs in waiting status

LCG.KMI.jp

  • Job submission check : Pilot submission failure has been found since 21:25:00 UTC on 2019/02/01. 

  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/01/27.(details)
  • Job status check: Application finished with errors (7% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details)
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 21:20:00 UTC on 2018/11/22.
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 05:20:00 UTC on 2018/11/22.
  • Job submission check : Pilot submission failure has been found since 21:24:00 UTC on 2018/10/02. (details)

LCG.LAL.fr

  • Downtime: Start downtime: 2019-02-12 06:00 -- End downtime: 2019-02-12 10:00 

Site under commissioning. Issues to be reported.

LCG.Legnaro.it

  • Downtime: 2018-10-16 06:30 - 2018-10-16 17:00 (SE software update)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/03/16.(details)
  • Downtime: Start downtime: 2018-10-29 07:00 -- End downtime: 2018-10-29 11:00

LCG.Napoli.it

  • Job submission check : Pilot submission failure has been found since 07:21:00 UTC on 2018/11/10. 
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/11/04 
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details)

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/10/06.(details)
  • This site is in scheduled downtime from 2018-10-02 16:00 (UTC) to 2018-10-08 18:00 (UTC)
  • Health checker info. : "Failed pilot jobs" has been found since 16:20:00 UTC on 2018/09/27.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 08:20:00 UTC on 2018/09/19.(details)
  • Job submission check : Pilot submission failure has been found at 05:32:00 UTC on 2018/09/11. (details)
  • Job submission check : Pilot submission failure has been found since 01:21:00 UTC on 2018/11/07.
  • Stalled jobs
  • "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/03/17.

LCG.NTU.tw

  • GGUS ticket : "CRL expiration at belle2grid2.cc.ntu.edu.tw"(139674) has been submitted at 22:05:29 UTC on 2019/02/13.
  • Health checker info. : "CRL has expired" has been found since 21:20:00 UTC on 2019/02/11.
  • Health checker info. : "CRL has expired" has been found since 11:20:00 UTC on 2019/01/14.
  • Job submission check : Pilot submission failure has been found since 06:22:00 UTC on 2019/01/11. 
    • GGUS ticket 139598 : "Job submission failure at belle2grid2.cc.ntu.edu.tw"
  • Health checker info. : "BLAH ERROR" has been found since 09:20:00 UTC on 2019/01/09. https://agira.desy.de/browse/BIIDCO-1601  

LCG.Pisa.it

  • Health checker info. : "Short pilot jobs" has been found at 05:20:00 UTC on 2019/02/15
  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/02/02. 
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/01/31
  • Job submission check : Pilot submission failure has been found at 03:23:00 UTC on 2019/01/31
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/31.(details)
  • Job submission check : Pilot submission failure has been found at 22:26:00 UTC on 2019/01/30. (details)
  • Job submission check : Pilot submission failure has been found since 06:24:00 UTC on 2019/01/30.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/29.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/01/29.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/28.
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2019/01/28.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/01/28.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/27.(details)
  • Health checker info. : "Short pilot jobs" has been found at 05:20:00 UTC on 2019/01/25 
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2019/01/24.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2019/01/22.
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2019/01/21.

  • "Short pilot jobs" has been found since 18:20:00 UTC on 2019/01/20. 

  • "Failed pilot jobs" has been found since 14:20:00 UTC on 2019/01/18, lasting for more than 5 hours.
  • Downtime : Enabling IPv6 on SRM 
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2019/01/15.
  • Health checker info. : "Failed to install DIRAC on ne1wn4.pi.infn.it,ne1wn6.pi.infn.it" has been found since 12:20:00 UTC on 2019/01/11.
  • Health checker info. :
    "Failed to install DIRAC on ne1wn4.pi.infn.it,so1wn5.pi.infn.it" has been found at 05:20:00 UTC on 2019/01/11. 
    "Short pilot jobs" has been found at 05:20:00 UTC on 2019/01/11.
    "Not enough disk space on so1wn5.pi.infn.it" has been found at 05:20:00 UTC on 2019/01/11.

  • Health checker info. : "Failed to install DIRAC on s1wn37.pi.infn.it,n2wn16.pi.infn.it" has been found since 16:20:00 UTC on 2019/01/10.
  • Health checker info. : "Failed to install DIRAC on so1wn8.pi.infn.it,n2wn13.pi.infn.it,n2wn18.pi.infn.it,so1wn6" has been found since 01:20:00 UTC on 2019/01/02.
  • Health checker info. :
    1. "Failed pilot jobs" has been found since 17:20:00 UTC on 2018/12/28.
    2. "Short pilot jobs" has been found since 22:20:00 UTC on 2018/12/28.
  • Health checker info. :
    1. "Failed to install DIRAC on ne1wn5.pi.infn.it,ne1wn7.pi.infn.it" has been found at 11:20:00 UTC on 2018/12/28.
    2. "Short pilot jobs" has been found since 12:20:00 UTC on 2018/12/28.
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2018/12/22.(details)
  • Job status check: Application finished with errors (5% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/12/21.(details)
  • 100% Jobs fail at LCG.Pisa.it
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2018/12/04
  • "Failed pilot jobs" has been found at 07:20:00 UTC on 2018/11/19.
  • "Short pilot jobs" has been found at 07:20:00 UTC on 2018/11/19.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/11/15.
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2018/11/14. 

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/11/11.
  • Job submission check : Pilot submission failure has been found since 10:24:00 UTC on 2018/11/11. 
  • GGUS ticket :
    1. "INFN-PISA: possible issue in CA certificate directory"(136751) has been submitted at 05:26:08 UTC on 2018/08/17.
    2. "INFN-PISA: Disk space on WNs"(136750) has been submitted at 04:17:17 UTC on 2018/08/17.
    3. "INFN-PISA: CVMFS availability on WNs"(136749) has been submitted at 03:20:57 UTC on 2018/08/17.
    4. "INFN-PISA : Pilot failed at gridce3.pi.infn.it"(130815) has been submitted at 10:11:45 UTC on 2017/09/28.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/11/06.
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/10/31.
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/10/29.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/10/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/10/25.(details)
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/10/24. 
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/10/23.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/10/22.
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/10/20.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/09.(details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/02.(details)
  • "Failed pilot jobs" has been found since 21:20:00 UTC on 2018/09/22.(details)
  • "Short pilot jobs" has been found since 02:20:00 UTC on 2018/09/21.(details)
  •  possible issue in CA certificate directory on WN se1wn26.pi.infn.it 
  • "Not enough disk space on <various servers>" has been found since <various times> UTC on 2018/08/12 onwards.
  • Failed to install DIRAC on ...
  • "BLAH ERROR" has been found since 03:20:00 UTC on 2018/06/20. "Aborted pilot jobs" has been found since 03:20:00 UTC on 2018/06/20.(details) Related GGUS ticket: 130815 ("INFN-PISA : Pilot failed at gridce3.pi.infn.it")

LCG.Roma3.it

  • Health checker info. : "Failed pilot jobs" has been found since 03:20:00 UTC on 2019/01/22 GGUS-139270
  • Job status check: Application finished with errors (25% of the jobs in last 24 hours) and Stalled (36%) on 2018/12/22 at 8:00 UTC.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • Job status check: Application finished with errors (10% of the jobs in last 24 hours) and Stalled (76%) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Stalled jobs on 2018/12/20.
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2018/12/08.

  • Health checker info. : "Aborted pilot jobs" has been found at 16:20:00 UTC on 2018/10/09.(details)
  • "BLAH ERROR" has been found at 04:20:00 UTC on 2018/06/20. 
  • Roma3 commissioning   (NOTE: This ticket seems obsolete, it should be closed and removed from operation status)

LCG.TAU.il

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/11/01.
  • Downtime - Start time: 2018-09-26 01:00(UTC), End time: 2018-09-28 20:00(UTC)
  • Health checker info. : "Failed pilot jobs" has been found at 18:20:00 UTC on 2018/09/14.(details)

LCG.Torino.it

  •  Health checker info. : "BLAH ERROR" has been found since 21:20:00 UTC on 2018/12/06.
  • Health checker info. : "BLAH ERROR" has been found since 03:20:00 UTC on 2018/11/23.
  • Health checker info. : "BLAH ERROR" has been found at 23:20:00 UTC on 2018/11/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/11/11.
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2018/11/07.
  • Job submission check : Pilot submission failure has been found at 22:22:00 UTC on 2018/11/04.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/11/04
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2018/11/01. 
  • Health checker info. : "Failed pilot jobs" has been found since 17:20:00 UTC on 2018/10/31.
  • Health checker info. : "BLAH ERROR" has been found since 21:20:00 UTC on 2018/10/22.

LCG.ULAKBIM.tr

  • "BLAH ERROR" has been found since 20:20:00 UTC on 2018/12/24
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=138991 was submitted at 2018-12-25 03:06 UTC
  • Solved and verified 2018-11-02 : GGUS ticket : "TR-10-ULAKBIM : pilot job failure by CRL expiration error"(136985) has been submitted at 01:35:16 UTC on 2018/09/03.
  • Health checker info. : "BLAH ERROR" has been found since 12:20:00 UTC on 2018/07/06.(details)

OSG.BNL.us

  • Health checker info. : "Belle II software could not be installed on " has been found since 19:20:00 UTC on 2019/02/14.
  • Health checker info. :  "Short pilot jobs" has been found since 03:20:00 UTC on 2019/01/31
  • Health checker info. : "Aborted pilot jobs" has been found since 20:20:00 UTC on 2019/01/30.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 05:20:00 UTC on 2019/01/26.
  • Health checker info. : "Short pilot jobs" has been found since 23:20:00 UTC on 2019/01/25.
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/01/21.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/01/19.
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2019/01/16.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/01/11. 
  • Job submission check: Jobs fail with errors or input data resolution the last 24h (6:00 UTC, 2019/01/09) 
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/01/02. 
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2018/12/28.
  • Health checker info. : "Short pilot jobs" has been found since 00:20:00 UTC on 2018/12/27.
  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2018/12/21.(details)
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/12/18.
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2018/12/14.
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2018/12/11.
  •   BNL network interruption 2018-Dec-18 14:00-15:00 UTC
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/07/13.(details)
    • Recurring issue 2018/11/23 through 09/30
  • Health checker info. : "Aborted pilot jobs" has been found since 09:20:00 UTC on 2018/09/19.(details)
    • Recurring issue 2018/11/04 through 09/23
  • Production jobs: UNAVAILABLE files
  • User jobs: SRM_AUTHORIZATION_FAILURE
  • Number of concurrent MCProduction jobs restricted
  •  MCProduction jobs are mostly stalled
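
Most entries in these per-site lists share one shape: a quoted issue label, "has been found at/since", a UTC time, and a date. As a rough sketch only (not part of any existing Belle II or DIRAC tooling), the Python below shows how such lines could be parsed and tallied to see which issues recur at a site. The names ENTRY_RE, parse_entry and tally_issues are hypothetical, and the pattern assumes the quoted form of the entry; lines without quotes or without a timestamp are simply skipped.

```python
import re
from collections import Counter

# Assumed entry shape, taken from the log lines on this page, e.g.
#   Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/01/29.
ENTRY_RE = re.compile(
    r'"(?P<issue>[^"]+)"\s+has been found\s+(?P<mode>at|since)\s+'
    r'(?P<time>\d{2}:\d{2}:\d{2}) UTC on (?P<date>\d{4}/\d{2}/\d{2})'
)


def parse_entry(line):
    """Return a dict (issue, mode, time, date) for a matching line, else None."""
    match = ENTRY_RE.search(line)
    return match.groupdict() if match else None


def tally_issues(lines):
    """Count how often each quoted issue label appears in the given lines."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            counts[entry["issue"]] += 1
    return counts
```

Feeding the bullet lines of one site into tally_issues gives a per-issue count, which may help decide whether a recurring report deserves a ticket rather than another log entry.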

OSG.CORI.us

  • OSG.CORI.us resource has been removed because CY18 allocation was not approved

OSG.UMiss.us

  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2018/12/22.(details)
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=138979 was submitted at 2018-12-22 14:45 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/12/14.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/11/02.

SSH.KMI.jp

  • Job status check: Application finished with errors (12% of the jobs in last 24 hours) on 2018/12/22 at 11:30 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/08/13.

VCYCLE.Napoli.it

  • Opportunistic site (Empty plot is not a problem)
  •  Ban lifted
  • "Sudo CE Error: sudo execution fails with return code 1"

VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

  • Opportunistic site (Empty plot is not a problem)


Links