Production Plans

  • MC13 - Run-dependent MC (MC13b) should start in the coming days.

  • Proc10 - Proc10 skims will proceed as soon as proc10 is finished, and similarly for MC13.


Production Status

Full resource usage is expected for several weeks, at least.

Data production summary page: https://confluence.desy.de/display/BI/Data+Production+Status

From the Data Production coordinator

Data (re)processing:

  • Proc10 (2019a/b) is nearly complete.

  • If no problems are discovered in calibration of the 2019c data set, the expectation is that processing of the 2019c data will be finished sometime in January.

MC production:

  • MC13a (run-independent) production of another 1 ab^-1 has started.

  • Run-dependent MC (MC13b) should start in the coming days.

Analysis skimming:

  • All requests for proc9 skims have been completed.

  • Proc10 skims will proceed as soon as proc10 is finished, and similarly for MC13.


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • 2019-12-25 14:30 UTC: the memory usage of b2dchsv01.cc.kek.jp has been increasing linearly for the last 24 hours.
  • 2019-12-17 DIRAC Production: b2dchsv[01-06].cc.kek.jp down
  • Date, Issue, Tickets...

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)

  • 2019-12-06 Problem with Ganglia monitor for DB (still shows "remnant plot" after 2h check)
  • Date, Issue, Tickets...

"Web" servers

  • 2019-12-07 Ganglia Monitor for the "Web" servers still shows "remnant plots" after 2 hours' check.


DDM (bldirac01.sdcc.bnl.gov)

  • 2018-03-01 DDM deletion task seems stuck

Conditions DB ()

Monitor

  • CDB access check URL should be correct
  • Issue in access to DIRAC Web Portal 

LFC

  • Date, Issue, Tickets...

File Transfers and Replication Status

  • See also Computing OperationStatus#DDM for related issues
  • No file transfer activity as either destination or source since 01/03 at about 23:00: https://agira.desy.de/browse/BIIDCO-2222
  • No file transfer activity as either destination or source from 01/02 at about 23:00 to 01/03 05:15 (latest checking time)
  • There is no activity during the last three hours (01/Jan/2020, 9:00 to 12:15 UTC) in the "Replication Status plot"
  • There is no activity during the last three hours (01/Jan/2020, 9:00 to 12:15 UTC) in both plots "Throughput" and "Successful transfers"
  • There is no activity in the last two hours in both plots "Throughput" and "Successful transfers"  
  • No activity shown

FTS

  • Any problems in the FTS service or FTS monitoring are to be recorded here. Site/SE-specific issues are to be recorded under each Site/SE.
  • Note that the FTS dashboard we use is an "old" instance and not well-maintained. We, Belle II members in general, do not have access to the "new" monitoring. When the dashboard is down, the shifters just need to notify the expert and skip the corresponding part of their work. The expert should check the new monitoring, since access to that monitoring page is limited.
  • 2020-01-02 13:20 UTC File transfer failure from KMI-TMP-SE and from KEK-Disk-TMP-SE to LAL-DATA-SE
  • 2020-01-02 9:15 UTC File transfer failure from KMI-TMP-SE to LAL-DATA-SE
  • 2019-08-31  File transfer failures for past 48 hours. 
  • 2019-09-03 File transfer failures for past 24 hours. 

Replication Status

  • 2019-12-21 Decreasing Done Jobs with many Scheduled Jobs 
  • 2019-12-15 Zero Replication Efficiency 
  • 2019-1-19 Almost zero done, with an increasing number of scheduled jobs for more than 5 SEs and more than 5 hours.
  • 2018-07-02 No Done transfers, several Scheduled, and a rapid increase of Waiting replications

Job Status Plot

  • Date, Issue, Tickets...
  • All of the sites show error (red) starting at about the same time 2020-01-03 24:00 UTC. 
  • Low activity, majority of plots empty since 2020-01-02 03:07 UTC.
  • There are no jobs since 2019-10-04 00:00
  • 2019-11-27 Jobs finished with error around 7:00 UTC
  • 2019-12-28 Majority of plots are empty around 11:00 UTC

Job Summary

  • Date, Issue, Tickets...
  • The number of running and waiting jobs has decreased by more than 20% since approx. 2019-12-31 19:00 UTC.
  • Following JIRA ticket updated :



SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs)

Raw data SEs

Link to JIRA ticket:

Raw data SE: KEK-RAW-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/RAW)

  • 2019-07-24 15:30 UTC: all transfers failed (0/72) between KEK-RAW-SE and BNL-TMP-SE

Raw data SE: BNL-TAPE-SE (srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/tape)

  • date, issue, tickets

Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • No Replication Trend Plot for BNL-TMP-SE 2020-01-02 09:30 UTC 
  • File Transfer Failures have been observed since 2019-12-23 21:30:00 UTC, approximately. 
  • SE Health Check by DDM: Failure on upload has been observed since 2019-12-23 03:38:01 (3 hours)
  • SE Health Check by DDM: Failure on upload has been observed since 2019-12-22 02:22:57 (4 hours)
  • SE Health Check by DDM: Failure on upload has been observed since 2019-12-21 14:29:51 (4 hours)
  • SE Health Check by DDM: Failure on upload has been observed since 2019-12-21 01:02:50 (5 hours)
  • SE Health Check by DDM: Failure on upload has been observed since 2019-12-20 00:03:32 (5 hours)

    Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_verify&ticket_id=144665
  • SE Health check by DDM : download does not work since 2019-05-16 07:11:21 UTC.

  •  UNAVAILABLE files
  • SE Health check by DDM : download does not work since 2019-05-15 21:03:25 UTC

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz) 

  • Replication status. Plot is not shown 2020-01-15 13:01 UTC
  • Replication status. Plot is not shown 2020-01-02 8:00 UTC
  • Replication status. Plot is not shown 2019-12-31 9:25 UTC
  • Replication status. Plot is not shown 2019-12-31 7:45 UTC
  • Plot is not shown 2019-12-25 05:00 UTC.
  • SE Health check by DDM : remove file, remove directory, ls do not work since 2019-07-10 06:32:47 UTC.

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • File transfer failures as source have been observed since 2019-12-23 02:00 UTC
  • SE Health check by DDM : remove file does not work since 2019-05-13 08:27:21 UTC.
  • SE Health check by DDM : remove file, remove directory, download, upload, ls do not work since 2019-04-25 23:13:00 UTC.
  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE.
  • Continuous timeout failures between NTU-CC-TMP-SE and CNAF-TMP-SE

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • Date, Issue, Tickets...

Primary SE: KEK-DISK-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/disk/belle/TMP)

  • 2020-01-02 13:20 UTC File transfer failure from KMI-TMP-SE and from KEK-Disk-TMP-SE to LAL-DATA-SE
  • Replication status: No. of waiting jobs increasing, and File Transfer Failures have been observed since 2019-12-23 21:30:00 UTC, approximately. 

Primary SE: KEK2-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

  • SE Health Check by DDM: Failures on ls, upload have been observed since 2019-11-10 07:23:23 (5 hours)
  • Following JIRA tickets submitted: BIIDCO-1866
  • Number of jobs with status "done" is zero. 2019-07-05 7:07

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

  • No new assignment of MC production data blocks to this destination

  • SE Health check by DDM : download, upload do not work since 2019-09-22 23:38:00 UTC.

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • Date, Issue, Tickets...

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • 2020-01-02 09:15 UTC File transfer failure from KMI-TMP-SE to LAL-DATA-SE
  • Replication status: No. of waiting jobs increasing, and File Transfer Failures have been observed since 2019-12-23 21:30:00, approximately. 
  • KMI-TMP-SE with Scheduled jobs overwhelming Done ones since 9:00 UTC

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • Date, Issue, Tickets...
  • No Replication trend Plot shown 2020-01-02  08:00 UTC 
  • No replication trend plot shown since 2020-01-01 23:05 UTC.

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • File transfer failures as destination have been observed since 2019-12-23 02:00 UTC
  • File transfer failures as destination have been observed since 2019-11-30 1:00 UTC
    Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144310 was submitted
  • SE Health Check by DDM: Failure on upload has been observed since 2019-11-30 05:49:18 (6 hours)

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

  • Date, Issue, Tickets...

Frascati-TMP-SE (atlasse.lnf.infn.it)

  • Date, Issue, Tickets...

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets...

IPHC-TMP-SE (sbgse1.in2p3.fr)

  • Date, Issue, Tickets...

LAL-TMP-SE (grid05.lal.in2p3.fr)

  • Date, Issue, Tickets...

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

  • Transfer rate observed to be zero

  • Melbourne-DATA-SE banned for write

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • McGill-TMP-SE will be decommissioned in early 2018.

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)


NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

  •  NTU-TMP-SE banned for write 
  • 2019-08-31  File transfer failures for past 48 hours. 

NTU-CC-TMP-SE (belle2grid3.cc.ntu.edu.tw)

  • 2020/01/13, 2019/12/17, 2019/12/11 File transfer failures
  • 2019-10-06 File transfer failures for past 24 hours. 
  • 2019/8/23 file transfer failure to NTU-CC-DATA-SE
  • FTS transfer failure as SOURCE NTU-CC-DATA-SE to BNL-TMP-SE
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=142550 has been submitted
  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE. 
  • File transfer failure and cancellation to NTUCC-DATA-SE happened 2018-12-22
  • Frequent timeouts have been observed between NTU-CC-TMP-SE and CNAF-TMP-SE
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137334 was submitted 2018-09-22 05:10 UTC
  • NTUCC-TMP-SE banned for write 

Pisa-TMP-SE (stormfe1.pi.infn.it)

  • Date, Issue, Tickets...

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues. 

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

  • Date, Issue, Tickets...

Torino-TMP-SE (se-srm-00.to.infn.it)

  • Date, Issue, Tickets...

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

  • File transfer failures as destination

UMiss-TMP-SE (umiss005.hep.olemiss.edu)

  • Date, Issue, Tickets...

UVic-TMP-SE (charon01.westgrid.ca)

  • File Transfer Failures have been observed since 2019-12-23 21:30:00 UTC, approximately. 
  • File transfer failures: File Transfer Efficiency from UVic-DATA-SE has been too low since about 2018-12-18 1:00 (UTC)



Sites

Sites Common Issue

  • Date, Issue (site-wide)
  • Job status plots: jobs are failing at 8 sites at the same time (circa 2019-11-28, 16:00-18:00 UTC): ARC.KIT.de, DIRAC.UVic-local.ca, LCG.KEK.jp, LCG.NTU.tw, LCG.Napoli.it, Test.KIT.de, OSG.BNL.us, LCG.CNAF.it

ARC.DESY.de

  • Date, issue, JIRA

ARC.DESY-test.de

  • A test queue for the new CE.

ARC.KIT.de

  • Downtime on 2020-01-16 09:00 to 2020-01-16 09:30
  • Downtime on 2020-01-15 13:00 to 2020-01-15 15:00 
  • Downtime on 2019-01-13 8:00 UTC to 2020-01-13 16:00 UTC
  • "Short Pilot" has been observed since 2020-01-05 00:34 UTC (for 7 hours)
  • "Failed Payload Job" has been observed since 2020-01-05 00:34 UTC (for 7 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • "Short Pilot" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • Downtime 2019-12-11 13:00 UTC - 2020-01-24 22:00 (UTC) 
  • "Pilot Submission Failure" has been observed since 2019-12-03 21:35 UTC
    Solved and verified GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144364 has been submitted

  • "Short Pilot" has been observed since 2019-11-21 21:35 UTC (for 10 hours)
  • "Failed Payload Job" has been observed since 2019-11-22 01:35 UTC (for 6 hours)
  • "Short Pilot" has been observed since 2019-11-20 13:35 UTC (for 1 hours)

  • "Short Pilot" has been observed since 2019-11-18 13:35 UTC
  • "Pilot Submission Failure" has been observed since 2019-11-13 13:30 UTC (for 1 hours)
  • "Short Pilot" has been observed since 2019-11-12 13:30 UTC (for 1 hours)
  • "Short Pilot" has been observed since 2019-11-11 09:30 UTC (for 5 hours)  
  • "Short Pilot" has been observed since 2019-11-10 03:30 UTC (for 3 hours)
  • Downtime 2020-01-09 14:38 - 2020-01-10 16:00 UTC

ARC.LMU.de

  • This is a test site. No need to report any issues.

ARC.LMU2.de

  • Banned, as there is currently no resource behind the CE

ARC.Melbourne.au

  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 2 hours)
  • Following JIRA tickets submitted: BIIDCO-2158
  • Downtime 2019-12-02 22:00 - 2019-12-03 03:00 (UTC)
  • Job status check: Application finished with errors observed on 2019-11-20.

ARC.MPPMU.de

  • "Failed Payload Job" has been observed since 2020-01-05 02:34 UTC (for 5 hours)
  • Following JIRA tickets submitted: BIIDCO-2052 , BIIDCO-1386 , BIIDCO-128
  • "Failed Payload Job" has been observed since 2019-09-30 14:27 UTC (for 8 hours)
  • "Failed Payload Job" has been observed since 2019-09-29 15:27 UTC (for 7 hours)
  • Job submission check : Pilot submission failure has been found since 00:26:00 UTC on 2019/04/21.

ARC.SIGNET.si

  • Aborted pilots on CE pikolit.ijs.si since 2020-01-05 01:34 UTC: BIIDCO-2224
  • "Aborted Pilot" has been observed since 2020-01-05 01:34 UTC (for 6 hours)
  • Following JIRA tickets submitted: BIIDCO-2172 , BIIDCO-1383 , BIIDCO-2112 , BIIDCO-1350 , BIIDCO-1420
  • "Short Pilot" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • Following JIRA tickets submitted: BIIDCO-2172 , BIIDCO-1383 , BIIDCO-2112 , BIIDCO-1350 , BIIDCO-1420
  • "Short Pilot" has been observed since 2019-12-24 18:28 UTC (for 4 hours)
  • Increasing rate of stalled jobs, no jobs completed anymore since 10:00 UTC
  •  "Aborted Pilot" has been observed since 2019-12-03 21:35 UTC 
    Solved and verified  GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144366 has been submitted
  • "Short Pilot" has been observed since 2019-11-11 09:30 UTC (for 5 hours)  
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hours)
  • Health checker info. : "Not enough disk space on " has been found since 21:20:00 UTC on 2019/09/22. (details)
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2019/08/28.(details)
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/08/01.
  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2019/06/03
  • "Short pilot jobs" has been found since 10:20:00 UTC on 2019/05/27.(details)
  • "Failed pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
  • "Short pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/05/21.(details)
  • Job status check: many Stalled jobs on 2019/05/14 at 7:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/05/14.(details)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC 2019/04/05 and at 14:20:00 UTC on 2019/04/12.
  • Job status check: Application finished with errors (5% of the jobs) at 11:15 UTC on 2018/12/21.

  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03.

  • Health checker info. : "Aborted pilot jobs" has been found since 20:20:00 UTC on 2018/10/20.
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/03.(details)

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Site is banned.
  • "Pilot Submission Failure" has been observed since 2020-01-05 16:34 UTC (for 29 hours)
  • Following JIRA tickets submitted: BIIDCO-1312 , BIIDCO-2202 , BIIDCO-1807 , BIIDCO-1812 , BIIDCO-1534 , BIIDCO-289 , BIIDCO-43 , BIIDCO-38
  • "Pilot Submission Failure" has been observed since 2019-12-30 19:34 UTC (for 59 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-30 19:34 UTC (for 11 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-23 05:28 UTC (for 7 hours). 
  • "Pilot Submission Failure" has been observed since 2019-12-23 05:28 UTC (for 1 hours)
  • Job submission check : Pilot submission failure has been found since 04:21:00 UTC on 2019/07/04.
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2019/06/30.
  • "Failed Payload Job" has been observed since 2019-04-19 11:15 UTC  
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/04/18.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/04/17.
  • "Application finished with errors" (100% currently) on 2019/04/10 00:15 UTC. Problem reported since (at least) 2019/04/07 07:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08.
  • Job status check: "application finished with errors" (100% currently) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21. (details)
  • The number of jobs limited.
  • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs (DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 10 hours)
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hours)
  • Job status check: "Application Finished With Errors" (39% of the jobs over the last 24h) at 7:00 UTC on 2019/05/15.
  • Job status check: Application finished with errors (27% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Health checker info. : "Failed to install DIRAC on " has been found at 22:20:00 UTC on 2018/09/15

DIRAC.BINP-VM.ru

  • "Pilot Submission Failure" has been observed since 2020-01-17 05:53 UTC (for 17 hours)
  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 1 hours)
  • Following JIRA tickets submitted: BIIDCO-2208 , BIIDCO-1607 , BIIDCO-749 , BIIDCO-2185
  • Job status check: Job stalled 100%, then no jobs running since 2019-12-24 06:00:00 (for 24 hours) 
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2019/02/21
  • Job submission check : Pilot submission failure has been found since 10:23:00 UTC on 2019/01/14.
  • Job status plots, "Application Finished With Errors" (2018-02-11 but lasting for at least a month)
  • Downtime: maintenance on dirac.binp-vm.ru till 2019-12-21.

DIRAC.CINVESTAV.mx

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/04/14.

  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in    

  • "Failed Payload Job" has been observed since 2019-12-19 21:28 UTC (for 1 hours)
  • "Aborted Pilot" has been observed since 2019-11-29 04:35 UTC (for 265 hours)
  • "Aborted Pilot" has been observed since 2019-11-21 10:35 UTC (for 22 hours)
  • Following JIRA tickets submitted: BIIDCO-2070 , BIIDCO-1686 , BIIDCO-1823
  • "Aborted Pilot" has been observed since 2019-11-19 15:35 UTC (for 39 hours) (details).
  • "Aborted Pilot" has been observed since 2019-11-18 06:35 UTC (for 1 hours) (details).
  • "Aborted Pilot" has been observed since 2019-11-02 20:30 UTC: JIRA ticket updated
  • AID: "Aborted Pilot" has been observed since 2019-10-16 22:38 UTC: JIRA ticket created

  • "Aborted pilot jobs" has been found since 12:20:00 UTC on 2019/07/10. (screenshot)
  • Job status check: "Application finished with errors" on 2019/07/10 at 00:00 UTC (screenshot)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/05/16.
  • Job status check: many "Application finished with errors" (overall 66% during past 24 hours) on 2019/05/15 at 7:00 UTC.
  • Job status check: many "Application finished with errors" on 2019/05/14 at 7:00 UTC.
  • Job status plots, 100% "Application Finished With Errors", 10:00:00 UTC on 2019/04/08. Still unchanged as of 2019/04/26. 
  • Health checker info. : "Aborted pilot jobs" has been found at 00:20:00 UTC on 2019/04/08.
  • Job status check: Application finished with errors (95% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.

DIRAC.IITH.in

  • "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/03.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/03/29.(details)

DIRAC.LMU.de

  • Not in use in MC production
  • Banned for now.

DIRAC.MIPT.ru

  • "Short Pilot" has been observed since 2020-01-16 21:53 UTC (for 1 hours)
  • "Failed Payload Job" has been observed since 2020-01-16 21:53 UTC (for 1 hours)
  • "Failed Payload Job" has been observed since 2020-01-15 02:53 UTC (for 14 hours)
  • "Short Pilot" has been observed since 2020-01-15 05:53 UTC (for 1 hours) (details).


  • "Failed Payload Job" has been observed since 2020-01-15 02:53 UTC (for 4 hours) (details)

  • "Short Pilot" has been observed since 2020-01-14 00:53 UTC (for 6 hours)
  • "Short Pilot" has been observed since 2020-01-04 07:34 UTC (for 62 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 07:34 UTC (for 62 hours)
  • Following JIRA tickets submitted: BIIDCO-1816 , BIIDCO-2112 , BIIDCO-1768
  • "Aborted Pilot" has been observed since 2019-12-25 03:28 UTC (for 3 hours)
  • "Aborted Pilot" has been observed since 2019-12-23 21:28 UTC (for 9 hours)
  • "Aborted Pilot" has been observed since 2019-12-23 21:28 UTC (for 1 hours)
  • "Aborted Pilot" has been observed since 2019-12-21 13:28 UTC (for 17 hours)

  • "Aborted Pilot" has been observed since 2019-12-21 03:28 UTC (for 3 hours) 
  • "Aborted Pilot" has been observed since 2019-12-20 10:28 UTC (for 5 hours) 
  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours)  
  • "Short Pilot" has been observed since 2019-11-10 04:30 UTC (for 2 hours)
  • "Failed Payload Job" has been observed since 2019-11-10 05:30 UTC (for 1 hours)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2019/05/25.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 13:20:00 UTC on 2019/04/20.
  • Health checker info. : "Aborted pilot jobs" has been found since 11:20:00 UTC on 2019/04/06, since 05:20:00 UTC on 2019/04/12, and since 20:20:00 UTC on 2019/04/17.
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/04/10 and 15:20:00 UTC on 2019/04/14.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/03/29.(details)

DIRAC.Nagoya.jp

  • "Failed Payload Job" has been observed since 2020-01-22 12:53 UTC (for 10 hours)
  • "Short Pilot" has been observed since 2020-01-04 21:34 UTC (for 9 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 23:34 UTC (for 7 hours)
  • Following JIRA tickets submitted: BIIDCO-2178 , BIIDCO-2112
  • Job stalling since 2019-12-14 10:00 (UTC)
  • "Short Pilot" has been observed since 2019-11-19 05:35 UTC (for 1 hours)
  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours) 
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hours)
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/10/09.

DIRAC.Nara-WU.jp

  • Job submission check : Pilot submission failure has been found at 06:23:00 UTC on 2019/10/03.
  • Under commissioning from 2018-11-13

DIRAC.NDU.jp

  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 10 hours)
  • Following JIRA tickets submitted: BIIDCO-2112
  • "Short Pilot" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • Following JIRA tickets submitted: BIIDCO-2112
  • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours) 

DIRAC.Niigata.jp

  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 1 hours)
  • "Short Pilot" has been observed since 2019-11-19 05:35 UTC (for 1 hours)

  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2019/10/09.

  • Job submission check : Pilot submission failure has been found since 19:26:00 UTC on 2019/05/26. (details)

  • Health checker info. : "Aborted pilot jobs" has been found since 12:20:00 UTC on 2019/05/18.
  • Job submission check : Pilot submission failure has been found since 13:30:00 UTC on 2019/05/14. (details)
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2019/04/21.

DIRAC.Niigata2.jp       

  • Date, Issue, Tickets...

DIRAC.Osaka-CU.jp

  • Site is banned
  • "Pilot Submission Failure" has been observed since 2020-01-03 09:34 UTC (for 84 hours)
  • Following JIRA tickets submitted: BIIDCO-1434 , BIIDCO-2205
  • "Pilot Submission Failure" has been observed since 2020-01-03 09:34 UTC (for 21 hours)
  • Following JIRA tickets submitted: BIIDCO-1434 , BIIDCO-2205
  • "Pilot Submission Failure" has been observed since 2020-01-01 00:34 UTC (for 30 hours)
  • "Pilot Submission Failure" has been observed since 2020-01-01 00:34 UTC (for 6 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-31 20:34 UTC (for 2 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-30 16:34 UTC (for 14 hours)  
  • "Pilot Submission Failure" has been observed since 2019-12-25 04:28 UTC (for 2 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-24 04:28 UTC (for 9 hours) 
  • "Pilot Submission Failure" has been observed since 2019-12-24 04:28 UTC (for 2 hours)
  • Job submission check : Pilot submission failure has been found since 06:21:00 UTC on 2019/04/02
  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04. (details)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Ask site admin to check the status 2018-03-17 10:00 JST. (DB access failure again from DIRAC.Osaka-CU.jp to PNNL from 2018-03-16 11:00 UTC)

DIRAC.PAU.in

  • "Pilot Submission Failure" has been observed since 2020-01-22 20:53 UTC (for 2 hours)

DIRAC.PNNL.us

  • Site to be decommissioned

DIRAC.PNNL2.us

  • Site to be decommissioned

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • "Failed Payload Job" has been observed since 2020-01-05 02:34 UTC (for 4 hours)
  • "Short Pilot" has been observed since 2020-01-04 02:34 UTC (for 4 hours)

DIRAC.LocalTest.jp

  • Downtime:
  • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours). 
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/10/09

DIRAC.SSU.kr

  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 1 hours)
  • Following JIRA tickets submitted: BIIDCO-2112 , BIIDCO-2106 , BIIDCO-1982
  • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours) 
  • Health checker info. : "Short pilot jobs" has been found at 13:20:00 UTC on 2019/10/03.(details)
  • Health checker info. : "Short pilot jobs" has been found since 23:20:00 UTC on 2019/08/27
  • "Short pilot jobs" has been found at 01:20:00 UTC on 2019/08/26. 
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/08/21.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/08/21.(details)

DIRAC.TIFR.in

  • "Short Pilot" has been observed since 2020-01-12 04:53 UTC (for 74 hours)
  • "Failed Payload Job" has been observed since 2020-01-14 09:53 UTC (for 21 hours)
  • "Failed Payload Job" has been observed since 2020-01-12 05:53 UTC (for 11 hours)
  • "Short Pilot" has been observed since 2020-01-12 04:53 UTC (for 11 hours)
  • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours) 
  • "Short Pilot" has been observed since 2019-11-10 04:30 UTC (for 2 hours)
  • Following JIRA tickets submitted: BIIDCO-1132 , BIIDCO-714
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Job submission check : Pilot submission failure has been found at 14:25:00 UTC on 2019/05/11. (details)
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/05/10.(details)
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/10/22.
  • Job status plots, "Application Finished With Errors" has been found at about 00:00:00 JST on 2018/07/06. (details)
  • Health checker info. : "Short pilot jobs" -- Already reported: 
  •  RunningLimit is set for MCProduction=1
  • Job stalled at input data resolution

DIRAC.TMU.jp

  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 1 hours)
  • Following JIRA tickets submitted: BIIDCO-2112 , BIIDCO-1522
  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours) 
  • "Pilot Submission Failure" has been observed since 2019-10-27 12:30 UTC (for 2 hours)

  • Health checker info. : "Failed to install DIRAC on " has been found since 14:20:00 UTC on 2019/04/24.
  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02

DIRAC.Tokyo.jp

  • Decommissioned
  • Date, Issue, Tickets...

DIRAC.UAS.mx

  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours)  
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hours)
  • Following JIRA tickets submitted: BIIDCO-1772 , BIIDCO-1508
  • Health checker info. : "Belle II software could not be installed on " has been found since 15:20:00 UTC on 2019/04/25
  • Job submission check : Pilot submission failure has been found since 00:21:00 UTC on 2019/04/04. (details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 01:20:00 UTC on 2019/02/20.
  • Job submission check: 100% failed with errors from 22:00 2019/01/08 till 04:00 2019/01/09 (UTC)
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17. 
  • Health checker info. : "Belle II software could not be installed on " has been found since 16:20:00 UTC on 2018/11/14.
  • Job submission check : Pilot submission failure has been found since 01:26:00 UTC on 2018/09/21. (details)

DIRAC.UVic.ca

  • Downtime 
  • "Failed Payload Job" has been observed since 2020-01-05 00:34 UTC (for 6 hours)
  • Following JIRA tickets submitted: BIIDCO-2188 , BIIDCO-2137 , BIIDCO-2124
  • Downtime

  • 2019-11-27 Downtime 2019-11-27 17:00-18:00 UTC
  • Many stalled jobs 

DIRAC.UVic-local.ca

  • Downtime 
  • "Failed Payload Job" has been observed since 2020-01-05 00:34 UTC (for 6 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • Downtime
  • Downtime
  •  
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144312 has been submitted
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • User jobs failed on the site:
  • Job status check: "Input Data Resolution" issues (13% overall, 100% in past hours) on 2019/05/16 at 7:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/05/16.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2019/05/13.

DIRAC.Yamagata.jp

  • "Short Pilot" has been observed since 2019-11-19 13:35 UTC (for 1 hour)

  • "Short Pilot" has been observed since 2019-11-18 11:35 UTC
  • "Short Pilot" has been observed since 2019-11-18 06:35 UTC (for 1 hour) (details).

  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/06/05.(details)

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/03/13.(details)

DIRAC.Yonsei.kr

  • "Short Pilot" has been observed since 2020-01-05 05:34 UTC (for 1 hour)
  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 1 hour)
  • Following JIRA tickets submitted: BIIDCO-2112
  • "Failed Payload Job" has been observed since 2019-12-30 21:34 UTC (for 1 hour)
  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours)  

DIRAC.LocalTest.jp

  • Date, Issue, Tickets..

LCG.CESNET.cz

  • "Failed Payload Job" has been observed since 2020-01-04 21:34 UTC (for 10 hours)
  • Following JIRA tickets submitted: BIIDCO-2112 , BIIDCO-771
  • "Failed Pilot" has been observed since 2019-12-24 16:28 UTC (for 6 hours)
  • No jobs running since 2019-12-23 23:00:00 UTC, approximately.
  • "Short Pilot" has been observed since 2019-11-21 21:35 UTC (for 11 hours)
  • "Short Pilot" has been observed since 2019-11-20 18:35 UTC (for 12 hours) (details). 
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2019/05/15.(details)
  • Job submission check : Pilot submission failure has been found at 06:26:00 UTC on 2019/05/15. (details)
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2019/05/13.(details)
  • Some intervention is needed to run Merge jobs

LCG.COSENZA.IT

  • "Failed Payload Job" has been observed since 2020-01-05 04:34 UTC (for 3 hours)
  • "Short Pilot" has been observed since 2019-11-22 04:35 UTC (for 3 hours)
  • "Failed Payload Job" has been observed since 2019-11-21 21:35 UTC (for 10 hours)
  • Downtime 2019-11-12 13:00 - 2019-11-19 19:00 (UTC)

  • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours)

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/09/24.

LCG.CNAF.it

  • Downtime 2020-01-22 23:00 to 2020-01-23 19:00 UTC
  • "Failed Payload Job" has been observed since 2020-01-04 21:34 UTC (for 10 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • Downtime 2019-12-09 08:00 to 2019-12-09 10:30 UTC
  • 2019-11-27 Downtime 2019-11-27 9:00-16:00 UTC 
  • "Short Pilot" has been observed since 2019-11-11 09:30 UTC (for 5 hours)  
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hour)
  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2019/10/09.

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/07.
  • Health checker info. : "Short pilot jobs" has been found since 00:20:00 UTC on 2019/10/07.
  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2019/10/03.(details)
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/10/03.

  • Health checker info. : "Short pilot jobs" has been found at 13:20:00 UTC on 2019/10/02. 

LCG.CYFRONET.pl

  • "Failed Payload Job" has been observed since 2019-12-23 10:28 UTC (for 12 hours)
  • The job status plot of B2PlotDisplay shows jobs stalled for two hours, as of 2019-12-23 10:35:17 UTC. Before this incident, there was no job at all on this site. 
  • "Short Pilot" has been observed since 2019-11-21 21:35 UTC (for 11 hours)
  • "Failed Payload Job" has been observed since 2019-11-21 21:35 UTC (for 11 hours)
  • Following JIRA tickets submitted: BIIDCO-2112 , BIIDCO-1920 , BIIDCO-1246
  • "Failed Payload Job" has been observed since 2019-11-20 13:35 UTC (for 1 hour)

  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours) 
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hour)
  • Following JIRA tickets submitted: BIIDCO-1920 , BIIDCO-1246
  • Downtime:  2019-07-11 10:00  (UTC) -  2019-12-10 22:00 (UTC)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Job submission check : Pilot submission failure has been found since 14:23:00 UTC on 2019/07/31.
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13.

LCG.DESY.de

  • The site is to be retired. No more jobs are to be submitted.

LCG.Frascati.it

  • Site has been banned since 2019-07-05 due to a hardware problem

  • Downtime 2019-12-15 06:41 UTC - 2019-12-16 07:41 UTC
  • Downtime 2019-12-16 07:30 (UTC) - 2019-12-18 17:00 (UTC)

  • Job submission check : Pilot submission failure has been found since 14:24:00 UTC on 2019/05/24. 
    • GGUS 141688 ticket submitted.
  • Health checker info. : "BLAH ERROR" has been found since 15:20:00 UTC on 2019/05/21.(details)

LCG.HEPHY.at

  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 2 hours)
  • Health checker info. : "Failed pilot jobs" has been found at 13:20:00 UTC on 2019/10/03.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2019/04/12.
  • Health checker info. : "Failed pilot jobs" has been found at 02:20:00 UTC on 2019/01/30.(details) and at 02:20:00 UTC on 2019/01/31.(details)
  • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2018/12/27.

LCG.IPHC.fr

  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 11 hours)
  • Following JIRA tickets submitted: BIIDCO-2191
  • Downtime:
  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2018/06/18.(details)

LCG.KEK.jp

  • Job status: A large number of jobs finished with errors (61.0%) in the last 24-hour period, from approx. 2020-01-01 00:00 - 02:00 UTC
  • Downtime:

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/10/09

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • SiteDirector "Failed to check the availability" 
  • Pilot job failure with Sandbox upload error

LCG.KEK2.jp

  • Downtime:
  • Health checker info. : "Short pilot jobs" has been found at 16:20:00 UTC on 2019/10/09.

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/09/29.
  • Health checker info. : "Short pilot jobs" has been found at 01:20:00 UTC on 2019/08/28.
  • Still all jobs failing with InputDataResolution on 2019/07/25.
  • GGUS ticket : "KEK SE: PrepareToGet ETIMEDOUT for a specific file path" (140328) has been submitted at 21:26:29 UTC on 2019/03/21.
  • Health checker info. : "Short pilot jobs" has been found at 11:20:00 UTC on 2019/03/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • All jobs have been in "Input data resolution" status since 12:00 UTC on 2018/12/18

LCG.KEK-merge.jp

  • Downtime:
  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/08/26.
  • Most jobs failing with InputDataResolution
  • "Belle II software could not be installed on cb268.cc.kek.jp" has been found since 14:20:00 UTC on 2019/04/05
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/04/02
  • Being commissioned...

LCG.KISTI.kr

  • Downtime:
  • "Short Pilot" has been observed since 2019-11-20 13:35 UTC (for 17 hours) 

  • Job slots are disabled for SE maintenance from 2018-10-19 to 2018-10-23

  • Health checker info. : "BLAH ERROR" has been found since 06:20:00 UTC on 2018/10/19.(details)

  • "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details)
  • BLAH error seems to happen if jobs exceed the allocated number of queues; not a problem (site-specific feature)
  • A large number of Merge jobs in waiting status

LCG.KMI.jp

  • "Short Pilot" has been observed since 2020-01-05 02:34 UTC (for 5 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 22:34 UTC (for 9 hours)
  • Following JIRA tickets submitted: BIIDCO-2112 , BIIDCO-1533
  • "Failed Payload Job" has been observed since 2019-11-19 12:35 UTC (for 2 hours)
  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours) 
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/04/08 and at 15:20:00 UTC on 2019/04/12.
  • Job status plots, 100% "Application Finished With Errors", 10:00:00 on 2019/04/08
  • Job submission check : Pilot submission failure has been found since 21:25:00 UTC on 2019/02/01. 

  • Job status check: Application finished with errors (7% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details)
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 21:20:00 UTC on 2018/11/22.
  • Job submission check : Pilot submission failure has been found since 21:24:00 UTC on 2018/10/02. (details)

LCG.LAL.fr

  • 2020-01-02 13:20 UTC File transfer failure from KMI-TMP-SE and from KEK-Disk-TMP-SE to LAL-DATA-SE
  • 2020-01-02 09:15 UTC File transfer failure from KMI-TMP-SE to LAL-DATA-SE
  • Downtime:
  • Downtime 2019-12-02 15:00 - 2019-12-06 15:00 (UTC) 
  • Downtime 2019-11-21 09:00-12:00 (UTC) 
  • "Failed Payload Job" has been observed since 2019-11-18 05:35 UTC (for 2 hours)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/05/01.(details)

LCG.Legnaro.it

  • Date, Issue, Tickets...

LCG.Napoli.it

  • "Pilot Submission Failure" has been observed since 2020-01-18 00:53 UTC (for 22 hours)
  • Scheduled downtime 2020-01-15 11:00 - 2020-01-15 12:00
  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 11 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • "Failed Payload Job" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • "BLAH Error" has been observed since 2019-12-25 22:27:58 UTC (for 7 hours)

  • "Aborted Pilot" has been observed since 2019-12-02 09:35 UTC (for 5 hours)
  • Downtime 2019-11-12 13:00 - 2019-11-19 19:00 (UTC)
  • "Pilot Submission Failure" has been observed since 2019-11-16 06:35 UTC (for 32 hours)
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Job submission check : Pilot submission failure has been found since 12:27:00 UTC on 2019/10/02.

  • t2-recas-ce01.na.infn.it shows pilot submission errors, and this CE should be banned until September 2019.

  • Stalled jobs

LCG.NTU.tw

  • "BLAH Error" has been observed since 2020-01-22 22:52:36 UTC (for 1 hour)
  • "Pilot Submission Failure" has been observed since 2020-01-14 17:53 UTC (for 13 hours) 
  • "Failed Payload Job" has been observed since 2020-01-05 01:34 UTC (for 6 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-26 09:34 UTC (for 5 hours)
    Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144666 has been submitted
  • "Pilot Submission Failure" has been observed since 2019-12-24 15:28 UTC (for 7 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-23 20:28 UTC (for 2 hours)(details).
  • "Failed Payload Job" has been observed since 2019-12-23 21:28 UTC (for 1 hour)
  • According to the JobStatus of B2PlotDisplay, jobs were finished with errors as of 2019-12-23 13:32:55 for 6 hours.
  • "CRL has expired" for "node35-0,node19-0,node36-0,node33-0,node38-0,node37-0,node39-0" has been observed since 2019-12-18 23:27:52 UTC (for 6 hours)  

  • Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144612
  • "Failed Payload Job" has been observed since 2019-12-10 05:28 UTC (for 1 hour)
  • Downtime  2019-12-06 08:00 (UTC)  - 2019-12-09 04:00 (UTC)  
  • 2019-11-27 Downtime 2019-11-22 20:00 (UTC) - 2019-11-29 20:00 (UTC)
  • "CRL has expired" for "node38-0,node32-0,node33-0,node37-0,node39-0,node34-0,node35-0" has been observed since 2019-11-17 14:30:48 UTC (for 53 hours)


  • Solved and verified: GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=144166

  •  "Short Pilot" has been observed since 2019-11-10 04:30 UTC (for 2 hours)

  • "BLAH ERROR" has been found since 07:20:00 UTC on 2019/09/13
  • "Short pilot jobs" has been found since 07:20:00 UTC on 2019/09/13.(details)
  • "Short pilot jobs" has been found at 00:20:00 UTC on 2019/07/08. (screenshot)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/04/18
  • Downtime 2020-01-10 09:00  - 2020-01-13 05:00   (UTC)  

LCG.Pisa.it


  • "Short pilot jobs" has been found since 02:20:00 UTC on 2018/09/21.(details)

LCG.Roma3.it

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2019/04/20.

LCG.TAU.il

  • "Failed Payload Job" has been observed since 2020-01-05 00:34 UTC (for 7 hours)
  • Following JIRA tickets submitted: BIIDCO-2187 , BIIDCO-2112
  • Downtime
  • "Failed Pilot" has been observed since 2020-01-09 13:53 UTC (for 9 hours) 
  • "Short Pilot" has been observed since 2019-11-22 02:35 UTC (for 5 hours)
  • "Short Pilot" has been observed since 2019-11-11 12:30 UTC (for 2 hours)

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Health checker info. : "Failed pilot jobs" has been found at 23:20:00 UTC on 2019/05/29.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2019/05/24.(details)

LCG.Torino.it

  • "Pilot Submission Failure" has been observed since 2020-01-22 21:53 UTC (for 1 hour)
  • "Pilot Submission Failure" has been observed since 2020-01-05 02:34 UTC (for 5 hours)
  • Following JIRA tickets submitted: BIIDCO-2215
  • "Failed Payload Job" has been observed since 2020-01-04 04:34 UTC (for 2 hours)
  • Following JIRA tickets submitted: BIIDCO-2215
  • "Pilot Submission Failure" has been observed since 2019-12-28 18:34 UTC 
  • Job status check: 4.5% of jobs identified as stalled by the watchdog since 2019-12-25 22:00:00 UTC (for 12 hours)
  • "Pilot Submission Failure" has been observed since 2019-12-23 13:28 UTC (for 9 hours)
  • "Short Pilot" has been observed since 2019-11-10 05:30 UTC (for 1 hour)
  • Job submission check : Pilot submission failure has been found at 13:26:00 UTC on 2019/05/20. (details)
  • Job submission check : Pilot submission failure has been found at 14:25:00 UTC on 2019/05/11. (details)
  • Job submission check : Pilot submission failure has been found at 14:25:00 UTC on 2019/05/09

  • Job submission check : Pilot submission failure has been found at 06:25:00 UTC on 2019/03/23.

LCG.ULAKBIM.tr

  • Scheduled downtime from 2020-01-14 07:00 to 2020-01-16 13:00 (UTC)
  • The queue 'belle7' is to be disabled; use only 'belle'.
  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2019/08/01.

OSG.BNL.us

  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 11 hours)
  • Job status: 81.0% of jobs finished with error between 2020-01-01 02:00 - 2020-01-02 02:00 UTC.
  • "Failed Payload Job" has been observed since 2019-12-20 14:28 UTC 
    Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144665 has been submitted
  • "Belle II software could not be installed" for "bgk01.sdcc.bnl.gov" has been observed since 2019-12-18 15:27:52 UTC
  • All pilot submissions to bgk02.sdcc.bnl.gov have failed since 2019-12-11
  • "Pilot Submission Failure" has been observed since 2019-12-05 05:28 UTC
  • "Short Pilot" has been observed since 2019-11-21 21:35 UTC

  • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours)  
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/08/01.
  • Health checker info. : "Belle II software could not be installed on " has been found since 19:20:00 UTC on 2019/02/14.
  • Job submission check: Jobs failed with errors or input data resolution issues in the last 24h (6:00 UTC, 2019/01/09)
  • Production jobs: UNAVAILABLE files
  • Number of concurrent MCProduction jobs restricted
  • MCProduction jobs are mostly stalled

OSG.CORI.us

  • OSG.CORI.us resource has been removed because CY18 allocation was not approved

OSG.UMiss.us

  • "Short Pilot" has been observed since 2020-01-14 11:53 UTC (for 28 hours)
  • "Short Pilot" has been observed since 2020-01-12 22:53 UTC (for 32 hours)

  • "Short Pilot" has been observed since 2019-11-22 05:35 UTC (for 3 hours)
  • Following JIRA tickets submitted: BIIDCO-2112 , BIIDCO-1863 , BIIDCO-1856 , BIIDCO-1768
  • "Short Pilot" has been observed since 2019-11-21 04:35 UTC (for 2 hours) (details).
  • "Short Pilot" has been observed since 2019-11-11 11:30 UTC (for 3 hours) 
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/07/10. (screenshot)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/07/08. (screenshot)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/07/03.
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/06/27
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2019/06/04.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/06/03.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/05/20.(details)
  • Job status check: 100% of issues of Input Data Resolution on 2019/05/14 at 7:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/05/14.(details)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/05/12.(details)
  • Job submission check : Pilot submission failure has been found since 12:27:00 UTC on 2019/05/04. (details)
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2019/05/11.(details)
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/05/01.(details)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/04/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/04/29.(details)
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/04/22.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/04/16.

  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/04/11 and  at 17:20:00 UTC on 2019/04/14.
  • Job status check: 34.7% of applications finished with errors on 2019/04/08.

SSH.KMI.jp

  • "Short Pilot" has been observed since 2019-12-24 08:28 UTC (for 5 hours)  
  • Job status plot: input data resolution problems (for 7 hours) since 2019-12-24 00:00 UTC, approximately.
  • "Short Pilot" has been observed since 2019-12-24 05:28 UTC (for 1 hour)
  • Job status check: Application finished with errors (12% of the jobs in last 24 hours) on 2018/12/22 at 11:30 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/08/13.

Test.KIT.de

  • "Failed Pilot" has been observed since 2020-01-19 21:53 UTC (for 1 hour)
  • "Aborted Pilot" has been observed since 2020-01-16 21:53 UTC (for 1 hour)
  • Downtime on 2020-01-16 09:00 to 2020-01-16 09:30
  • Downtime on 2020-01-15 13:00 to 2020-01-15 15:00 
  • Downtime on 2019-01-13  8:00 UTC  to 2020-01-13 16:00 UTC 
  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 11 hours)
  • Following JIRA tickets submitted: BIIDCO-2140
  • Downtime on 2019-12-11  8:00 UTC  to 2020-01-24 22:00 UTC 
  • "Pilot Submission Failure" has been observed since 2019-12-05 05:28 UTC 
  • "Aborted Pilot" has been observed since 2019-11-23 05:35 UTC (for 1 hour)
  • Test site for the opportunistic resources at KIT. No need to report problems.
  • Downtime 2020-01-09 14:38 - 2020-01-10 16:00 UTC. 

Test.ULAKBIM.tr

  • Scheduled downtime from 2020-01-14 07:00 to 2020-01-16 13:00 (UTC)
  • Test site for the SL7 resources at ULAKBIM. No need to report problems.
  • No activities expected currently.

VCYCLE.Napoli.it

  • "Failed Payload Job" has been observed since 2020-01-05 04:34 UTC (for 3 hours)
  • Following JIRA tickets submitted: BIIDCO-2143 , BIIDCO-1613 , BIIDCO-1612
  • "Failed Payload Job" has been observed since 2019-11-21 20:35 UTC (for 12 hours)
  • "Failed Payload Job" has been observed since 2019-11-29 00:35 UTC (for 6 hours)

  • Following JIRA tickets submitted: BIIDCO-1613 , BIIDCO-1612
  • "Failed Payload Job" has been observed since 2019-11-20 13:35 UTC (for 1 hour)

  • Opportunistic site (Empty plot is not a problem)
  • Ban lifted
  • "Sudo CE Error: sudo execution fails with return code 1"

VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

  • Opportunistic site (Empty plot is not a problem)


Links