CESNET-TMP-SE






Production Plans

  • MC12: mostly finished. A few jobs are in "not assigned" status.
  • proc9: done.
  • The Phase III data will also be skimmed as soon as proc9 is ready on the grid.
  • More realistic (simulated) beam background overlay files are in preparation. Once these are ready, we plan to run a new MC campaign. Run-dependent MC and a few data samples will also be processed once the appropriate global tag (GT) is finalized.

Production Status


Date: 2019-10-
No additional news from the production team. A few production jobs have been submitted, and only skim/merge jobs are being processed.

Date: 2019-09-03
Skimming was not mentioned; assume the situation is the same as reported on 8/12.

The locations of the samples at KEKCC and on the grid are detailed on the Phase 3 page: https://confluence.desy.de/display/BI/Phase+3+data. The samples are separated into proc9, which includes everything up to exp 8, run 1835 (including the off-resonance sample), and bucket7, which includes exp 8 runs 1836-3123.

The next reprocessing, proc10, will use release-04 (pending release next month) and will feature significant updates for some calibration code.

Beam backgrounds and validation samples:

New beam background overlay files for release-04 have been prepared. Details are available at https://confluence.desy.de/display/BI/Beam+background+samples.

The DP group is producing validation samples for release-04. Details are given in https://agira.desy.de/browse/BII-5150 and the associated sub-tickets.

MC production:

New run-dependent MC samples, approximately corresponding to proc9, have been produced. Details are available at https://confluence.desy.de/display/BI/Data+Production+Run+Dependent+MC. Note that beam background overlay samples are only available for a subset of runs and only for the pseudorandom trigger (not delayed Bhabhas). In addition, not all detectors are ready for run-dependent MC (a goal for release-04). A total of approximately 10/fb equivalent of generic samples will be produced, along with signal samples upon request.

Most MC12 samples are now finished. Additional signal requests will be accepted, but these jobs are unlikely to saturate the system, at least for a while.

Therefore, only a limited number of sites are expected to run jobs.

Analysis skimming (uDSTs)

The first skim campaign for MC12 is now complete. Details of available analysis skims are given at https://confluence.desy.de/display/BI/MC12+Skim+Production. The second MC12 skim campaign, which will include charm and semileptonic FEI skims, is currently under preparation and will be processed soon. The Phase III data will also be skimmed as soon as proc9 is ready on the grid.



Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • Date, Issue, Tickets...

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)

  • Date, Issue, Tickets...


DDM (bldirac01.sdcc.bnl.gov)

  • 2018-03-01 DDM deletion task seems stuck

Conditions DB

Monitor

  • Issue accessing the DIRAC Web Portal

LFC

  • Date, Issue, Tickets...

File Transfers and Replication Status

  • 2019-10-06 16:30 UTC: No transfer activity during the last 12 hours
  • Efficiency for Destination is less than 20%.
  • Efficiency for Source is less than 20%.
  • No throughput and very low number of successful transfers. https://agira.desy.de/browse/BIIDCO-1968
  • See also Computing OperationStatus#DDM for related issues
  • 2019-10-16 File transfer failures:

FTS

  • Any problems with the FTS service or FTS monitoring are to be recorded here. Site- or SE-specific issues are to be recorded under the corresponding site or SE.
  • Note that the FTS dashboard we use is an "old" instance and is not well maintained. Belle II members in general do not have access to the "new" monitoring. When the dashboard is down, shifters just need to notify the expert and skip the corresponding part of their work. Because access to the new monitoring page is limited, the expert should check it instead.
  • 2019-08-31 File transfer failures over the past 48 hours.
  • 2019-09-03 File transfer failures over the past 24 hours.
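The transfer alarms that shifters record on this page (no transfer activity for 12 hours, source or destination efficiency below 20%) can be expressed as a small check. The following is a minimal Python sketch under stated assumptions, not the actual DDM or FTS monitoring code: the TransferStats record and its fields are hypothetical, and "efficiency" is assumed to mean done / (done + failed).

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds quoted in the shifter notes: no activity for 12 hours,
# source/destination efficiency below 20%.
IDLE_HOURS_THRESHOLD = 12.0
EFFICIENCY_THRESHOLD = 0.20


@dataclass
class TransferStats:
    """Hypothetical per-SE transfer counters for one monitoring window."""

    se: str
    role: str  # "source" or "destination" (assumed labels)
    done: int
    failed: int
    hours_since_last_transfer: float


def efficiency(stats: TransferStats) -> float | None:
    """Assumed definition: done / (done + failed); None if nothing finished."""
    finished = stats.done + stats.failed
    if finished == 0:
        return None
    return stats.done / finished


def transfer_alarms(stats: TransferStats) -> list[str]:
    """Return the alarm messages a shifter would record for this SE."""
    alarms = []
    if stats.hours_since_last_transfer >= IDLE_HOURS_THRESHOLD:
        alarms.append(
            f"{stats.se}: no transfer activity for "
            f"{stats.hours_since_last_transfer:.0f} hours"
        )
    eff = efficiency(stats)
    if eff is not None and eff < EFFICIENCY_THRESHOLD:
        alarms.append(
            f"{stats.se}: {stats.role} efficiency {eff:.0%} "
            f"is below {EFFICIENCY_THRESHOLD:.0%}"
        )
    return alarms


if __name__ == "__main__":
    # Made-up numbers for a hypothetical SE, for illustration only.
    example = TransferStats("EXAMPLE-TMP-SE", "destination", done=3, failed=27,
                            hours_since_last_transfer=1.0)
    print(transfer_alarms(example))
```

An efficiency of exactly 20% is not flagged, matching the "less than 20%" wording; an SE with no finished transfers yields no efficiency value and is caught only by the idle-time check.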

Replication Status

  • Replication status: zero 'done' in most of the plots, with non-zero 'queued' and/or 'waiting'.
  • 2019-01-19: almost zero 'done', with an increasing number of scheduled jobs at more than 5 SEs for more than 5 hours.
  • 2018-07-02: no 'done' transfers, several 'scheduled', and a rapid increase in 'waiting' replications.
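The replication-status criterion above (zero 'done' with non-zero 'queued' and/or 'waiting' in most of the plots) can be sketched as a simple check. This is a hypothetical illustration only: ReplicationCounts stands in for whatever counts the per-SE replication plots show, and "most of the plots" is assumed to mean more than half of them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicationCounts:
    """Hypothetical snapshot of one SE's replication-status plot."""

    se: str
    done: int
    queued: int
    waiting: int


def looks_stuck(counts: ReplicationCounts) -> bool:
    """Zero 'done' while 'queued' and/or 'waiting' is non-zero."""
    return counts.done == 0 and (counts.queued > 0 or counts.waiting > 0)


def stuck_ses(snapshots: list[ReplicationCounts]) -> list[str]:
    """Names of the SEs whose plots show the stuck pattern."""
    return [c.se for c in snapshots if looks_stuck(c)]


def replication_alarm(snapshots: list[ReplicationCounts]) -> bool:
    """Flag when most (assumed: more than half) of the plots look stuck."""
    if not snapshots:
        return False
    n_stuck = sum(1 for c in snapshots if looks_stuck(c))
    return n_stuck > len(snapshots) / 2
```

An SE with nothing done and nothing pending is treated as idle rather than stuck, so it does not count toward the alarm.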

Job Status Plot

  • Date, Issue, Tickets...
  • There have been no jobs since 2019-10-04 00:00.

Job Summary

  • Date, Issue, Tickets...

SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs)

Raw data SEs

Raw data SE: KEK-RAW-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/RAW)

  • 2019-07-24 15:30 UTC: all transfers failed (0/72) between KEK-RAW-SE and BNL-TMP-SE

Raw data SE: BNL-TAPE-SE (srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/tape)

  • date, issue, tickets

Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • Occasional file access failures
    Solved and verified: GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=143701 was submitted
  • SE Health Check by DDM: Download failures have been observed since 2019-10-17 16:05:58
  • GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=142410 was submitted

  • SE Health check by DDM: download has not worked since 2019-05-16 07:11:21 UTC.

  • UNAVAILABLE files
  • SE Health check by DDM: download has not worked since 2019-05-15 21:03:25 UTC.

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)

  • SE Health Check by DDM: Upload failures have been observed since 2019-10-07 07:57:05 UTC

  • Solved and verified on 2019-10-15. GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=143518 was submitted

  • SE Health check by DDM: remove file, remove directory and ls have not worked since 2019-07-10 06:32:47 UTC.

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • SE Health check by DDM: remove file has not worked since 2019-05-13 08:27:21 UTC.
  • SE Health check by DDM: remove file, remove directory, download, upload and ls have not worked since 2019-04-25 23:13:00 UTC.
  • 2019/04/11: File transfer failures from NTUCC-DATA-SE to CNAF-TMP-SE (updated)
  • 2019/01/27: File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE.
  • Continuous timeout failures between NTU-CC-TMP-SE and CNAF-TMP-SE

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • Date, Issue, Tickets...

Primary SE: KEK-DISK-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/disk/belle/TMP)

  • SE Health check by DDM: remove file and ls have not worked since 2019-10-01 22:02:44 UTC.
     GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=143448 was submitted at 2019-10-02 01:22
  • SE Health Check by DDM: Failures on ls, remove file and remove directory have been observed since 2019-10-01 08:48:34
  • SE Health check by DDM: download and upload have not worked since 2019-09-28 10:53:28 UTC.

Primary SE: KEK2-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

  • SE Health check by DDM: ls has not worked since 2019-10-01 22:29:59 UTC.
       Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=143448 was submitted at 2019-10-02 01:22
  • SE Health Check by DDM: Failures on ls, remove file and remove directory have been observed since 2019-10-01 09:26:01

  • SE Health check by DDM: download and upload have not worked since 2019-09-28 10:55:40 UTC.

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • Date, Issue, Tickets...

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • Date, Issue, Tickets...

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

  • Date, Issue, Tickets...

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

  • Date, Issue, Tickets...

Frascati-TMP-SE (atlasse.lnf.infn.it)

  • Date, Issue, Tickets...

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets...

IPHC-TMP-SE (sbgse1.in2p3.fr)

  • Date, Issue, Tickets...

LAL-TMP-SE (grid05.lal.in2p3.fr)

  • Date, Issue, Tickets...

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

  • Transfer rate is zero.

  • Melbourne-DATA-SE is banned for writing.

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • McGill-TMP-SE will be decommissioned in early 2018.

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)


NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

  • NTU-TMP-SE is banned for writing.
  • 2019-08-31 File transfer failures over the past 48 hours.


NTU-CC-TMP-SE (belle2grid3.cc.ntu.edu.tw)

  • 2019-10-06: File transfer failures over the past 24 hours.
  • 2019/08/23: File transfer failures to NTU-CC-DATA-SE
  • FTS transfer failures with NTU-CC-DATA-SE as source and BNL-TMP-SE as destination

    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=142550 was submitted
  • 2019/01/27: File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE.
  • File transfer failures and cancellations to NTUCC-DATA-SE occurred on 2018-12-22
  • Frequent timeouts have been observed between NTU-CC-TMP-SE and CNAF-TMP-SE
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137334 was submitted on 2018-09-22 05:10 UTC
  • NTUCC-TMP-SE is banned for writing.

Pisa-TMP-SE (stormfe1.pi.infn.it)

  • Date, Issue, Tickets...

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues.

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

  • Date, Issue, Tickets...

Torino-TMP-SE (se-srm-00.to.infn.it)

  • Date, Issue, Tickets...

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

UMiss-TMP-SE (umiss005.hep.olemiss.edu)


UVic-TMP-SE (charon01.westgrid.ca)


Sites

Sites Common Issue

  • Pilot jobs have not been submitted to DIRAC SSH sites since 2019-09-29 05:00 UTC

ARC.DESY.de

  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/10/02. 
  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/08/29.(details)

ARC.DESY-test.de

  • A test queue for the new CE.

ARC.KIT.de

  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found at 21:20:00 UTC on 2019/10/02. 
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/09/24.

ARC.LMU.de

  • This is a test site. No need to report any issues.

ARC.LMU2.de

  • Banned, as there are currently no resources behind the CE

ARC.Melbourne.au


ARC.MPPMU.de

  • "Failed Payload Job" has been observed since 2019-09-30 14:27 UTC (for 8 hours)
  • "Failed Payload Job" has been observed since 2019-09-29 15:27 UTC (for 7 hours)
  • Job submission check : Pilot submission failure has been found since 00:26:00 UTC on 2019/04/21.

ARC.SIGNET.si

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/10/02. 
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2019/09/24.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/09/24.

  • Health checker info. : "Not enough disk space on " has been found since 21:20:00 UTC on 2019/09/22. (details)
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2019/08/28.(details)
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/08/01.
  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2019/06/03
  • "Short pilot jobs" has been found since 10:20:00 UTC on 2019/05/27.(details)
  • "Failed pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
  • "Short pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/05/21.(details)
  • Job status check: many Stalled jobs on 2019/05/14 at 7:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/05/14.(details)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC 2019/04/05 and at 14:20:00 UTC on 2019/04/12.
  • Job status check: Application finished with errors (5% of the jobs) at 11:15 UTC on 2018/12/21.

  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03.

  • Health checker info. : "Aborted pilot jobs" has been found since 20:20:00 UTC on 2018/10/20.
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/03.(details)

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Job submission check : Pilot submission failure has been found since 04:21:00 UTC on 2019/07/04.
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2019/06/30.
  • "Failed Payload Job" has been observed since 2019-04-19 11:15 UTC  
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/04/18.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/04/17.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08.
  • Job status check: "application finished with errors" (100% currently) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21. (details)
  • The number of jobs is limited.
  • All upload attempts are failing at every configured SE: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), failover SEs (DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 
      •  "Application finished with errors" (100% currently) on 2019/04/10 00:15 UTC. Problem reported since (at least) 2019/04/07 07:00 UTC

DIRAC.BINP.ru

  • Job status check: "Application Finished With Errors" (39% of the jobs over the last 24h) at 7:00 UTC on 2019/05/15.
  • Job status check: Application finished with errors (27% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Health checker info. : "Failed to install DIRAC on " has been found at 22:20:00 UTC on 2018/09/15

DIRAC.BINP-VM.ru

  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2019/02/21
  • Job submission check : Pilot submission failure has been found since 10:23:00 UTC on 2019/01/14.
  • Job status plots, "Application Finished With Errors" (2018-02-11 but lasting for at least a month)

DIRAC.CINVESTAV.mx

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/04/14.
  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/12/06. 

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • AID: "Aborted Pilot" has been observed since 2019-10-16 22:38 UTC (for 16 hours): JIRA ticket created 
  • Job submission check : Pilot submission failure has been found since 01:21:00 UTC on 2019/10/09
  • Job submission check : Pilot submission failure has been found since 12:21:00 UTC on 2019/10/07.
  • "Aborted pilot jobs" has been found since 12:20:00 UTC on 2019/07/10. (screenshot)
  • Job status check: "Application finished with errors" on 2019/07/10 at 00:00 UTC (screenshot)
  • Pilot submission failure has been found since 08:23:00 UTC on 2019/06/07. 
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/05/16.
  • Job status check: many "Application finished with errors" (overall 66% during past 24 hours) on 2019/05/15 at 7:00 UTC.
  • Job status check: many "Application finished with errors" on 2019/05/14 at 7:00 UTC.
  • Job status plots, 100% "Application Finished With Errors", 10:00:00 UTC on 2019/04/08. Still unchanged as of 2019/04/26. 
  • Health checker info. : "Aborted pilot jobs" has been found at 00:20:00 UTC on 2019/04/08.
  • Job status check: Application finished with errors (95% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.

DIRAC.IITH.in

  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2019/09/24.
  • "Short pilot jobs" has been found since 15:20:00 UTC on 2019/06/03.(details)
  • "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/03.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/05/11.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/04/04
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/03/29.(details)

DIRAC.LMU.de

  • Not in use in MC production
  • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2019/05/25.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 13:20:00 UTC on 2019/04/20.
  • Health checker info. : "Aborted pilot jobs" has been found since 11:20:00 UTC on 2019/04/06, since 05:20:00 UTC on 2019/04/12, and since 20:20:00 UTC on 2019/04/17.
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/04/10 and 15:20:00 UTC on 2019/04/14.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/03/29.(details)

DIRAC.Nagoya.jp

  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/10/09.
  • Date, Issues, tickets...

DIRAC.Nara-WU.jp

  • Job submission check : Pilot submission failure has been found at 06:23:00 UTC on 2019/10/03. 
  • Under commissioning since 2018-11-13

DIRAC.NDU.jp

  • Date, Issues, tickets...
  • "Short pilot jobs" has been found since 21:20:00 UTC on 2019/10/01. 

DIRAC.Niigata.jp

  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2019/10/09.
  • "Short pilot jobs" has been found since 18:20:00 UTC on 2019/10/01. 
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/09/24.
  • Job submission check : Pilot submission failure has been found since 19:26:00 UTC on 2019/05/26. (details)
  • Health checker info. : "Aborted pilot jobs" has been found since 12:20:00 UTC on 2019/05/18.
  • Job submission check : Pilot submission failure has been found since 13:30:00 UTC on 2019/05/14. (details)
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2019/04/21.

DIRAC.Niigata2.jp

DIRAC.Osaka-CU.jp

  • Job submission check : Pilot submission failure has been found since 14:26:00 UTC on 2019/06/03. (details)
  • Job submission check : Pilot submission failure has been found since 06:21:00 UTC on 2019/04/02.
  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04. (details)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Ask the site admin to check the status (2018-03-17 10:00 JST). DB access failures from DIRAC.Osaka-CU.jp to PNNL recurred from 2018-03-16 11:00 UTC.

DIRAC.PNNL.us

  • Site to be decommissioned

DIRAC.PNNL2.us

  • Site to be decommissioned

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/10/09
  • "Short pilot jobs" has been found since 19:20:00 UTC on 2019/10/01. 

DIRAC.SSU.kr

  • Health checker info. : "Short pilot jobs" has been found at 13:20:00 UTC on 2019/10/03.(details)
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/10/03.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/10/02. 
  • Health checker info. : "Short pilot jobs" has been found since 23:20:00 UTC on 2019/08/27
  • "Short pilot jobs" has been found at 01:20:00 UTC on 2019/08/26. 
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/08/21.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/08/21.(details)

DIRAC.TIFR.in

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Job submission check : Pilot submission failure has been found at 14:25:00 UTC on 2019/05/11. (details)
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/05/10.(details)
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/10/22.
  • Job status plots, "Application Finished With Errors" has been found at about 00:00:00 JST on 2018/07/06. (details)
  • Health checker info. : "Short pilot jobs" -- Already reported: 
  •  RunningLimit is set for MCProduction=1
  • Jobs stalled at input data resolution

DIRAC.TMU.jp

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/10/03.
  • Health checker info. : "Failed to install DIRAC on " has been found since 14:20:00 UTC on 2019/04/24.
  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02

DIRAC.Tokyo.jp

  • Decommissioned
  • Date, Issue, Tickets..

DIRAC.UAS.mx

  • Health checker info. : "Belle II software could not be installed on " has been found since 15:20:00 UTC on 2019/04/25
  • Job submission check : Pilot submission failure has been found since 00:21:00 UTC on 2019/04/04. (details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 01:20:00 UTC on 2019/02/20.
  • Job submission check: 100% failed with errors from 22:00 2019/01/08 till 04:00 2019/01/09 (UTC)
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17. 
  • Health checker info. : "Belle II software could not be installed on " has been found since 16:20:00 UTC on 2018/11/14.
  • Job submission check : Pilot submission failure has been found since 01:26:00 UTC on 2018/09/21. (details)
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/09/17 Added to
  • Job submission check : Pilot submission failure has been found since 15:24:00 UTC on 2018/09/16 (emailed comp-dc-operations, create JIRA ticket when able)

DIRAC.UVic.ca

  • Date, Issue, Tickets...

DIRAC.UVic-local.ca

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • User jobs failed on the site:
  • Job status check: "Input Data Resolution" issues (13% overall, 100% in past hours) on 2019/05/16 at 7:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/05/16.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2019/05/13.

DIRAC.Yamagata.jp

  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/06/05.(details)

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/03/13.(details)

DIRAC.Yonsei.kr

  • Date, Issue, Tickets..

DIRAC.LocalTest.jp

  • Date, Issue, Tickets..

LCG.CESNET.cz

  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2019/05/15.(details)
  • Job submission check : Pilot submission failure has been found at 06:26:00 UTC on 2019/05/15. (details)
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2019/05/13.(details)
  • Some intervention is needed to run Merge jobs.

LCG.COSENZA.IT

  • Downtime 2019-10-04 07:00(UTC) - 2019-10-11 18:00(UTC)
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/09/24.

LCG.CNAF.it

  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/07.
  • Health checker info. : "Short pilot jobs" has been found since 00:20:00 UTC on 2019/10/07.
  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2019/10/03.(details)
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/10/03.
  • Health checker info. : "Short pilot jobs" has been found at 13:20:00 UTC on 2019/10/02. 

LCG.CYFRONET.pl

  • Downtime:  2019-07-11 10:00  (UTC) -  2019-12-10 22:00 (UTC)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Job submission check : Pilot submission failure has been found since 14:23:00 UTC on 2019/07/31.
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13.

LCG.DESY.de

  • The site is to be retired. No more jobs will be submitted.

LCG.Frascati.it

  • The site has been banned since 2019-07-05 due to a hardware problem.

  • Job submission check : Pilot submission failure has been found since 14:24:00 UTC on 2019/05/24. 
    • GGUS 141688 ticket submitted.
  • Health checker info. : "BLAH ERROR" has been found since 15:20:00 UTC on 2019/05/21.(details)

LCG.HEPHY.at

  • Health checker info. : "Failed pilot jobs" has been found at 13:20:00 UTC on 2019/10/03.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2019/04/12.
  • Health checker info. : "Failed pilot jobs" has been found at 02:20:00 UTC on 2019/01/30.(details) and at 02:20:00 UTC on 2019/01/31.(details)
  • submission check : Pilot submission failure has been found at 14:22:00 UTC on 2018/12/27.

LCG.IPHC.fr

  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2018/06/18.(details)

LCG.KEK.jp

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2019/10/09
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • SiteDirector "Failed to check the availability" 

LCG.KEK2.jp

  • Health checker info. : "Short pilot jobs" has been found at 16:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/09/29.
  • Health checker info. : "Short pilot jobs" has been found at 01:20:00 UTC on 2019/08/28.
  • Still all jobs failing with InputDataResolution on 2019/07/25.
  • GGUS ticket : "KEK SE: PrepareToGet ETIMEDOUT for a specific file path"(140328) has been submitted at 21:26:29 UTC on 2019/03/21.
  • Health checker info. : "Short pilot jobs" has been found at 11:20:00 UTC on 2019/03/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • All jobs have been in "Input data resolution" status since 12:00 UTC on 2018/12/18.

LCG.KEK-merge.jp

  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/08/26.
  • Most jobs are failing with InputDataResolution.
  • "Belle II software could not be installed on cb268.cc.kek.jp" has been found since 14:20:00 UTC on 2019/04/05
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/04/02
  • Being commissioned.

LCG.KISTI.kr

  • Job slots were disabled for SE maintenance from 2018-10-19 to 2018-10-23
  • Health checker info. : "BLAH ERROR" has been found since 06:20:00 UTC on 2018/10/19.(details)

  • "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details)
  • The BLAH error seems to occur when jobs exceed the allocated number of queues; this is a site-specific feature, not a problem.
  • A large number of Merge jobs in waiting status

LCG.KMI.jp

  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/10/02. 
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/04/08 and at 15:20:00 UTC on 2019/04/12.
  • Job status plots, 100% "Application Finished With Errors", 10:00:00 on 2019/04/08
  • Job submission check : Pilot submission failure has been found since 21:25:00 UTC on 2019/02/01. 

  • Job status check: Application finished with errors (7% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details)
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 21:20:00 UTC on 2018/11/22.
  • Job submission check : Pilot submission failure has been found since 21:24:00 UTC on 2018/10/02. (details)

LCG.LAL.fr

  • Downtime: 2019-09-27 16:00 (UTC) - 2019-09-28 16:00 (UTC)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/05/01.(details)

LCG.Legnaro.it

  • Date, Issue, Tickets...
  • Downtime: 2019-10-15 06:30 (UTC) - 2019-10-15 17:00 (UTC)

LCG.Napoli.it

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Downtime 2019-10-04 07:00(UTC) - 2019-10-11 18:00(UTC) 
  • Job submission check : Pilot submission failure has been found since 12:27:00 UTC on 2019/10/02.
  • t2-recas-ce01.na.infn.it shows pilot submission errors; this CE should be banned until September 2019.
  • Stalled jobs

LCG.NTU.tw

  • Job submission check : Pilot submission failure has been found since 04:21:00 UTC on 2019/10/09.

  • Job submission check : Pilot submission failure has been found since 06:21:00 UTC on 2019/10/07.
    Solved and verified: GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=143574 was submitted

  • Downtime 2019-10-05 16:00(UTC) - 2019-10-07 04:00(UTC) 
  • "Pilot Submission Failure" has been observed since 2019-10-02 05:27 UTC
  • Job submission check : Pilot submission failure has been found since 09:22:00 UTC on 2019/09/14. (details)
  • "BLAH ERROR" has been found since 07:20:00 UTC on 2019/09/13
  • "Short pilot jobs" has been found since 07:20:00 UTC on 2019/09/13.(details)
  • Job submission check : Pilot submission failure has been found since 13:22:00 UTC on 2019/09/01. (details)
  • "Short pilot jobs" has been found at 00:20:00 UTC on 2019/07/08. (screenshot)
  • Job submission check : Pilot submission failure has been found at 14:23:00 UTC on 2019/06/28.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/04/18.

LCG.Pisa.it

  • Job submission check : Pilot submission failure has been found since 13:22:00 UTC on 2019/10/09.
  • "Failed pilot jobs" has been found since 20:20:00 UTC on 2019/10/06.
  • "Failed Pilot" has been observed since 2019-10-01 18:27 UTC
  • "Failed pilot jobs" has been found since 17:20:00 UTC on 2019/10/01.
  • "Failed Pilot" has been observed since 2019-09-30 09:27 UTC (for 13 hours)
  • GGUS ticket : "INFN-PISA: gridce0.pi.infn.it - Job submission failure"(142999) has been submitted at 20:54:58 UTC on 2019/09/04.
  • Job submission check : Pilot submission failure has been found since 10:25:00 UTC on 2019/04/12
  • GGUS ticket : "INFN-PISA: All CEs - LSF directory doesn't exists"(139815) has been submitted at 00:03:38 UTC on 2019/02/21.

  • "Failed to install DIRAC on so1wn8.pi.infn.it,n2wn13.pi.infn.it,n2wn18.pi.infn.it,so1wn6" has been found since 01:20:00 UTC on 2019/01/02.

  • "Short pilot jobs" has been found since 02:20:00 UTC on 2018/09/21.(details)

LCG.Roma3.it

  • Health checker info. : "BLAH ERROR" has been found since 02:20:00 UTC on 2019/08/02.
    • The site admin reports that the queue name has changed to cream-pbs-lcg6. The DIRAC admin has been asked to update the queue in the configuration.
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2019/04/20.

LCG.TAU.il

  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/10/09.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Health checker info. : "Failed pilot jobs" has been found at 23:20:00 UTC on 2019/05/29.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2019/05/24.(details)

LCG.Torino.it

  • Health checker info. : "BLAH ERROR" has been found since 10:20:00 UTC on 2019/10/02.
    GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=143486 has been submitted.
  • Health checker info. : "CRL has expired" has been found since 19:20:00 UTC on 2019/09/27 (lasting more than 4 hours).

  • Health checker info. : "CRL has expired" has been found since 20:20:00 UTC on 2019/09/24.
  • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2019/09/24. 
  • Health checker info. : "CRL has expired" has been found since 06:20:00 UTC on 2019/09/22.
  • Job submission check : Pilot submission failure has been found at 13:26:00 UTC on 2019/05/20. (details)
  • Job submission check : Pilot submission failure has been found at 14:25:00 UTC on 2019/05/11. (details)
  • Job submission check : Pilot submission failure has been found at 14:25:00 UTC on 2019/05/09
  • Job submission check : Pilot submission failure has been found at 06:25:00 UTC on 2019/03/23.

LCG.ULAKBIM.tr

  • The queue 'belle7' is to be disabled; use only 'belle'.
  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2019/08/01.

OSG.BNL.us

  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2019/10/09. 
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/10/02.(details)  
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2019/08/01.
  • Health checker info. : "Belle II software could not be installed on " has been found since 19:20:00 UTC on 2019/02/14.
  • Job submission check: jobs failed with errors or input data resolution issues in the last 24 h (06:00 UTC, 2019/01/09).
  • Production jobs: UNAVAILABLE files
  • Number of concurrent MCProduction jobs restricted
  • MCProduction jobs are mostly stalled.

OSG.CORI.us

  • OSG.CORI.us resource has been removed because CY18 allocation was not approved

OSG.UMiss.us

  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/07/10. (screenshot)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/07/08. (screenshot)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/07/03.
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/06/27
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2019/06/04.(details)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/06/03.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/05/20.(details)
  • Job status check: 100% of the issues were Input Data Resolution failures on 2019/05/14 at 07:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/05/14.(details)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/05/12.(details)
    Updated
  • Job submission check : Pilot submission failure has been found since 12:27:00 UTC on 2019/05/04. (details)
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2019/05/11.(details)
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/05/01.(details) 
    Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/04/30.(details)
    Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2019/04/29.(details)
    Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/04/22.
    Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/04/16.
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/04/11 and  at 17:20:00 UTC on 2019/04/14.
  • Job status check: 34.7% of jobs had the application finish with errors on 2019/04/08.

SSH.KMI.jp

  • Job status check: Application finished with errors (12% of the jobs in last 24 hours) on 2018/12/22 at 11:30 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/08/13.

Test.KIT.de

  • Test site for the opportunistic resources at KIT. No need to report problems.

Test.ULAKBIM.tr

  • Test site for the SL7 resources at ULAKBIM. No need to report problems.
  • No activities expected currently.

VCYCLE.Napoli.it

  • Opportunistic site (Empty plot is not a problem)
  • Ban lifted.
  • "Sudo CE Error: sudo execution fails with return code 1"

VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

  • Opportunistic site (Empty plot is not a problem)


Links