Contents



Production Plans

Data Production Status

  • Raw data processing
    • Bucket 15: completed
    • BGOverlay productions: completed
    • Bucket 9-14 (prompt processing of 2020a/b): completed
  • MC13 production

    • MC13a (Run-independent MC) production: Y(nS) productions ongoing

    • MC13b (Run-dependent MC) production: ongoing
  • Skim
    • SkimP11x1 (Proc11 skims): ongoing
    • SkimB11x1 (Bucket11 skims): ongoing
  • The MC14 campaign will begin soon with release-05 (early-mid November)

Production Status

Resources are fully used, but the number of running jobs sometimes decreases.

Data production summary page: Data Production Status

Data (re)processing: No raw data processing jobs expected now.

  • Bucket 15 is completed

  • BGOverlay completed

MC production:

  • An additional production for 1/ab of nominal phase 3 generic MC13a is ongoing.

Analysis skimming:

  • Skims of the 2020a-b dataset are currently in progress.


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • Date, Issue, Tickets...

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)

  • b2dcdb05-07.cc.kek.jp are under construction and the status can be ignored (07 Nov 2020)
  • 2020/11/07: "DB Production" servers b2dcdb06.cc.kek.jp, b2dcdb05.cc.kek.jp: the blue band rapidly increased by more than 5 GB
  • 2020/11/07: "Web" server b2dcwvm01.cc.kek.jp: the grey part rapidly increased to more than twice the level of the red line.

"Web" servers

  • The load on b2dcwvm01.cc.kek.jp has increased significantly and is staying around the red line.

DDM (bldirac01.sdcc.bnl.gov)

  • DDM is stuck
  •  DDM is not responding to API calls
  • 2018-03-01 DDM deletion task seems stuck

Conditions DB ()

Monitor

  • lfc-ls segmentation fault in SiteCrawler on EL7 sites

AMGA

  • Date, Issue, Tickets...

LFC

  • Date, Issue, Tickets...

CVMFS

  • problem in replication from cvmfs-stratum-zero.cc.kek.jp/cvmfs/belle.kek.jp

File Transfers and Replication Status

  • 2020-04-03 00:15 UTC File Transfer failures: Pisa-DATA-SE  

FTS

Any problems with the FTS service or FTS monitoring are to be recorded here. Site/SE-specific issues are to be recorded under each Site/SE.

Replication Status

  • Date, Issue, Tickets...

Job Status Plot

  • Many sites have a similar red peak:
  • Most of the sites have a similar red peak
  • A small number of jobs "finished with errors" at many sites simultaneously on 2020-11-04
  • All plots have a lot of "finished with errors" red histograms.
  • No job status plots for 38 sites while MC13 production is ongoing 2020-10-17
  • No job status plots for 13 sites while MC13a production is ongoing 2020-05-18
  • No job status plots for 15 sites while MC13a production is ongoing 2020-02-07

Job Summary

  • Date, Issue, Tickets...



SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs)

Raw data SEs: 

Raw data SE: KEK-TMP-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

Raw data SE: KEK-RAW-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/RAW)

  • Many replications from KEK-RAW-SE to BNL-TAPE-SE failed at 11:30 on 2020-05-25

Raw data SE: KEK-TAPE-SE (srm://kek2-se02.cc.kek.jp)

  • 2020-10-13: Slow FTS transfers

Raw data SE: BNL-TAPE-SE (srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/tape



Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • SE Health Check by DDM: Failures on remove file and remove directory have been observed since 2020-12-01 05:11:38 (3 hours)
  • BNL has a network outage scheduled for 2020-12-01 06:00 - 18:00 EST

  • No Replication Trend Plot for BNL-TMP-SE 2020-01-02 09:30 UTC 

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz) 

  • Date, Issue, Tickets...

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE.

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)

Primary SE: KEK-DISK-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/disk/belle/TMP)

  • KEK-DISK-TMP-SE: files unavailable
  • Replication failures have been observed since 2020-09-10 10:00
  • SE Health Check by DDM: Failures on remove file and remove directory have been observed since 2020-08-28 07:01:17 (119 hours)

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • Date, Issue, Tickets...

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • SE Health Check by DDM: Failures on ls and upload have been observed since 2020-11-14 07:07:37 (6 hours)

  • SE Health Check by DDM: Failures on ls and upload have been observed since 2020-09-02 02:55:52 (3 hours)
  • 2020-01-02 09:15 UTC File transfer failure from KMI-TMP-SE to LAL-DATA-SE

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • Failed transfers from Napoli-TMP-SE:
    Downtime related to cooling system failure
  • SE Health Check by DDM: Failures on upload have been observed since 2020-09-30 04:53:07 (4 hours)
  • SE Health Check by DDM: Failures on upload have been observed since 2020-09-02 02:25:58 (4 hours)

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • Data transfer errors 

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

  •  Adelaide SE is banned

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...
  • CYFRONET is banned for write 
  • CYFRONET SE to be replaced

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

Frascati-TMP-SE (atlasse.lnf.infn.it)

  • Date, Issue, Tickets...

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets... 

  •   The new EOS SE put in production
  •   HEPHY-TMP-SE banned

IPHC-TMP-SE (sbgse1.in2p3.fr)

  • Date, Issue, Tickets...

KEK2-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

  • Date, Issue, Tickets...
  • banned for read and write

KISTI-TMP-SE (belle-se-head.sdfarm.kr)

  • Transfer Destination and Replication Problems
  • No new assignment of MC production data blocks to this destination

LAL-TMP-SE (grid05.lal.in2p3.fr)

  • Date, Issue, Tickets...

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • McGill-TMP-SE will be decommissioned in early 2018.

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)

  • Date, Issue, Tickets...

NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

  •  NTU-TMP-SE banned for write 

NTU-CC-DATA-SE

  • 2020/11/08 File transfer failure to NTU-CC-DATA-SE 

NTU-CC-TMP-SE (belle2grid3.cc.ntu.edu.tw)

  • Low efficiency for source
  • 2019/8/23 file transfer failure to NTU-CC-DATA-SE
  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE. 
  • File transfer failure and cancellation to NTUCC-DATA-SE happened 2018-12-22
  • NTUCC-TMP-SE banned for write 

Pisa-TMP-SE (stormfe1.pi.infn.it)

  • Date, Issue, Tickets...

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues. 

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

  • Date, Issue, Tickets...

Torino-TMP-SE (se-srm-00.to.infn.it)

  • Date, Issue, Tickets...

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

  • File transfer failures destination 

UMiss-TMP-SE (umiss005.hep.olemiss.edu)

  • Date, Issue, Tickets...

UVic-TMP-SE (charon01.westgrid.ca)

  • Date, Issue, Tickets...

Sites

Sites Common Issue

  • Date, Issue (site-wide problems)

ARC.DESY.de

  • Downtime: CLOUD.DESY.de and ARC.DESY.de 2020-11-24 07:00 to 2020-11-26 18:00 (UTC)
  • MCProduction restricted

ARC.DESY-test.de

  • A test queue for the new CE.

ARC.KIT.de

  • Downtime from 2020-09-22 08:00 (UTC) to 2020-10-29 22:00 (UTC)
  • Downtime from 2020-09-02 06:00 to 2020-09-23 14:00 (UTC)
  • Downtime 2020-07-03 14:00 - 2020-07-07 14:00 
  •   MCProduction restricted
  • Downtime 2020-06-16 07:00 - 2020-06-16 10:00

  • "Short Pilot" has been observed since 2020-06-02 22:47 UTC.
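
Several downtime entries above overlap (for example 2020-09-02 to 2020-09-23 and 2020-09-22 to 2020-10-29). As a minimal sketch for shifters who want to check this, the snippet below extracts the two timestamps from a "Downtime" line and merges overlapping windows. The function names `parse_window` and `merge_windows` are hypothetical helpers written for this illustration, not part of DIRAC or any monitoring tool, and the sketch assumes all timestamps use the `YYYY-MM-DD HH:MM` form and are in UTC.

```python
import re
from datetime import datetime

# Matches timestamps written as "2020-09-22 08:00" (assumed to be UTC).
STAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def parse_window(line):
    """Return (start, end) for a line with exactly two timestamps, else None."""
    stamps = STAMP_RE.findall(line)
    if len(stamps) != 2:
        return None
    start, end = (datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps)
    return (start, end)


def merge_windows(windows):
    """Merge overlapping or touching (start, end) windows, sorted by start."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


lines = [
    "Downtime from 2020-09-22 08:00 (UTC) to 2020-10-29 22:00 (UTC)",
    "Downtime from 2020-09-02 06:00 to 2020-09-23 14:00 (UTC)",
    "Downtime 2020-07-03 14:00 - 2020-07-07 14:00",
]
windows = [w for w in map(parse_window, lines) if w is not None]
for start, end in merge_windows(windows):
    print(start, "->", end)
```

Lines that do not contain exactly two timestamps (such as alarm entries) are skipped rather than guessed at.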

ARC.KIT-TARDIS.de

  • Downtime from 2020-09-22 16:00 (UTC) to 2020-10-31 20:00 (UTC)
  • Downtime from 2020-09-02 06:00 to 2020-09-23 14:00 (UTC)
  • renamed from Test.KIT.de

ARC.LMU.de

  • This is a test site. No need to report any issues.

ARC.LMU2.de

  • Banned, as there is currently no resource behind the CE

ARC.Melbourne.au

  • "Failed Payload Job" has been observed since 2020-11-24 00:48 UTC (for 22 hours) and since   
  • "Failed Payload Job" has been observed since

  • "Pilot Submission Failure" has been observed since 2020-09-02 08:30 UTC (for 22 hours). Jira ticket submitted:

ARC.MPPMU.de

  • "Pilot Submission Failure" has been observed since 2020-11-09 01:40 UTC (for 5 hours)
  • Downtime 2020-11-04 19:00 (UTC) - 2020-11-06 14:30 (UTC)
  • Downtime 2020-07-28 00:00 - 2020-07-29 00:00

  • Downtime 2020-06-16 11:00 - 2020-06-16 17:00

ARC.SIGNET.si

  • Downtime 2020-11-26 UTC 14:20 to 2020-11-26 20:00
  • "Failed Payload Job" has been observed since 2020-11-23 03:48 UTC (for 11 hours)
  • "Short Pilot" has been observed since 2020-10-14 22:31 UTC (for 8 hours)
  • "Aborted Pilot" has been observed since 2020-08-14 14:35 UTC (for 10 hours)
  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03.

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

CLOUD.DESY.de

  • Downtime: CLOUD.DESY.de and ARC.DESY.de 2020-11-24 07:00 to 2020-11-26 18:00 (UTC)

DIRAC.Beihang.cn

  • Site is banned.
  • "Failed Payload Job" has been observed since 2019-04-19 11:15 UTC  
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/04/18.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21. (details)
  • The number of jobs limited.
  • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs(DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE) 
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • "Failed Payload Job" has been observed since 2020-10-27 08:40 UTC (for 6 hours) 
  • "Short Pilot" has been observed since 2020-09-02 15:30 UTC (for 15 hours)
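
Most alarm entries on this page follow the pattern `"<alarm>" has been observed since <date> <time> UTC (for N hours)`. As a rough sketch, assuming that exact wording, the snippet below extracts the alarm name and start time and derives the end time from the stated duration. `ALARM_RE` and `parse_alarm` are hypothetical names introduced for this illustration; they are not part of DIRAC or the health checker.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern for entries such as:
#   "Short Pilot" has been observed since 2020-09-02 15:30 UTC (for 15 hours)
ALARM_RE = re.compile(
    r'"(?P<kind>[^"]+)" has been observed since '
    r"(?P<start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC "
    r"\(for (?P<hours>\d+) hours?\)"
)


def parse_alarm(line):
    """Return alarm kind, start and end (UTC), or None if the line does not match."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    start = datetime.strptime(match.group("start"), "%Y-%m-%d %H:%M")
    start = start.replace(tzinfo=timezone.utc)
    duration = timedelta(hours=int(match.group("hours")))
    return {"kind": match.group("kind"), "start": start, "end": start + duration}


entry = parse_alarm(
    '"Short Pilot" has been observed since 2020-09-02 15:30 UTC (for 15 hours)'
)
print(entry["kind"], entry["start"].isoformat(), entry["end"].isoformat())
```

Entries without a duration, or in the older "Health checker info." wording, do not match and return `None`; only the first alarm on a line is parsed.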

DIRAC.BINP-VM.ru

  • "Failed Payload Job" has been observed since 2020-10-27 14:40 UTC (for 1 hour).

  • Job status plots, "Application Finished with errors" 2020-04-21 10:00 to 2020-04-22 10:00 UTC

DIRAC.CINVESTAV.mx

  • Date, Issue, Tickets...

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • "Aborted Pilot" has been observed since 2020-10-24 01:38 UTC (for 53 hours). Also since 2020-10-08 12:31 UTC (for 104 hours) and since 2020-03-19 13:24 UTC (for 1 hour)
  • "Pilot Submission Failure" has been observed since 2020-09-01 17:30 UTC
  • "Failed Payload Job" has been observed since 2020-10-06 08:31 UTC (for 46 hours)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/05/16.
  • Job status plots, 100% "Application Finished With Errors", 10:00:00 UTC on 2019/04/08. Still unchanged as of 2019/04/26. 

DIRAC.IITH.in

  • "Pilot Submission Failure" has been observed since 2020-02-17 15:59 UTC (for 7 hours)

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/03/29. (details)

DIRAC.LMU.de

  • Not in use in MC production
  • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Aborted pilot jobs" has been found since 13:20:00 UTC on 2019/04/20.

DIRAC.Nagoya.jp

  • "Short Pilot" has been observed since 2020-11-24 01:48 UTC (for 21 hours). Also since 2020-10-28 13:40 UTC (for 1 hour).
  • "Failed Payload Job" has been observed since 2020-10-05 11:31 UTC (for 3 hours)

DIRAC.Nara-WU.jp

  • "Pilot Submission Failure" has been observed since 2020-06-12 09:47 UTC (for 5 hours)
  • Under commissioning from 2018-11-13

DIRAC.NDU.jp

  • Date, Issue, Tickets...

DIRAC.Niigata.jp

  • "Failed Payload Job" has been observed since 2020-10-27 06:40 UTC (for 8 hours).

DIRAC.Niigata2.jp

  • "Failed Payload Job" has been observed since 2020-10-27 10:40 UTC (for 4 hours).
  • "Pilot Submission Failure" has been observed since 2020-10-04 11:31 UTC (for 19 hours). Also since 2020-10-04 11:31 UTC (for 3 hours)
  • "Application Finished with Errors" with 38.7% from 2020-04-21 10:00 UTC to 2020-04-22 02:00 UTC 

DIRAC.Osaka-CU.jp

  • Site is banned
  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04. (details)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Ask site admin to check the status 2018-03-17 10:00 JST. (DB access failure again from DIRAC.Osaka-CU.jp to PNNL from 2018-03-16 11:00 UTC)

DIRAC.PAU.in

  • Date, Issue, Tickets...

DIRAC.PNNL.us

  • Site to be decommissioned

DIRAC.PNNL2.us

  • Site to be decommissioned

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • "Short Pilot" has been observed since 2020-11-09 12:40 UTC (for 42 hours). Also since 2020-11-21 08:48 UTC (for 22 hours).
  • "Not Enough Disk Space" for "fpb22,fpb23" has been observed since 2020-08-28 23:19:23 UTC (for 19 hours)
  • High job failure rate since 2020-08-24 (after migration)

DIRAC.LocalTest.jp

  • Downtime 2020-11-30 00:00 - 2020-12-07 00:00 (UTC)
  • Downtime from 2020-08-28 06:00 to 2020-09-01 04:00 (UTC)
  • Downtime 2020-08-17 06:00 to 2020-08-31 06:00 (UTC)

DIRAC.Shandong.cn

  • "Aborted Pilot" has been observed since 2020-10-05 13:31 UTC (for 1 hour)
  • "Aborted Pilot" has been observed since 2020-10-04 13:31 UTC (for 1 hour)

DIRAC.SSU.kr

  • "Short Pilot" has been observed since 2020-10-17 02:31 UTC 
  • "Pilot Submission Failure" has been observed since 2020-08-14 08:35 UTC (for 62 hours)
  • DIRAC.SSU.kr "Failed Payload Job" has been observed since 2020-06-08 02:47 UTC. 

DIRAC.TIFR.in

  • "Pilot Submission Failure" has been observed since 2020-11-30 18:48 UTC (for 12 hours)

  • "Pilot Submission Failure" has been observed since 2020-11-14 03:41 UTC (for 10 hours)
  • "Pilot Submission Failure" has been observed since 2020-10-19 21:31 UTC (for 217 hours)
  • There is a hardware failure at this site - hardware replacement has been delayed by COVID. 
  • TIFR site is down due to hardware failure 
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • 2018/07/06. (details)
  • Health checker info. : "Short pilot jobs" -- Already reported: 
  •  RunningLimit is set for MCProduction=1

DIRAC.TMU.jp

  • "Failed Payload Job" has been observed since 2020-08-28 01:18 UTC (for 21 hours)
  • High job failure rate since 2020-08-24 (after migration)
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02

DIRAC.Tokyo.jp

  • Decommissioned
  • Date, Issue, Tickets...

DIRAC.UAS.mx

  • Job submission check : Pilot submission failure has been found since 00:21:00 UTC on 2019/04/04.
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17. 

DIRAC.UVic.ca

  • DIRAC.UVic.ca, DIRAC.UVic-local.ca, CLOUD.DESY.de: downtime 2020-10-24 14:00 (UTC) - 2020-10-24 17:00 (UTC)

  • DIRAC.UVic.ca, DIRAC.UVic-local.ca, CLOUD.DESY.de: downtime 2020-09-17 22:00 - 2020-09-22 00:00 (UTC)

  • User jobs enabled

DIRAC.UVic-local.ca

  • User jobs failed on the site:

DIRAC.Yamagata.jp

  • "Short Pilot" has been observed since 2020-10-27 06:40 UTC (for 8 hours), since 2020-08-15 17:35 UTC (for 6 hours) and since 2020-06-20 23:47 UTC (for 6 hours)

  • "Failed Payload Job" has been observed since 2020-10-27 09:40 UTC (for 5 hours), since 2020-10-25 14:38 UTC (for 1 hour) and since 2020-06-28 12:07 UTC.

  • "Pilot Submission Failure" has been observed since 2020-09-21 08:24 UTC (for 22 hours) and since 2020-09-09 03:24 UTC (for 4 hours)
  • "Short Pilot" has been observed since 2020-08-16 14:35 UTC (for 8 hours) and since 22:20:00 UTC on 2019/03/13

DIRAC.Yonsei.kr

  • "Failed Payload Job" has been observed since 2020-10-27 08:40 UTC (for 2 hours)
  • "Short Pilot" has been observed since 2020-10-15 03:31 UTC (for 3 hours), since 2020-08-10 21:35 UTC (for 1 hour) and since 2020-07-26 14:15 UTC (for 9 hours)

LCG.CESNET.cz

  • Downtime 2020-12-01 07:00 - 2020-12-01 14:00 (UTC)
  • Downtime 2020-11-30 07:00 - 2020-12-31 23:00 (UTC)
  • Downtime 2020-11-05 00:00 (UTC) - 2020-12-03 00:00 (UTC)
  • "Failed Payload Job" has been observed since 2020-10-27 08:40 UTC (for 6 hours)

  • CESNET-DATA-SE: Transfer destination problems from several sources:
  • "Failed Pilot" has been observed since 2020-09-22 22:24 UTC (for 8 hours) (details).
  • "Failed Pilot" has been observed since 2020-09-09 03:24 UTC (for 3 hours)

  • New CE to be configured:
  •   Need some intervention to run Merge jobs

LCG.COSENZA.IT

  • "Aborted Pilot" has been observed since 2020-11-30 19:48 UTC (for 3 hours)
  • "Failed Payload Job" has been observed since 2020-10-27 13:40 UTC (for 1 hour)
  • "Failed Payload Job" has been observed since 2020-09-24 02:25 UTC (for 4 hours)
  • "Short Pilot" has been observed since 2020-09-24 05:25 UTC (for 1 hour)

LCG.CNAF.it

  • "Aborted Pilot" has been observed since 2020-11-25 04:48 UTC (for 10 hours).
  •   CREAM CEs to be decommissioned
  • Downtime 2020-11-03 00:00 (UTC) - 2020-11-05 12:00 (UTC)

  • "Short Pilot" has been observed since 2020-10-28 13:40 UTC (for 1 hour). Also since 2020-09-23 13:25 UTC (for 1 hour) and since 2020-05-20 02:43 UTC (for 5 hours)
  • Downtime 2020-10-27 06:00 - 2020-10-27 12:00 (UTC)
  •   MCProduction jobs restricted

LCG.CYFRONET.pl

  • Downtime 2020-10-06 10:00 - 2020-10-31 00:00 (UTC)
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13.

LCG.DESY.de

  • The site is to be retired. No more jobs are to be submitted.

LCG.Frascati.it

  • "Pilot Submission Failure" has been observed since 2020-09-09 08:24 UTC (for 22 hours)

  • "Aborted Pilot" has been observed since 2020-08-15 04:35 UTC (for 1 hour)
  • "BLAH Error" has been observed since 2020-08-10 22:34:18 UTC (for 5 hours). Also since 2020-07-25 23:07:32 UTC (for 43 hours), since 2020-07-23 23:07:30 UTC (for 18 hours) and since 2020-07-18 16:07:24 UTC (for 24 hours)
  • Site is currently banned due to a hardware problem since 2019-07-05

LCG.HEPHY.at

  • "Failed Payload Job" has been observed since 2020-11-24 03:48 UTC (for 11 hours)
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=149682 has been submitted
  • "Pilot Submission Failure" has been observed since 2020-09-02 04:30 UTC (for 26 hours)
  • HEPHY - Migration to the new site

  • SIGNET-TMP-SE is used as its OutputSE:

LCG.IHEP.cn

  • Downtime 2020-12-01 00:00 - 2020-12-02 15:00 (UTC)
  • "Failed Payload Job" has been observed since 2020-11-14 07:41 UTC (for 15 hours) 
  • "Short Pilot" has been observed since 2020-11-01 11:40 UTC (for 5 hours). Also since 2020-11-11 16:40 UTC (for 10 hours), since 2020-11-14 08:41 UTC (for 5 hours) and since 2020-11-23 08:48 UTC (for 6 hours).

LCG.IN2P3CC.fr

  • Being commissioned (2020-11-13 09:00 UTC)

LCG.IPHC.fr

  • "Failed Payload Job" has been observed since 2020-09-02 21:30 UTC (for 2 hours)
  • Date, Issue, Tickets...

LCG.KEK.jp

  • Downtime 2020-11-30 00:00 - 2020-12-07 00:00 (UTC)
  • banned until the runtime directory is fixed
    • There will be no activities until the ban is lifted
  • RawJobStatus: LCG.KEK.jp with no activity for some hours and OSG.BNL.us is empty.
  • Raw data processing: Jobs failed due to "No space left on device"
  • Raw data processing: Application finished with errors
  • 'heavy' queue closed in preparation for the KEKCC renewal
  • MCProduction restricted
  • No plot for raw data processing. Update: this is actually expected, see
  • SiteDirector "Failed to check the availability" 

LCG.KEK2.jp

  • Downtime 2020-11-30 00:00 - 2020-12-07 00:00 (UTC)
  • Many pilot submissions failed
  • Many jobs failed
  • Still all jobs failing with InputDataResolution on 2019/07/25.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • All jobs are in "Input data resolution" status since 12:00 UTC on 2018/12/18

LCG.KEK-merge.jp

  • Downtime 2020-11-30 00:00 - 2020-12-07 00:00 (UTC) 
  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/08/26.

LCG.KISTI.kr 

  • "Failed Payload Job" has been observed since 2020-10-27 13:40 UTC (for 1 hour)

  • "Pilot Submission Failure" has been observed since 2020-10-15 03:31 UTC (for 13 hours)

  • "Failed Payload Job" has been observed since 2020-09-18 02:24 UTC

  • "Aborted Pilot job" has been observed since 2020-09-09 22:24 UTC (for 8 hours)

  • "Pilot Submission Failure" has been observed since 2020-09-02 03:30 UTC (for 21 hours)
  • KISTI-GSDC system in downtime
  • BLAH error seems to happen if jobs exceed the allocated number of queues; not a problem (site-specific feature)


LCG.KIT.de   

  • "Short Pilot" has been observed since 2020-11-02 08:40 UTC (for 6 hours). Also since 2020-11-10 21:40 UTC (for 9 hours) and since 2020-11-21 15:48 UTC (for 15 hours)
  • Downtime from 2020-10-07 16:00 (UTC) to 2020-10-31 20:00 (UTC)
  • Downtime from 2020-09-22 08:00 (UTC) to 2020-10-29 22:00 (UTC)  

  • Downtime from 2020-09-02 06:00 to 2020-09-23 14:00 (UTC)

LCG.KIT-TARDIS.de

  • "Aborted Pilot" has been observed since 2020-11-25 05:48 UTC (for 9 hours).
  • "Short Pilot" has been observed since 2020-11-21 15:48 UTC (for 16 hours)

LCG.KMI.jp

  • "Aborted Pilot" has been observed since 2020-11-24 15:48 UTC (for 7 hours)
  • "Pilot Submission Failure" has been observed since 2020-09-02 04:30 UTC (for 26 hours) and since 2020-11-23 02:48 UTC (for 12 hours).
  • "Short Pilot" has been observed since 2020-09-23 01:24 UTC (for 5 hours)

LCG.LAL.fr

  • "Aborted Pilot" has been observed since 2020-11-28 21:48 UTC (for 57 hours)
  • "Aborted Pilot" has been observed since 2020-11-25 15:48 UTC (for 7 hours)
  • Pilot submission failure
  • CREAM-CE to be decommissioned 

LCG.Legnaro.it

LCG.Napoli.it

  • 2020/11/23
  • "Pilot Submission Failure" has been observed since 2020-11-14 05:41 UTC (for 8 hours), since 2020-10-13 10:31 UTC (for 20 hours) and since 2020-09-03 21:30 UTC (for 1 hour)
  • "Aborted Pilot" has been observed since 2020-11-09 02:40 UTC (for 2 hours)
  • Downtime 2020-11-05 08:00 (UTC) - 2020-11-05 17:00 (UTC)

  • "Failed Payload Job" has been observed since 2020-10-27 14:40 UTC (for 1 hour).

  • Downtime 2020-09-30 08:00 - 2020-10-02 16:00 (UTC)
  • Stalled jobs

LCG.NTU.tw

  • Downtime 2020-12-01 15:00 to 2020-12-07 15:00 (UTC)
  • "Failed Pilot" has been observed since 2020-11-07 21:40 UTC (for 81 hours), since 2020-10-27 04:40 UTC (for 10 hours) and since 2020-10-25 11:38 UTC (for 19 hours).
  • "CRL has expired" for "node39-0,node37-0,node38-0" has been observed since 2020-07-20 07:07:26 UTC (for 16 hours), since 2020-09-23 02:24 UTC (for 4 hours) and since 2020-06-17 14:46:46 UTC (for 4 hours)
  • "Belle II software could not be installed" for "belle2grid3.cc.ntu.edu.tw" has been observed since 2020-06-28 06:06:54 UTC (for 1 hour)
    Solved and verified: GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=147803 was submitted at 2020-07-12 06:50

LCG.Pisa.it

LCG.Roma3.it

  • "Short Pilot" has been observed since 2020-11-11 21:40 UTC (for 5 hours) and since 2020-11-01 23:40 UTC (for 15 hours)
  • "Pilot Submission Failure" has been observed since 2020-08-29 15:18 UTC (for 111 hours)
  •  Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2019/04/20
  • Pilot Submission Failure has been observed since 2020-08-24 10:18 UTC (for 20 hours).

LCG.TAU.il

  • "Aborted Pilot" has been observed since 2020-11-15 05:41 UTC (for 8 hours) 
  • "Failed Payload Job" has been observed since 2020-10-27 13:40 UTC (for 1 hour) and since 2020-09-24 05:25 UTC (for 1 hour)
  • Downtime 2020-10-24 12:00 (UTC) - 2020-10-25 12:30 (UTC)  https://agira.desy.de/browse/BIIDCO-2789

  • "Pilot Submission Failure" has been observed since 2020-09-09 01:24 UTC (for 5 hours)

LCG.Torino.it

  • "Pilot Submission Failure" has been observed since 2020-11-24 07:48 UTC (for 15 hours)
  • "Pilot Submission Failure" has been observed since 2020-10-14 16:31 UTC (for 14 hours) and since 2019-12-28 18:34 UTC
  • Downtime: 2020-09-25 10:00 (UTC) - 2020-10-09 10:00 (UTC)
  • "Pilot Submission Failure" has been observed since 2020-09-09 08:24 UTC (for 22 hours) (details).

  • "BLAH Error" has been observed since 2020-08-31 07:30:18 UTC (for 10 hours) and since 2020-02-09 05:53:05 UTC (for 9 hours)
  • "Failed Payload Job" has been observed since 2020-07-17 22:14 UTC (for 18 hours)
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=147926 has been submitted.

LCG.ULAKBIM.tr

  • The queue 'belle7' is to be disabled. Use only 'belle'.

OSG.BNL.us

  • RawJobStatus: OSG.BNL.us is empty and LCG.KEK.jp with no activity for some hours.

  • Raw data processing: input data resolution and Application finished with errors

  • Application finished with errors: 100%
  • Job submission check: Jobs failed with errors or input data resolution in the last 24 h (6:00 UTC, 2019/01/09)
  • Number of concurrent MCProduction jobs restricted
  •  MCProduction jobs are mostly stalled

OSG.CORI.us

  • OSG.CORI.us resource has been removed because CY18 allocation was not approved

OSG.UMiss.us

  • "Pilot Submission Failure" has been observed since 2020-10-26 07:40 UTC (for 7 hours). Also since 2020-11-16 23:47 UTC (for 129 hours).
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02.
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/05/12.(details)
    Updated

SSH.KMI.jp

  • Date, Issue, Tickets...

Test.KIT.de

  • Downtime from 2020-09-22 08:00 (UTC) to 2020-10-29 22:00 (UTC)
  • Downtime from 2020-09-02 06:00 to 2020-09-23 14:00 (UTC)

Test.ULAKBIM.tr

  • Test site for the SL7 resources at ULAKBIM. No need to report problems.
  • No activities expected currently.

VCYCLE.LAL.fr

  • Downtime 2020-11-19 12:00 (UTC) - 2020-12-03 12:00 (UTC)

  • Downtime 2020-10-13 07:00 (UTC) - 2020-10-30 16:00 (UTC)
  • Downtime from 2020-08-28 14:00 (UTC) to  2020-08-31 14:00 (UTC) 
  • Under commissioning ()

VCYCLE.Napoli.it

  • Opportunistic site (Empty plot is not a problem)
  •  Ban lifted
  • "Sudo CE Error: sudo execution fails with return code 1"

VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

  • Opportunistic site (Empty plot is not a problem)

Links