
Production Plans

  • MC9
    • MC9 started July 5, 2017
      • Phase III signal samples for prerelease-00-09-00b validation
      • Phase III Y(3S) generic (300 fb-1)
      • Phase III Y(4S) generic (4 x 1 ab-1)
      • Phase III Y(5S) generic (1 ab-1)
      • Phase III Y(6S) generic (100 fb-1)
      • Phase III Y(4S) signal samples
      • Phase III Y(4S) low multiplicity samples
      • Phase III Y(5S) signal samples
      • Phase III Y(6S) signal samples
      • Phase II Y(3S) signal samples
      • Phase II Y(4S) generic (50 fb-1)
      • Phase II Y(4S) signal samples
      • Phase II Y(4S) low multiplicity samples

Production Status

MC9

Official production started at ~21:00 JST on July 5, 2017, beginning with BGx0 generic samples (0.2 ab-1).

Submitted second batch of BGx0 generic jobs (July 7, ~04:00 JST)

Third and fourth batches of BGx0 generic jobs (July 10)

Submitted a few BGx0 signal samples (July 12 ~04:30 JST)

Submitted the phase 2 generic samples with BGx0 (July 14 ~04:00 JST)

Submitted the rest of the BGx0 signal samples (July 16 ~00:00 JST)

New requests for BGx0 signal samples submitted (July 19 ~01:30 JST)

MC9 restarted with BGx1 phase 2 samples - 50 fb-1 generic and signal samples (July 30 ~10:00 JST)

Submitted first batch of phase 3 samples with background - mixed and charged BBbar - about 140k jobs (August 12 ~08:00 JST)

Added uubar: ~180k jobs (August 13 ~10:30 JST)

Added ddbar: ~53k jobs (August 29 ~22:00 JST)

Added ssbar: ~51k jobs (Sept. 2 ~11:00 JST)

Submitted phase 3 low-multiplicity samples: ~43.5k jobs (Sept. 2 ~13:00 JST) → includes a generator-level skim, so the number of jobs is inflated relative to the run time

Added ccbar and taupair: ~317k jobs (Sept 3 ~09:30 JST)

Submitted new signal MC samples: ~57.6k jobs (Sept 11 ~23:00 JST)

Submitted new phase 2 signal MC samples: ~21.2k jobs (Sept 12 ~05:00 JST) → short jobs < 3 hrs each

Submitted new phase 3 signal MC samples: ~600k jobs (Sept 24 ~22:30)

Submitted new phase 3 signal MC samples (almost all submitted now) (Sept 28 ~02:00 JST)

Submitted phase 3 Y(5S) bsbs and non-bsbs samples: ~54.4k jobs (Oct 6 ~04:30 JST)

Submitted phase 3 Y(5S) uubar samples: ~242k jobs (Oct 9 ~09:30 JST)

Submitted phase 3 Y(5S) ddbar samples: ~60k jobs (Oct 16 ~10:30 JST)

Submitted phase 3 Y(5S) ssbar and ccbar samples: ~300k jobs (Oct 18 ~02:00 JST)

Submitted a few last signal samples and the phase 3 Y(5S) taupair samples: ~200k jobs (Oct 24 ~03:00 JST)
     → The taupair samples should run as shorter jobs (~5-6 hours at KEKCC)

Submitted Y(6S) continuum samples: ~30k jobs (Oct 30 ~23:00 JST)

Submitted Y(3S) generic samples: ~260k 8h jobs (Nov 3 ~04:30 JST)

Submitted Y(3S) continuum samples (uubar): ~170k 5h jobs (Nov 5 ~08:00 JST)

Submitted Y(3S) continuum samples (ddbar, ssbar, ccbar): ~70k 5h jobs + ~80k 8h jobs (Nov 13 ~23:00 JST)

Submitted Y(3S) taupair samples: ~70k 5h jobs (Nov 17 ~01:00 JST)

Submitted remaining Y(5S) generic: ~4.5k jobs (Nov 20 ~00:00 JST)

Submitted next batch of Y(4S) generic (mixed, charged): ~100k 8h jobs, ~150k 5h jobs (Nov 20 ~03:00 JST)

Submitted Y(4S) uubar continuum: ~250k 8h jobs (Nov 25 ~22:00 JST)

Submitted Y(4S) ddbar continuum: ~100k 5h jobs (Nov 28 ~21:00 JST)

Submitted Y(4S) ssbar continuum: ~100k 5h jobs (Dec 6 ~21:00 JST)

Submitted Y(4S) ccbar continuum: ~200k 8h jobs (Dec 9 ~05:00 JST)

Submitted Y(4S) taupair: ~180k 5h jobs (Jan 2 ~08:30 JST)

Submitted Y(4S) mixed sample (batch 3): ~100k 8h jobs (Jan 8 ~23:00 JST)

Submitted Y(4S) bbbar samples for data challenge (phase 3, BGx1): ~220k 9hr jobs (Jan 18 ~02:00 JST)


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • BIIDCD-570: DIRAC system update around 2017-12-27 04:00 UTC (13:00 JST).
    BIIDCO-652: Short pilot job failures occurred at many sites from 2017-12-27 04:00; KEK-SandboxSE access failures also occurred.
  • DIRAC system will be unavailable during the KEKCC maintenance shutdown from 2018-01-10 12:00 to 2018-01-11 12:00 JST
    https://wiki.kek.jp/display/kekcc/KEKCC+Status+Dashboard
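
Entries on this page mix JST and UTC timestamps. As a minimal sketch for cross-checking them (not part of any shift tool; the helper names `utc_to_jst` and `jst_to_utc` are hypothetical), the conversion can use a fixed +09:00 offset, since JST has no daylight saving time:

```python
from datetime import datetime, timedelta, timezone

# JST is a fixed UTC+09:00 offset (no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")
FMT = "%Y-%m-%d %H:%M"

def utc_to_jst(stamp: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' UTC timestamp to JST."""
    utc_time = datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(JST).strftime(FMT) + " JST"

def jst_to_utc(stamp: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' JST timestamp to UTC."""
    jst_time = datetime.strptime(stamp, FMT).replace(tzinfo=JST)
    return jst_time.astimezone(timezone.utc).strftime(FMT) + " UTC"

if __name__ == "__main__":
    print(utc_to_jst("2017-12-27 04:00"))  # the DIRAC update time quoted above
    print(jst_to_utc("2018-01-10 12:00"))  # start of the KEKCC shutdown
```

Both helpers expect the `YYYY-MM-DD HH:MM` form; entries written as `YYYY/MM/DD` would need their separators adjusted first.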

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)

DDM (dirac-ddm-prod.hep.pnnl.gov)

  • Date, Issue, Tickets...

Conditions DB (belle2db.hep.pnnl.gov, belle2db-files.hep.pnnl.gov)

  • DB access failure at several sites from 2017-12-29 around 14:00 UTC
    BIIDCO-659

Monitor

LFC

File Transfers and Replication Status

See also DDM for related issues

FTS

Any problems in the FTS service or FTS monitoring are to be recorded here. Site/SE-specific issues are to be recorded under each Site/SE.

Replication Status

  • Date, Issue, Tickets...

Job Status Plot

  • Most of the job status plots show a red region starting at the same point. BIIDCO-679

Job Summary

  • The number of running jobs is decreasing rapidly. BIIDCO-686

SEs

SE Common Issues

  • All the columns/rows are red and there is no throughput. BIIDCD-565

Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • Date, Issue, Tickets...

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)      

  • Low efficiency: submitted to BIIDCO-600
  • Date, Issue, Tickets...


Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • Showing zero done and non zero others in replication status plot.
  • Downtime: Massive flood of the datacenter, Start time: 2017-11-09 11:00, End time: 2018-01-18 09:00 (UTC): BIIDCO-495. No clear date for the end.
    → SE Health check by DDM : No need to report this issue during the above downtime.
  • 2017/03/13: Not enough free space BIIDCO-137
  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-11-09 06:13:53 UTC.

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • No plot in replication trend
  • Not enough free space BIIDCO-107

Primary SE: KEK2-TMP-SE (kek2-se01.cc.kek.jp)

  • File transfer failure to KEK2-TMP-SE (kek2-se03.cc.kek.jp) starting from 2018-01-05 around 07:00 UTC
    BIIDCO-669
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=132722 was submitted at 2018-01-05 08:52 UTC
  • Transfers to KEK2-TMP-SE restricted. BIIDCO-689
  • Still banned for removal due to the issue in the back-end HSM
    BIIDCO-41

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

  • SE Health check by DDM : remove file, download, upload do not work since 2017-11-24 03:11:10 UTC. BIIDCO-560
    • Recurring issue: SE Health check by DDM : remove file, download, upload do not work since 2017-12-05 12:03:51 UTC. (Noted in JIRA ticket above)
    • Recurring issue: SE Health check by DDM : checksum, remove file, remove directory do not work since 2017-12-07 00:21:27 UTC. (Noted in JIRA ticket above)
    • SE Health check by DDM : remove file, remove directory, download, upload do not work since 2017-12-09 02:25:13 UTC. (Noted in JIRA ticket above)

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • KIT SE giving occasional timeouts BIIDCO-428

  • There should be no more transfers to/from gridka-dcache.fzk.de
    • KIT SE: Hostname to change from gridka-dcache.fzk.de BIIDCO-191

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • Not enough free space BIIDCO-136


Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • No plot in replication trend
  • Not enough free space BIIDCO-146
  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-11-30 04:51:01 UTC.

Primary SE: PNNL-TMP-SE (se.hep.pnnl.gov) 

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • Date, Issue, Tickets...

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

  • Date, Issue, Tickets...


CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...


Frascati-TMP-SE (atlasse.lnf.infn.it)

  • Date, Issue, Tickets...

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets...

IPHC-TMP-SE (sbgse1.in2p3.fr)

  • Downtime: Decommissioning of WMS and L&B, Start time: 2017-12-13 00:00 UTC, End Time: 2017-12-31 00:00 UTC, BIIDCO-609

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

  • Date, Issue, Tickets...

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • Transfer failures: JIRA ticket BIIDCO-663 submitted at 07:45 UTC on 2018/01/04
  • BIIDCO-516: McGill-TMP-SE will be decommissioned in early 2018.

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)

  • Date, Issue, Tickets...


NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

  • Date, Issue, Tickets...

Pisa-TMP-SE (stormfe1.pi.infn.it)

Torino-TMP-SE (se-srm-00.to.infn.it)

  • Date, Issue, Tickets...

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

  • Date, Issue, Tickets...

UMiss-TMP-SE (umiss005.hep.olemiss.edu)

  • Date, Issue, Tickets...

UVic-TMP-SE (charon01.westgrid.ca)

  • Low efficiency: mentioned in BIIDCO-598 on 2017/12/10
  • Low efficiency: JIRA ticket BIIDCO-598 was submitted at 07:56 UTC on 2017/12/09

Sites

Sites Common Issues

  • The conditions database problem appeared again and many jobs failed from around 2017/12/13 16:00 to 2017/12/15 00:00 (UTC). BIIDCO-632
  • Conditions database appears to be down so jobs may fail until it's back up 2017-08-16 10:14:56 +0200
    BIIDCO-257
  • "Belle II software could not be installed on " has been found since 14:20:00 UTC on 2017/12/20 at 8 sites. BIIDCO-643
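
Most per-site entries below follow the pattern `"<issue>" has been found since|at HH:MM:SS UTC on YYYY/MM/DD`. As a minimal sketch for tallying them, assuming that pattern holds (the function name `parse_health_entry` and the returned field names are hypothetical and not part of any DIRAC tool; reading "since" as a still-ongoing issue is also an assumption):

```python
import re
from datetime import datetime

# Matches e.g.: "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27
ENTRY = re.compile(
    r'"(?P<issue>[^"]+)" has been found (?P<mode>since|at) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) UTC on (?P<date>\d{4}/\d{2}/\d{2})'
)

def parse_health_entry(line):
    """Return issue, ongoing flag and UTC time for one entry, or None if no match."""
    match = ENTRY.search(line)
    if match is None:
        return None
    when = datetime.strptime(
        match["date"] + " " + match["time"], "%Y/%m/%d %H:%M:%S"
    )
    return {
        "issue": match["issue"].strip(),      # drop the trailing blank left by an empty host
        "ongoing": match["mode"] == "since",  # assumption: "since" = ongoing, "at" = one-off
        "utc": when,
    }
```

Lines that do not fit the pattern, such as the `Date, Issue, Tickets...` placeholders or entries with a truncated date, return `None` and need to be read by hand.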

ARC.DESY.de

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2017/12/21.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/12/20.
  • Health checker info. : "Belle II software could not be installed on " has been found since 14:20:00 UTC on 2017/12/20.
  • Health checker info. : "Short pilot jobs" has been found at 02:20:00 UTC on 2017/12/19.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 02:20:00 UTC on 2017/12/02.
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2017/11/01.(details)
  • Job submission check : Pilot submission failure has been found since 21:25:00 UTC on 2017/10/26. (details)
  • Job submission check : Pilot submission failure has been found at 19:27:00 UTC on 2017/10/26. (details)
  • Pilot submission failure has been found at 13:28:00 UTC on 2017/10/26. Found at grid-arcce1.desy.de.
  • ARC.DESY.de: "Short pilot jobs" BIIDCO-207

ARC.KIT.de

  • Health checker info. : "Short pilot jobs" has been found at 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 14:20:00 UTC on 2017/12/20.
  • Date, Issue, Tickets...

ARC.LMU.de

  • This is a test site. Do not need to report any issue.

ARC.LMU2.de

  • Banned as there is currently no resource behind the CE. BIIDCO-239

ARC.Melbourne.au

  • There are stalled jobs found at 07 UTC on 2017/11/30.
  • BIIDCO-483
  • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2017/11/05.
  • 2017/11/03 "Failed pilot jobs" has been found since 15:20:00 UTC on 2017/11/03.(details)
  • "Stalled" jobs BIIDCO-446

ARC.MPPMU.de

  • Downtime info.: all CEs were in downtime within 24 hours. (GOCDB 24675)
  • Health checker info. : "Failed pilot jobs" has been found at 11:20:00 UTC on 2018/01/12.(details)
  • The site is in downtime. BIIDCO-678
  • Job submission check : Pilot submission failure has been found at 22:29:00 UTC on 2017/09/22.
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/09/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2017/08/28.
  • Job submission check : Pilot submission failure has been found at 06:31:00 UTC on 2017/05/10. (details)

ARC.SIGNET.si

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Job submission check : Pilot submission failure has been found since 12:27:00 UTC on 2017/12/22. (details)
  • Health checker info. : "Short pilot jobs" has been found at 21:20:00 UTC on 2017/12/21.(details)
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2017/11/01.(details)
  • ARC.SIGNET.si: "Stalled" jobs BIIDCO-287
  • Downtime info: 2018-01-18 09:00 to 2018-01-18 14:00 (UTC) BIIDCO-693

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2018/01/05. Ticket filed: BIIDCO-670
  • BIIDCO-647: Many MCProduction jobs failed at the file upload stage for fail-over SEs on 2017-12-24
  • The number of jobs is limited. BIIDCO-289
  • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs(DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found at 00:20:00 UTC on 2017/12/09.

DIRAC.BINP-VM.ru

  • Application finished with errors has been found at 7 UTC on 2017/11/30.

DIRAC.CINVESTAV.mx

  • Job submission check : Pilot submission failure has been found since 20:26:00 UTC on 2018/01/01. (details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 14:20:00 UTC on 2017/12/20.
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/09/23.(details)
  • Job Submission failure is observed since 01:31:00 UTC on 2017/07/30.

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • Health checker info. : "Short pilot jobs" has been found at 08:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 14:20:00 UTC on 2017/12/20.
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/12/08.
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2017/10/26.(details)

DIRAC.LMU.de

  • Not in use in MC production BIIDCO-26
    • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Aborted pilot jobs" has been found at 11:20:00 UTC on 2018/01/12.(details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Short pilot jobs" has been found at 02:20:00 UTC on 2017/12/09
  • MCProduction = 10 BIIDCO-309

DIRAC.Nagoya.jp

  • Date, Issue, Tickets...
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/01/01.(details)
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2017/12/30.(details)
  • BIIDCO-660
  • Health checker info. : "Belle II software could not be installed on " has been found at 23:20:00 UTC on 2017/12/08.

DIRAC.Nara-WU.jp

  • Decommissioned site: Since this still uses SL5, DIRAC pilot cannot be executed there.

DIRAC.NDU.jp

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Date, Issue, Tickets...

DIRAC.Niigata.jp

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • MCProduction = 20 BIIDCO-311

DIRAC.Osaka-CU.jp

  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2017/12/30.(details)
  • BIIDCO-656
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2017/12/24.(details)
  • MCProduction = 5 BIIDCO-312

DIRAC.PNNL.us

  • Health checker info. : "Short pilot jobs" has been found at 08:20:00 UTC on 2017/12/27.(details)
  • Date, Issue, Tickets...

DIRAC.PNNL2.us

  • Date, Issue, Tickets...

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2017/12/30.(details)
  • Date, Issue, Tickets...

DIRAC.SSU.kr

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Date, Issue, Tickets...

DIRAC.TIFR.in

  • Health checker info. : "Short pilot jobs" has been found at 08:20:00 UTC on 2017/12/27.(details)
  • Job submission check : Pilot submission failure has been found since 20:35:00 UTC on 2017/12/11. (details)
    • This is probably related to the other issues at this site: BIIDCO-605
    • BIIDCO-607
    • BIIDCO-635
  • Not enough disk space BIIDCO-571
  • All production jobs have failed due to file upload failure since 2017-07-06 BIIDCO-205

DIRAC.TMU.jp

  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2018/01/05. Ticketed: BIIDCO-671
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Application finished with errors has been found at 06 UTC on 2017/12/1.

DIRAC.Tokyo.jp

  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2017/09/15.(details)

DIRAC.UAS.mx


  • Health checker info. : "Short pilot jobs" has been found at 11:20:00 UTC on 2018/01/12.(details)
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2018/01/12.(details) BIIDCO-685
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/01/11. BIIDCO-681
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2018/01/06. → JIRA ticket BIIDCO-674
  • Health checker info. : "Belle II software could not be installed on " has been found since 18:20:00 UTC on 2018/01/01.
    Report to admin : BIIDCO-658
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2017/12/30.
  • Health checker info. : "Belle II software could not be installed on " has been found since 14:20:00 UTC on 2017/12/20.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2017/12/07.(details)

DIRAC.UVic.ca

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/01/13.(details)
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2018/01/12.(details)
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2018/01/05. Ticketed: BIIDCO-672
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/01/05.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2017/12/25.(details)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2017/12/22.(details)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2017/12/20.(details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/12/09.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2017/11/30.(details)


DIRAC.Yamagata.jp

  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/01/03. BIIDCO-661
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Not enough disk space on " has been found since 14:20:00 UTC on 2017/12/24.
  • Health checker info. : "Not enough disk space on " has been found since 07:20:00 UTC on 2017/12/22.
  • Health checker info. : "Not enough disk space on " has been found since 07:20:00 UTC on 2017/11/28. Comment added to BIIDCO-516
  • Health checker info. : "Not enough disk space on " has been found at 16:20:00 UTC on 2017/11/23 and 22:20:00 UTC on 2017/11/24. BIIDCO-561

DIRAC.Yonsei.kr

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Date, Issue, Tickets...

LCG.CESNET.cz

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2017/12/2
  • Health checker info. : "Belle II software could not be installed on skurut34.grid.cesnet.cz,skurut23.grid.cesnet.cz" has been found since 14:20:00 UTC on 2017/12/20.
  • Job submission check:Pilot submission failure has been found since 04:24:00 UTC on 2017/10/06. (details)

LCG.CNAF.it

  • Downtime: Massive flood of the datacenter, Start time: 2017-11-13 11:00 (UTC), End time: 2018-01-18 09:00 (UTC): BIIDCO-495. No clear date for the end.
  • Health checker info. : "Short pilot jobs" has been found at 04:21:00 UTC on 2017/11/06.(details)
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2017/11/02.(details)
  • "Failed pilot jobs" BIIDCO-448
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/10/06.(details)

LCG.Cosenza.it

  • Date, Issue, Tickets...

LCG.CYFRONET.pl

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/01/13.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/01/11.
  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2018/01/03: BIIDCO-662
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/01/01.(details)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2017/12/31.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Short pilot jobs" has been found BIIDCO-531
  • Showing error in job status plot
  • "Aborted pilot jobs" has been found since 16:21:00 UTC on 2017/11/11.(details)

  • Downtime info: 2018-01-18 23:00 to 2018-02-27 23:00 (UTC) BIIDCO-694

LCG.DESY.de

  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2017/12/19.
  • Health checker info. : "Belle II software could not be installed on grid-wn0840.desy.de,grid-wn0793.desy.de" has been found since 14:20:00 UTC on 2017/12/20.
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2017/12/19.(details)
  • "Short Pilot" has been observed since 2017-12-14 13:31 UTC (for 7 hours) Ticket: BIIDCO-625
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/12/10.(details)
  • Health checker info. : "Belle II software could not be installed on grid-wn0132.desy.de" has been found since 13:20:00 UTC on 2017/12/01.
  • "Short pilot jobs" is related to the failure in SIGNET-TMP-SE (Solved and verified 2017-11-02 GGUS ticket). JIRA ticket updated.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2017/08/27
  • LCG.DESY.de: Stalled jobs BIIDCO-293
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/11/22.

LCG.Frascati.it

  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2017/12/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2017/12/07.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2017/12/08

LCG.HEPHY.at

  • BIIDCO-653
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/09/14.(details)
  • Job submission check : Pilot submission failure has been found at 13:31:00 UTC on 2017/09/04.
  • MCProduction = 680 BIIDCO-281


LCG.KEK.jp

  • Merge/User jobs fail with InputDataResolution BIIDCO-687
  • Health checker info. : "Failed pilot jobs" BIIDCO-300

LCG.KEK2.jp

  • Merge/User jobs fail with InputDataResolution BIIDCO-687
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2017/12/31.(details)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2017/12/27.(details)
  • Job submission check:Pilot submission failure has been found at 22:27:00 UTC on 2017/09/15. (details)
  • Health checker info. : "Failed pilot jobs" BIIDCO-300

LCG.KISTI.kr

  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "BLAH ERROR" has been found at 07:20:00 UTC on 2017/12/20.(details)
  • Health checker info. : "BLAH ERROR" has been found since 21:20:00 UTC on 2017/12/09.(details)
  • Health checker info. :
    1. "BLAH ERROR" has been found since 11:20:00 UTC on 2017/12/01.
    2. "Short pilot jobs" has been found since 13:20:00 UTC on 2017/12/01.(details)
  • Health checker info. : "BLAH ERROR" has been found at 06:20:00 UTC on 2017/12/01.(details)
  • Health checker info. : "BLAH ERROR" has been found at 07:20:00 UTC on 2017/11/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2017/10/26.(details)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2017/09/21.(details)
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2017/09/15.(details)
  • MCProduction = 10 BIIDCO-280
  • Health checker info. : "Aborted pilot jobs" has been found since 18:20:00 UTC on 2017/12/07.(details)

LCG.KIT.de

  • The maximum number of jobs is set to zero (for job drain) 2017/03/07.

LCG.KMI.jp

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 20:20:00 UTC on 2017/11/30.(details)

LCG.Legnaro.it

  • showing error in job status plots
  • Date, Issue, Tickets...

LCG.McGill.ca

  • Downtime scheduled from 2018-01-11 22:30 (UTC) to 2018-04-01 04:02 (UTC) BIIDCO-678
  • BIIDCO-516: LCG.McGill.ca will be decommissioned in early 2018.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/12/27.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 13:20:00 UTC on 2017/09/28.(details)

LCG.Melbourne.au

  • Replaced by ARC.Melbourne.au

LCG.Napoli.it

  • Downtime from 2018-01-06 20:00 (UTC) to 2018-01-12 20:00 (UTC)
  • Job submission check : Pilot submission failure has been found since 08:29:00 UTC on 2017/12/21.
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/12/21
  • Health checker info. : "Belle II software could not be installed on prismawn02.na.infn.it" has been found since 14:20:00 UTC on 2017/12/20.
  • Job submission check : Pilot submission failure has been found since 07:26:00 UTC on 2017/12/09. (details)
  • Downtime: Start time 2017-12-04 13:30 (UTC), End time 2017-12-15 19:00 (UTC), Hardware problem, BIIDCO-577
  • Health checker info. : "Not enough disk space on N/A" has been found since 15:20:00 UTC on 2017/12/01.
  • Downtime BIIDCO-577
  • Job submission check : Pilot submission failure has been found since 05:23:00 UTC on 2017/11/30. (details)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2017/09/15.(details)

LCG.NTU.tw


  • Job submission check : Pilot submission failure has been found since 19:26:00 UTC on 2018/01/12. (details)
  • Job submission check : Pilot submission failure has been found since 19:32:00 UTC on 2018/01/11. (details) BIIDCO-684
  • GGUS ticket : "TW-NTU-HEP : Pilot submission failure at bell2grid2.cc.ntu.edu.tw"(132648) was submitted at 04:47:42 UTC on 2017/12/30.
  • Job submission check : Pilot submission failure has been found since 16:26:00 UTC on 2017/12/28. (details)
  • Pilot submission failure on bell2grid2.cc.ntu.edu.tw from 2017-12-28 16:00 UTC
    BIIDCO-657
    GGUS ticket : https://www.ggus.org/?mode=ticket_info&ticket_id=132648 was submitted at 2017-12-30 04:47 UTC
  • MCProduction = 20 BIIDCO-279
  • Health checker info. : "Belle II software could not be installed on node16" has been found since 20:20:00 UTC on 2017/12/08.
  • Health checker info. : "Belle II software could not be installed on bgrid1.phys.ntu.edu.tw" has been found at 00:20:00 UTC on 2017/12/09.

LCG.Pisa.it

  • Job submission check : Pilot submission failure has been found since 14:29:00 UTC on 2017/12/31. (details)
  • Health checker info. : "Failed to install DIRAC on se1wn75.pi.infn.it,se1wn56.pi.infn.it,n2wn13.pi.infn.it" has been found since 06:20:00 UTC on 2017/12/27.
  • Job submission check : Pilot submission failure has been found at 07:24:00 UTC on 2017/12/27. (details)
  • Job submission check : Pilot submission failure has been found since 18:27:00 UTC on 2017/12/17.
  • Health checker info. : "Failed to install DIRAC on se1wn23.pi.infn.it,se1wn89.pi.infn.it" has been found since 09:20:00 UTC on 2017/12/20.
  • Job submission check : Pilot submission failure has been found since 18:27:00 UTC on 2017/12/17. (details)
  • Health checker info. : "Failed to install DIRAC on se1wn56.pi.infn.it" has been found since 21:20:00 UTC on 2017/12/08.
  • Health checker info. : "Failed to install DIRAC on se1wn56.pi.infn.it,se1wn89.pi.infn.it" has been found since 21:20:00 UTC on 2017/12/07
  • Job submission check : Pilot submission failure has been found since 02:39:00 UTC on 2017/12/08.

LCG.Roma3.it

  • Health checker info. : "Failed to install DIRAC on wn-01-01-05.cluster.roma3" has been found since 06:20:00 UTC on 2018/01/11. BIIDCO-682
  • Health checker info. : "Failed to install DIRAC on wn-01-01-05.cluster.roma3" has been found since 05:20:00 UTC on 2018/01/06. → BIIDCO-675
  • Job submission check : Pilot submission failure has been found at 06:29:00 UTC on 2017/12/21
  • In the File Transfer Monitoring, for source storm-01.roma3.infn.it+, the efficiency is zero. BIIDCO-579
  • Roma3 commissioning BIIDCO-111

LCG.Torino.it

  • Job submission check : Pilot submission failure has been found at 22:29:00 UTC on 2017/12/31. (details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/12/09.
  • BIIDCO-352

LCG.ULAKBIM.tr

  • Date, Issue, Tickets...

OSG.BNL.us

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/01/13.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/01/11. BIIDCO-683
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2017/12/20.(details)
  • Health checker info. : "Short pilot jobs" has been found at 01:20:00 UTC on 2017/12/02.
  • Date, Issue, Tickets...

OSG.CORI.us

  •   OSG.CORI.us resource has been removed because CY18 allocation was not approved

OSG.UMiss.us

  • enough space error: Application finished with errors BIIDCO-241

SSH.KMI.jp

  • Date, Issue, Tickets...

VCYCLE.Napoli.it

  • Date, Issue, Tickets...
  • Downtime from 2018-01-06 20:00 (UTC) to 2018-01-12 20:00 (UTC)

Links

