

Production Plans

  • MC10
  • MC11
  • SKIM9x2, 40 TB
  • GCR2c
  • prod6

Production Status

 Older status updates

Official MC9 production started at ~21:00 JST on July 5, 2017.

Started with BGx0 generic samples (0.2 ab-1)

Submitted second batch of BGx0 generic jobs (July 7, ~04:00 JST)

Third and fourth batches of BGx0 generic jobs (July 10)

Submitted a few BGx0 signal samples (July 12 ~04:30 JST)

Submitted the phase 2 generic samples with BGx0 (July 14 ~04:00 JST)

Submitted the rest of the BGx0 signal samples (July 16 ~00:00 JST)

New requests for BGx0 signal samples submitted (July 19 ~01:30 JST)

MC9 restarted with BGx1 phase 2 samples - 50 fb-1 generic and signal samples (July 30 ~10:00 JST)

Submitted first batch of phase 3 samples with background - mixed and charged BBbar - about 140k jobs (August 12 ~08:00 JST)

Added uubar: ~180k jobs (August 13 ~10:30 JST)

Added ddbar: ~53k jobs (August 29 ~22:00 JST)

Added ssbar: ~51k jobs (Sept. 2 ~11:00 JST)

Submitted phase 3 low-multiplicity samples: ~43.5k jobs (Sept. 2 ~13:00 JST) → these include a generator-level skim, so the number of jobs is inflated relative to the run time

Added ccbar and taupair: ~317k jobs (Sept 3 ~09:30 JST)

Submitted new signal MC samples: ~57.6k jobs (Sept 11 ~23:00 JST)

Submitted new phase 2 signal MC samples: ~21.2k jobs (Sept 12 ~05:00 JST) → short jobs < 3 hrs each

Submitted new phase 3 signal MC samples: ~600k jobs (Sept 24 ~22:30)

Submitted new phase 3 signal MC samples (almost all submitted now) (Sept 28 ~02:00 JST)

Submitted phase 3 Y(5S) bsbs and non-bsbs samples: ~54.4k jobs (Oct 6 ~04:30 JST)

Submitted phase 3 Y(5S) uubar samples: ~242k jobs (Oct 9 ~09:30 JST)

Submitted phase 3 Y(5S) ddbar samples: ~60k jobs (Oct 16 ~10:30 JST)

Submitted phase 3 Y(5S) ssbar and ccbar samples: ~300k jobs (Oct 18 ~02:00 JST)

Submitted a few last signal samples and the phase 3 Y(5S) taupair samples: ~200k jobs (Oct 24 ~03:00 JST)
     → The taupair samples should run as shorter jobs (~5-6 hours at KEKCC)

Submitted Y(6S) continuum samples: ~30k jobs (Oct 30 ~23:00 JST)

Submitted Y(3S) generic samples: ~260k 8h jobs (Nov 3 ~04:30 JST)

Submitted Y(3S) continuum samples (uubar): ~170k 5h jobs (Nov 5 ~08:00 JST)

Submitted Y(3S) continuum samples (ddbar, ssbar, ccbar): ~70k 5h jobs + ~80k 8h jobs (Nov 13 ~23:00 JST)

Submitted Y(3S) taupair samples: ~70k 5h jobs (Nov 17 ~01:00 JST)

Submitted remaining Y(5S) generic: ~4.5k jobs (Nov 20 ~00:00 JST)

Submitted next batch of Y(4S) generic (mixed, charged): ~100k 8h jobs, ~150k 5h jobs (Nov 20 ~03:00 JST)

Submitted Y(4S) uubar continuum: ~250k 8h jobs (Nov 25 ~22:00 JST)

Submitted Y(4S) ddbar continuum: ~100k 5h jobs (Nov 28 ~21:00 JST)

Submitted Y(4S) ssbar continuum: ~100k 5h jobs (Dec 6 ~21:00 JST)

Submitted Y(4S) ccbar continuum: ~200k 8h jobs (Dec 9 ~05:00 JST)

Submitted Y(4S) taupair: ~180k 5h jobs (Jan 2 ~08:30 JST)

Submitted Y(4S) mixed sample (batch 3): ~100k 8h jobs (Jan 8 ~23:00 JST)

Submitted a few low multiplicity samples (Jan 18)

Submitted Y(4S) bbbar samples for data challenge (phase 3, BGx1): ~220k 9hr jobs (Jan 18 ~02:00 JST)

Submitted Y(4S) ddbar samples for data challenge (phase 3, BGx1): ~130k 5 hr jobs (Jan 22 ~22:00 JST)

Submitted phase 3 BG overlay production scripts (Jan 27 ~00:50 JST)

Submitted Y(4S) uubar samples for data challenge (phase 3, BGx1): ~290k 9 hr jobs (Feb 2 ~00:30 JST)

Submitted Y(4S) ssbar samples for data challenge (phase 3, BGx1): ~95k 6 hr jobs (Feb 3 ~21:45 JST)

Submitted Y(4S) ccbar samples for data challenge (phase 3, BGx1): ~320k 7.5 hr jobs (Feb 13 ~10:30 JST)

Submitted Y(4S) taupair samples for data challenge (phase 3, BGx1): ~160k 8 hr jobs (Feb 19 ~23:00 JST)

Submitted Y(4S) generic charged (not for data challenge): ~150k 5h jobs (Mar 1 ~23:00 JST)

Submitted Y(4S) uubar samples (phase 3, BGx1): ~250k 8h jobs (Mar 2 ~21:00 JST)

Submitted phase 3, BGx1 low multiplicity samples as requested by the bottomonium group (Mar 8 ~00:00 JST)

Submitted MC10 analysis validation samples, both BGx0 and BGx1 (Mar 9, 01:40 JST)

More MC9 skim jobs approved (Mar 12, ~23:00 JST)

Submitted Y(4S) ddbar samples (phase 3, BGx1): ~100k 5h jobs (Mar 13 ~00:00 JST)

Submitted Y(4S) ssbar samples (phase 3, BGx1): ~95k 5h jobs (Apr 3 ~04:00 JST)

Submitted Y(4S) ccbar samples (phase 3, BGx1): ~200k 8h jobs (Apr 4 ~02:30 JST)

Submitted Y(4S) taupair samples (phase 3, BGx1): ~180k 5h jobs (Apr 5 ~01:30 JST)

All scheduled MC9 fabrication jobs have finished (as of April 16, 2018)

First official batch of MC10 jobs submitted - phase 3 Y(4S) mixed samples: BGx1 ~100k 8h jobs and BGx0 ~36k 5h jobs (April 14 ~00:00 JST)

MC10 - phase 3 Y(4S) charged samples: BGx1 ~110k 8h jobs and BGx0 ~38k 5h jobs (April 17 ~01:00 JST)

First (small) batch of MC10 signal jobs submitted (April 20 ~03:30 JST)

MC10 - phase 3 Y(4S) uubar samples: BGx1 ~180k 8h jobs and BGx0 ~64k 5h jobs (April 23 ~05:45 JST)

Second batch of MC10 signal samples submitted (April 23 ~23:00 JST)

Third batch of MC10 signal samples submitted (April 27)

Submitted Phase 2 Y(4S) generic samples (May 1 ~23:30 JST)

Fourth batch of MC10 signal samples (mostly non-4S and/or phase 2) (May 2 ~02:00 JST)

MC10 - phase 3 Y(4S) ddbar samples: BGx1 ~45k 8h jobs and BGx0 ~15k 5h jobs (May 5 ~03:00 JST)

MC10 - phase 3 Y(4S) ssbar samples: BGx1 ~45k 8h jobs and BGx0 ~15k 5h jobs (May 9 ~02:00 JST)

Fifth batch of MC10 signal samples (mostly non-4S and/or phase 2) (May 9 ~03:30 JST)

MC10 - phase 3 Y(4S) ccbar samples: BGx1 ~200k 8h jobs and BGx0 ~50k 5h jobs (May 9 ~02:00 JST)

Submitted a large batch of phase 2 signal samples

MC10 - phase 3 Y(4S) taupair samples: BGx1 ~150k 8h jobs and BGx0 ~37k 5h jobs (May 17 ~02:00 JST)

Small signal MC batch (May 17 ~02:00 JST)

Submitted first batch of BGx2 productions: mixed phase 3 ~75k jobs and phase 2 ~2k jobs (May 25 ~20:30 JST)

Submitted many BGx1 signal MC samples (May 26 ~23:00 JST)

Submitted additional BGx1 signal MC samples (May 27 ~22:00 JST)

Submitted first small batch of BGx2 signal samples (June 9 ~00:00 JST)

MC10 - phase 3 Y(4S) uubar samples: BGx1 ~180k jobs (June 9 ~00:00 JST)

Submitted second batch of BGx2 signal samples (June 9 ~15:00 JST)

Submitted DR3 samples (~10 fb-1): 7k jobs (June 9 ~16:00 JST)

MC10 - phase 3 Y(4S) ddbar samples: BGx1 ~45k jobs (June 10 ~11:00 JST)

Submitted BGx2 signal samples (June 12, starting ~23:00 JST; approvals will be staggered to avoid over-stressing the system)

Submitted ~100 fb-1 phase 3 dress rehearsal jobs (June 15)

MC10 - phase 3 Y(4S) ssbar samples: BGx1 ~45k 8h jobs (June 15 ~3:30 JST)

MC10 - phase 3 Y(4S) ccbar samples: BGx1 ~200k 8h jobs (June 17 ~8:00 JST)

MC10 - phase 3 Y(4S) taupair samples: BGx1 ~150k 8h jobs (June 17 ~8:00 JST)

Submitted a few additional MC10 signal samples before upgrade (July 24 ~22:00 JST)

Submitted a few small MC10 productions (Aug 14 ~23:30 JST) - will submit more if these proceed smoothly

Large generic MC11 productions (release-02-00-01) submitted (Aug 16 - current)

A few MC10 signal productions submitted (Sept 13 ~00:00 JST)

Submitting MC11 BGx0 generic samples (Sept 23 ~02:00 JST)

Some MC10 signal productions (Sept 26 ~04:00 JST)

Submitting MC11 BGx0 generic samples with early phase 3 geometry (Oct 3 ~07:30 JST)

Submitting new MC11 signal samples (Oct 9 ~13:00 JST)

Submitting MC11 Y(5S) samples (Oct 9 ~21:30 JST)

Many MC11 signal samples submitted (Oct 12 ~00:00 JST)

Most productions have finished. New productions will be started when updated beam background files are ready.

Some small signal MC samples were submitted on Nov 11 ~06:00 JST, Nov 13, Nov 14.

Phase 2 reprocessing with the distributed computing system (prod6b) started Nov 15 ~00:00 JST.

Large phase 3 BGx1 generic MC samples were submitted recently (~Nov 20). These should be relatively short jobs (though there are many of them). They should run for the next few weeks.


Started submitting a few relatively small MC production jobs for MC12 (Feb 20)


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • Date, Issue, Tickets.

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)

  • Date, Issue, Tickets...


DDM (bldirac01.sdcc.bnl.gov)

  • 2018-03-01 DDM deletion task seems stuck (BIIDCO-808)

Conditions DB

Monitor

  • March 15, 2019 15:30 UTC: DIRAC B2PlotDisplay page doesn't load (BIIDCO-1726)
  • Problem accessing the DIRAC Web Portal (BIIDCO-1247)

LFC

File Transfers and Replication Status

See also Computing OperationStatus#DDM for related issues

FTS

Any problems with the FTS service or FTS monitoring are to be recorded here. Site- or SE-specific issues are to be recorded under the corresponding site or SE.

Note that the FTS dashboard we use is an "old" instance and is not well maintained. Belle II members in general do not have access to the "new" monitoring. When the dashboard is down, shifters only need to notify the expert and skip the corresponding part of their work. The expert should then check the new monitoring, since access to that page is limited.

  • Very low activity, and nothing since 02:00 UTC (16 March 2019) (BIIDCO-1716)

Replication Status

  • 2019-01-19 Almost zero "Done" transfers, with an increasing number of scheduled jobs for more than 5 SEs over more than 5 hours (BIIDCO-1618)
  • 2018-09-22 The number of "Done Jobs" is lower than the number of "Scheduled Jobs" during the last 6 hours or more
  • 2018-07-02 No "Done" transfers, several scheduled, and a rapid increase in waiting replications (BIIDCO-1125)

Job Status Plot

  • Low activity, and jobs failing at many sites over the last day, on 2019/03/14 (BIIDCO-1725)
  • Many sites show "Job finished with errors" since 16:00 UTC on 2019/02/01 (BIIDCO-1656)

  • Jobs failing at almost every site since 07:00 UTC on 2019/03/11 (BIIDCO-1717)
  • Almost all sites show "Job finished with errors" since 16:00 UTC on 2019/01/24 (BIIDCO-1626 issued)

Job Summary


SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs).

Raw data SEs

Raw data SE: KEK-RAW-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/RAW)


Raw data SE: BNL-TAPE-SE (srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/tape)


Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • SE Health check by DDM : download does not work since 2019-03-20 12:41:55 UTC.
  • SE Health check by DDM : download does not work since 2019-03-19 13:52:52 UTC.

  • SE Health check by DDM : download, upload do not work since 2019-03-17 19:59:33 UTC.
  • UNAVAILABLE files (BIIDCO-1302)

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)

  • date, issue, ticket

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • 2019/03/04 CNAF-TMP-SE: remove file, remove directory, download, upload, ls do not work since 2019-03-03 21:50:21 UTC (BIIDCO-1695)
  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE (BIIDCO-1637)
  • Continuous timeout failures between NTU-CC-TMP-SE and CNAF-TMP-SE (BIIDCO-1310)

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • date, issue, ticket...

Primary SE: KEK-DISK-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/disk/belle/TMP)


Primary SE: KEK2-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • SE Health check by DDM : ls does not work since 2019-03-06 23:32:27 UTC (BIIDCO-1708)
  • KIT SE giving occasional timeouts (BIIDCO-428)

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • SE Health check by DDM : remove file, remove directory, ls do not work since 2019-03-06 23:33:05 UTC (BIIDCO-1709)

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • SE Health check by DDM : remove file, remove directory, ls do not work since 2019-03-20 08:00:30 UTC (BIIDCO-1732)
  • SE Health check by DDM : ls does not work since 2019-02-15 22:18:43 UTC (BIIDCO-1676)

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • File transfer failures or too-low transfer rates from KMI-TMP-SE and Pisa-TMP-SE to SIGNET-TMP-SE on 2019-01-25 (JST) (BIIDCO-1625)
  • File transfer failures: transfer efficiency is too low from SIGNET-TMP-SE to KEK2-TMP-SE since about 2019-01-13 01:00 UTC (BIIDCO-1603)
  • Frequent transfer failures from source SIGNET-TMP-SE since 2018-12-25 (BIIDCO-1561)
    GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=138999 was submitted 2018-12-27

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

  • Date, Issue, Tickets...

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

  • Date, Issue, Tickets...

Frascati-TMP-SE (atlasse.lnf.infn.it)

  • Date, Issue, Tickets...

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets...

IPHC-TMP-SE (sbgse1.in2p3.fr)

LAL-TMP-SE (grid05.lal.in2p3.fr)

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

  • Transfer rate is zero (BIIDCO-896)

  • Melbourne-DATA-SE banned for write (BIIDCO-927)

McGill-TMP-SE (storm02.clumeq.mcgill.ca)

  • McGill-TMP-SE will be decommissioned in early 2018 (BIIDCO-516).

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)

  • Date, Issue, Tickets...

NTU-TMP-SE, NTU-CC-TMP-SE (bgrid3.phys.ntu.edu.tw, belle2grid3.cc.ntu.edu.tw)

  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE (BIIDCO-1637)
  • File transfer failures and cancellations to NTUCC-DATA-SE on 2018-12-22 (BIIDCO-1551)
  • Frequent timeouts observed between NTU-CC-TMP-SE and CNAF-TMP-SE (BIIDCO-1310)
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137334 was submitted 2018-09-22 05:10 UTC
  • NTUCC-TMP-SE banned for write (BIIDCO-1333)

Pisa-TMP-SE (stormfe1.pi.infn.it)

  • BIIDCO-1693
  • Low transfer efficiency as a source observed on 2019-02-27; updated BIIDCO-1355

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues.

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

Torino-TMP-SE (se-srm-00.to.infn.it)

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

UMiss-TMP-SE (umiss005.hep.olemiss.edu)


UVic-TMP-SE (charon01.westgrid.ca)

  • File transfer failures: transfer efficiency is too low from UVic-DATA-SE since about 2018-12-18 01:00 UTC (BIIDCO-1491)


Sites

Sites Common Issues

ARC.DESY.de

  • Health checker info. : "Failed pilot jobs" has been found since 05:20:00 UTC on 2019/03/02 (BIIDCO-1696)

  • All jobs are in "Input data resolution" status since 17:00:00 UTC on 2018/12/18 (BIIDCO-1541)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13 (BIIDCO-1518)
  • Job submission check : Pilot submission failure has been found since 05:42:00 UTC on 2018/10/25.
  • Reconfiguration of the site queues (BIIDCO-1392)

ARC.DESY-test.de

  • A test queue for the new CE (BIIDCO-1469)

ARC.KIT.de

  • Health checker info. : "Failed pilot jobs" has been found at 21:20:00 UTC on 2019/02/21.
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2019/02/14 (BIIDCO-1588)

ARC.LMU.de

  • This is a test site. No need to report any issues.

ARC.LMU2.de

  • Banned, as there are currently no resources behind the CE (BIIDCO-239)

ARC.Melbourne.au


ARC.MPPMU.de

  • Health checker info. : "Failed pilot jobs" has been found at 09:20:00 UTC on 2018/10/25 (BIIDCO-1537)
  • Job submission check : Pilot submission failure has been found since 13:26:00 UTC on 2018/10/21 (BIIDCO-1386)
  • BIIDCO-128

ARC.SIGNET.si

  • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2019/03/14 (BIIDCO-1547)
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2019/01/31 and at 14:20:00 UTC on 2019/01/25.
  • Job status check: Application finished with errors (5% of the jobs) at 11:15 UTC on 2018/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/12/20.
  • Job submission check : Pilot submission failure has been found at 13:26:00 UTC on 2018/12/19 (BIIDCO-1547)

  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03 (BIIDCO-1420)

  • "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/29 (BIIDCO-1519)
  • Health checker info. : "Aborted pilot jobs" has been found since 20:20:00 UTC on 2018/10/20 (BIIDCO-1383)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/03 (BIIDCO-1350)

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08 (BIIDCO-1534)
  • Job status check: "application finished with errors" (100% currently) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21 (BIIDCO-1312)
  • The number of jobs is limited (BIIDCO-289)
  • All upload attempts are failing for all configured SEs: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), fail-over SEs (DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • Job status check: Application finished with errors (27% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Job submission check : Pilot submission failure has been found since 17:26:00 UTC on 2018/10/21 (BIIDCO-1387)
  • Health checker info. : "Failed to install DIRAC on " has been found at 22:20:00 UTC on 2018/09/15

DIRAC.BINP-VM.ru

  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2019/02/21
  • Job submission check : Pilot submission failure has been found since 10:23:00 UTC on 2019/01/14 (BIIDCO-1607)
  • Job status plots, "Application Finished With Errors" (2018-02-11 but lasting for at least a month) (BIIDCO-749)

DIRAC.CINVESTAV.mx

  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2019/03/08; ticket updated (BIIDCO-1524)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/12/06 (BIIDCO-1524)
  • Job status plots, "Application Finished With Errors" & "Watchdog identified this job as Stalled" (2018-02-12) (BIIDCO-755)

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/03/20
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/02/21 (BIIDCO-1686)
  • Job status check: Application finished with errors (95% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Job status check: Application finished with errors (100% of the jobs) since 6:00 UTC on 2018/12/21.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.
  • All jobs are in "Input data resolution" status (2018/12/20).
  • Job submission check : Pilot submission failure has been found at 22:23:00 UTC on 2018/12/05 (BIIDCO-1474)
  • Health checker info. : "Aborted pilot jobs" has been found since 14:20:00 UTC on 2018/04/22 (BIIDCO-977)

DIRAC.IITH.in

  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2019/01/03 (BIIDCO-1593)
  • Job status check: "input Data Resolution" issues (36%) on 2018/10/26 (BIIDCO-1378)
  • Job submission check : Pilot submission failure has been found since 11:28:00 UTC on 2018/10/03 (BIIDCO-1349)

DIRAC.LMU.de

  • Not in use in MC production (BIIDCO-26)
  • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/25.
  • Health checker info. : "Belle II software could not be installed on " has been found from 21:20:00 UTC on 2018/11/21 to 08:20:00 UTC on 2018/12/14.
  • Health checker info. : "Belle II software could not be installed on " has been found at 14:20:00 UTC on 2018/11/20 (BIIDCO-1471)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/23.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/09/13.
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/07/23.
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2018/07/19.
  • Job status plots, "Application Finished With Errors" has been found at about 04:00:00 JST on 2018/07/06.
  • Health checker info. : "Aborted pilot jobs" has been found at 12:20:00 UTC on 2018/02/11 (BIIDCO-747)

DIRAC.Nagoya.jp

  • Health checker info. : "Failed to install DIRAC on " has been found since 00:20:00 UTC on 2019/03/13 (BIIDCO-1721)

  • Health checker info. : "Short pilot jobs" found intermittently from 2018/08/17 to 2019/01/27 (BIIDCO-1227)

DIRAC.Nara-WU.jp

  • Under commissioning from 2018-11-13 (BIIDCO-1432)

DIRAC.NDU.jp

  • date, issue, ticket

DIRAC.Niigata.jp

  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/31 (BIIDCO-1646)

DIRAC.Osaka-CU.jp

  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04 (BIIDCO-1434)
  • Pilot submission failure has been found since 18:32:00 UTC on 2018/11/24 (BIIDCO-1434)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Asked the site admin to check the status 2018-03-17 10:00 JST (DB access failure again from DIRAC.Osaka-CU.jp to PNNL since 2018-03-16 11:00 UTC) (BIIDCO-290)

DIRAC.PNNL.us

  • Site to be decommissioned (BIIDCO-919)

DIRAC.PNNL2.us

  • Site to be decommissioned (BIIDCO-920)

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • date (start writing with '//'), issue, ticket...

DIRAC.SSU.kr

  • date (start writing with '//'), issue, ticket...

DIRAC.TIFR.in

  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.
  • Job submission check : Pilot submission failure has been found since 21:27:00 UTC on 2019/03/07; ticket updated (BIIDCO-1590)
  • Job submission check : Pilot submission failure has been found since 07:22:00 UTC on 2019/01/01 (BIIDCO-1590)
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/10/22.
  • Job status plots, "Application Finished With Errors" has been found at about 00:00:00 JST on 2018/07/06 (BIIDCO-1132)
  • Health checker info. : "Short pilot jobs" (already reported: BIIDCO-971)
  • RunningLimit is set to 1 for MCProduction (BIIDCO-1006)
  • Jobs stalled at input data resolution (BIIDCO-714)

DIRAC.TMU.jp

  • Job submission check : Pilot submission failure has been found at 13:27:00 UTC on 2019/03/19.
  • Job submission check : Pilot submission failure has been found since 07:22:00 UTC on 2019/02/18 (BIIDCO-1373)
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2019/01/31 (BIIDCO-1646)
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02 (BIIDCO-1409, BIIDCO-1522)
  • Job status check: "Application finished with errors" (60%) on 2018/10/26.
  • Health checker info. : "Belle II software could not be installed on " has been found since 18:20:00 UTC on 2018/10/17.
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2018/10/15 (BIIDCO-1373)
  • Job submission check : Pilot submission failure has been found since 10:24:00 UTC on 2018/09/30.

DIRAC.Tokyo.jp

  • Date, Issue, Tickets..

DIRAC.UAS.mx

  • Health checker info. : "Belle II software could not be installed on " has been found since 01:20:00 UTC on 2019/02/20.
  • Job submission check: 100% failed with errors from 22:00 2019/01/08 till 04:00 2019/01/09 (UTC)
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17 (BIIDCO-1508)
  • Health checker info. : "Belle II software could not be installed on " has been found since 16:20:00 UTC on 2018/11/14.
  • Job submission check : Pilot submission failure has been found since 01:26:00 UTC on 2018/09/21.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/09/17; added to BIIDCO-1286
  • Job submission check : Pilot submission failure has been found since 15:24:00 UTC on 2018/09/16 (emailed comp-dc-operations, create JIRA ticket when able)

DIRAC.UVic.ca

  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/01/28 and 2019/01/29.
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2018/08/16 and on 2018/10/07.

DIRAC.UVic-local.ca

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/01/28 and on 2019/03/02.

DIRAC.Yamagata.jp

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/03/13 (BIIDCO-1646)
  • High ratio of jobs finished with errors (from job status) on 2019/01/29 (Tue) 03:00 UTC
  • Job status check: Application finished with errors (13% of the jobs at 11:15 UTC, but 100% in the last hours) on 2018/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2018/12/12.
  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2018/05/21.

DIRAC.Yonsei.kr 

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/28.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08 (BIIDCO-1416)

DIRAC.LocalTest.jp

LCG.CESNET.cz

LCG.CNAF.it

  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/03/20
  • Health checker info. : "Aborted pilot jobs" has been found since 04:20:00 UTC on 2018/12/13 (BIIDCO-1488)
  • "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/21 (BIIDCO-1455)

LCG.Cosenza.it

  • Job status check: "application finished with errors" (100% currently) on 2019/02/19.

LCG.CYFRONET.pl

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13 (BIIDCO-1246)

LCG.DESY.de

  • The site is to be retired (BIIDCO-1240); no more jobs are to be submitted.

LCG.Frascati.it

  • "BLAH ERROR" has been found since 21:20:00 UTC on 2019/03/15 (BIIDCO-1712)
    GGUS ticket 140111 was submitted 2019-03-08
  • "Short pilot jobs" has been found at 14:20:00 UTC on 2019/03/17.
  • Downtime LCG.Frascati.it: start 2019-03-11 11:00 UTC, end 2019-03-12 14:00 UTC (BIIDCO-1718)
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2019/03/04 and since 13:20:00 UTC on 2019/03/07
  • High ratio of jobs finished with errors (from job status) on 2019/01/29 (Tue) 03:00 UTC
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/07/10 (BIIDCO-1153)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/06/30.

LCG.HEPHY.at

  • Health checker info. : "Failed pilot jobs" has been found at 02:20:00 UTC on 2019/01/30 and at 02:20:00 UTC on 2019/01/31.
  • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2018/12/27.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/12/12 (BIIDCO-1532)
  • Health checker info. : "BLAH ERROR" has been found since 13:20:00 UTC on 2018/10/09.
  • Job submission check : Pilot submission failure has been found since 16:25:00 UTC on 2018/06/21 (BIIDCO-1107)

LCG.IPHC.fr

  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2018/06/18.

LCG.KEK.jp

  • LCG.KEK.jp : Downtime Start time: 2019-03-07 00:30 (UTC) End time: 2019-03-07 01:00 (UTC) BIIDCO-1706
  • Job submission check : Pilot submission failure has been found since 05:25:00 UTC on 2018/12/20. BIIDCO-1548
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/12. (details: BIIDCO-1431)

  • Performance degraded with "Input data resolution" status since 2018-07-24 around 20:00 UTC. BIIDCO-1191

LCG.KEK2.jp

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/03/09. Ticket updated: BIIDCO-1450, BIIDCO-1646
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21. BIIDCO-1559
  • Job submission check : Pilot submission failure has been found at 06:25:00 UTC on 2018/12/20. BIIDCO-1548
  • All jobs have been in "Input data resolution" status since 12:00 UTC on 2018/12/18. BIIDCO-1542

LCG.KEK-merge.jp

  • Being commissioned...

LCG.KISTI.kr

  • Job slots are disabled for SE maintenance from 2018-10-19 to 2018-10-23. BIIDCO-1380
  • Health checker info. : "BLAH ERROR" has been found since 06:20:00 UTC on 2018/10/19.(details)

  • "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details)
  • The BLAH error seems to happen when jobs exceed the allocated number of queues; this is not a problem (site-specific feature).
    BIIDCO-1259
  • A large number of Merge jobs in waiting status. BIIDCO-773

LCG.KMI.jp

  • Job submission check : Pilot submission failure has been found since 21:25:00 UTC on 2019/02/01. 

  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/01/27.(details)
  • Job status check: Application finished with errors (7% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details) BIIDCO-1533
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 21:20:00 UTC on 2018/11/22.
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 05:20:00 UTC on 2018/11/22.
  • Job submission check : Pilot submission failure has been found since 21:24:00 UTC on 2018/10/02. (details)

LCG.LAL.fr

  • Downtime: Start downtime: 2019-03-12 09:00 (UTC) End downtime: 2019-03-14 17:00 (UTC) BIIDCO-1719
  • Downtime: Start downtime: 2019-02-25 08:00 – End downtime: 2019-02-26 08:00 BIIDCO-1687
  • Downtime: Start downtime: 2019-02-21 00:00 – End downtime: 2019-02-22 10:00 BIIDCO-1687

Site under commissioning. Issues to be reported.

LCG.Legnaro.it

  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/03/16.(details)

LCG.Napoli.it

  • Stalled jobs. BIIDCO-1255

LCG.NTU.tw

  • Job submission check : Pilot submission failure has been found since 15:22:00 UTC on 2019/03/17. BIIDCO-1730
  • GGUS ticket 140252 was submitted on 2019-03-18.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/03/17.
  • Solved and verified 2019-03-15: GGUS ticket 139674 "CRL expiration at belle2grid2.cc.ntu.edu.tw" was submitted at 22:05:29 UTC on 2019/02/13.
  • Health checker info. : "CRL has expired" has been found since 21:20:00 UTC on 2019/02/11.
  • Health checker info. : "CRL has expired" has been found since 11:20:00 UTC on 2019/01/14. BIIDCO-1430
  • Job submission check : Pilot submission failure has been found since 06:22:00 UTC on 2019/01/11. BIIDCO-1602
    Solved and verified 2019-02-13 : GGUS ticket 139598 : "Job submission failure at belle2grid2.cc.ntu.edu.tw"

LCG.Pisa.it

  • Job submission check : Pilot submission failure has been found since 20:23:00 UTC on 2019/03/16.
  • Job submission check : Pilot submission failure has been found since 01:22:00 UTC on 2019/03/14. BIIDCO-1723
  • GGUS ticket 139815 "INFN-PISA: All CEs - LSF directory doesn't exists" was submitted at 00:03:38 UTC on 2019/02/21.
  • "Failed pilot jobs" has been found since 14:20:00 UTC on 2019/01/18, lasting for more than 5 hours. BIIDCO-1619
  • "Failed to install DIRAC on so1wn8.pi.infn.it,n2wn13.pi.infn.it,n2wn18.pi.infn.it,so1wn6" has been found since 01:20:00 UTC on 2019/01/02. BIIDCO-1591
  • "Short pilot jobs" has been found since 02:20:00 UTC on 2018/09/21.(details) BIIDCO-1157

LCG.Roma3.it

  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2019/03/09. Ticket updated: BIIDCO-1538
  • Health checker info. : "Failed pilot jobs" has been found since 05:20:00 UTC on 2019/02/21. Ticket updated: BIIDCO-1538
  • Health checker info. : "Failed pilot jobs" has been found since 03:20:00 UTC on 2019/01/22 GGUS-139270
  • Job status check: Application finished with errors (25% of the jobs in last 24 hours) and Stalled (36%) on 2018/12/22 at 8:00 UTC.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • Job status check: Application finished with errors (10% of the jobs in last 24 hours) and Stalled (76%) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Stalled jobs on 2018/12/20.
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2018/12/08. BIIDCO-1538

  • Health checker info. : "Aborted pilot jobs" has been found at 16:20:00 UTC on 2018/10/09.(details)
  • Roma3 commissioning BIIDCO-111 (NOTE: this ticket seems obsolete; it should be closed and removed from the operation status.)

LCG.TAU.il

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/11/01.
  • Health checker info. : "Failed pilot jobs" has been found at 18:20:00 UTC on 2018/09/14.(details)

LCG.Torino.it

  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/11/04 and since 13:20:00 UTC on 2018/11/11.
  • Job submission check : Pilot submission failure has been found at 22:22:00 UTC on 2018/11/04.
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2018/11/01. BIIDCO-1417

LCG.ULAKBIM.tr

  • Health checker info. : "BLAH ERROR" has been found since 07:20:00 UTC on 2019/03/20.
  • Health checker info. : "BLAH ERROR" has been found since 06:20:00 UTC on 2019/03/18.
  • Health checker info. : "BLAH ERROR" has been found at 14:20:00 UTC on 2019/03/17.
  • Health checker info. : "BLAH ERROR" has been found since 03:20:00 UTC on 2019/03/17.
  • Health checker info. : "BLAH ERROR" has been found since 12:20:00 UTC on 2019/03/16. BIIDCO-1727

OSG.BNL.us

  • Health checker info. : "Belle II software could not be installed on " has been found since 19:20:00 UTC on 2019/02/14.
  • Job submission check: Jobs fail with errors or input data resolution the last 24h (6:00 UTC, 2019/01/09) BIIDCO-1596
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2019/01/02. BIIDCO-1594
  • Production jobs: UNAVAILABLE files BIIDCO-1302
  • Number of concurrent MCProduction jobs restricted BIIDCO-1256
  • MCProduction jobs are mostly stalled BIIDCO-1253

OSG.CORI.us

  • OSG.CORI.us resource has been removed because CY18 allocation was not approved

OSG.UMiss.us

  • date (start writing with '//'), issue, ticket...

SSH.KMI.jp

  • Job status check: Application finished with errors (12% of the jobs in last 24 hours) on 2018/12/22 at 11:30 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/08/13.

VCYCLE.Napoli.it

  • Opportunistic site (Empty plot is not a problem)
  • Ban lifted BIIDCO-1613
  • "Sudo CE Error: sudo execution fails with return code 1" BIIDCO-1612

VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

  • Opportunistic site (Empty plot is not a problem)


Links

