Contact: comp-dc-operations @ belle2.org


Production Plans

  • MC10
  • MC11
  • SKIM9x2, 40 TB
  • GCR2c
  • prod6



Production Status

Older status updates

Official MC9 production started at ~21:00 JST on July 5, 2017.

Starting with BGx0 generic samples (0.2 ab-1)

Submitted second batch of BGx0 generic jobs (July 7, ~04:00 JST)

Third and fourth batches of BGx0 generic jobs (July 10)

Submitted a few BGx0 signal samples (July 12 ~04:30 JST)

Submitted the phase 2 generic samples with BGx0 (July 14 ~04:00 JST)

Submitted the rest of the BGx0 signal samples (July 16 ~00:00 JST)

New requests for BGx0 signal samples submitted (July 19 ~01:30 JST)

MC9 restarted with BGx1 phase 2 samples - 50 fb-1 generic and signal samples (July 30 ~10:00 JST)

Submitted first batch of phase 3 samples with background - mixed and charged BBbar - about 140k jobs (August 12 ~08:00 JST)

Added uubar: ~180k jobs (August 13 ~10:30 JST)

Added ddbar: ~53k jobs (August 29 ~22:00 JST)

Added ssbar: ~51k jobs (Sept. 2 ~11:00 JST)

Submitted phase 3 low-multiplicity samples: ~43.5k jobs (Sept. 2 ~13:00 JST) → includes generator level skim so number of jobs is inflated compared to run time

Added ccbar and taupair: ~317k jobs (Sept 3 ~09:30 JST)

Submitted new signal MC samples: ~57.6k jobs (Sept 11 ~23:00 JST)

Submitted new phase 2 signal MC samples: ~21.2k jobs (Sept 12 ~05:00 JST) → short jobs < 3 hrs each

Submitted new phase 3 signal MC samples: ~600k jobs (Sept 24 ~22:30 JST)

Submitted new phase 3 signal MC samples (almost all submitted now) (Sept 28 ~02:00 JST)

Submitted phase 3 Y(5S) bsbs and non-bsbs samples: ~54.4k jobs (Oct 6 ~04:30 JST)

Submitted phase 3 Y(5S) uubar samples: ~242k jobs (Oct 9 ~09:30 JST)

Submitted phase 3 Y(5S) ddbar samples: ~60k jobs (Oct 16 ~10:30 JST)

Submitted phase 3 Y(5S) ssbar and ccbar samples: ~300k jobs (Oct 18 ~02:00 JST)

Submitted a few last signal samples and the phase 3 Y(5S) taupair samples: ~200k jobs (Oct 24 ~03:00 JST)
     → The taupair samples should run as shorter jobs (~5-6 hours at KEKCC)

Submitted Y(6S) continuum samples: ~30k jobs (Oct 30 ~23:00 JST)

Submitted Y(3S) generic samples: ~260k 8h jobs (Nov 3 ~04:30 JST)

Submitted Y(3S) continuum samples (uubar): ~170k 5h jobs (Nov 5 ~08:00 JST)

Submitted Y(3S) continuum samples (ddbar, ssbar, ccbar): ~70k 5h jobs + ~80k 8h jobs (Nov 13 ~23:00 JST)

Submitted Y(3S) taupair samples: ~70k 5h jobs (Nov 17 ~01:00 JST)

Submitted remaining Y(5S) generic: ~4.5k jobs (Nov 20 ~00:00 JST)

Submitted next batch of Y(4S) generic (mixed, charged): ~100k 8h jobs, ~150k 5h jobs (Nov 20 ~03:00 JST)

Submitted Y(4S) uubar continuum: ~250k 8h jobs (Nov 25 ~22:00 JST)

Submitted Y(4S) ddbar continuum: ~100k 5h jobs (Nov 28 ~21:00 JST)

Submitted Y(4S) ssbar continuum: ~100k 5h jobs (Dec 6 ~21:00 JST)

Submitted Y(4S) ccbar continuum: ~200k 8h jobs (Dec 9 ~05:00 JST)

Submitted Y(4S) taupair: ~180k 5h jobs (Jan 2 ~08:30 JST)

Submitted Y(4S) mixed sample (batch 3): ~100k 8h jobs (Jan 8 ~23:00 JST)

Submitted a few low multiplicity samples (Jan 18)

Submitted Y(4S) bbbar samples for data challenge (phase 3, BGx1): ~220k 9hr jobs (Jan 18 ~02:00 JST)

Submitted Y(4S) ddbar samples for data challenge (phase 3, BGx1): ~130k 5 hr jobs (Jan 22 ~22:00 JST)

Submitted phase 3 BG overlay production scripts (Jan 27 ~00:50 JST)

Submitted Y(4S) uubar samples for data challenge (phase 3, BGx1): ~290k 9 hr jobs (Feb 2 ~00:30 JST)

Submitted Y(4S) ssbar samples for data challenge (phase 3, BGx1): ~95k 6 hr jobs (Feb 3 ~21:45 JST)

Submitted Y(4S) ccbar samples for data challenge (phase 3, BGx1): ~320k 7.5 hr jobs (Feb 13 ~10:30 JST)

Submitted Y(4S) taupair samples for data challenge (phase 3, BGx1): ~160k 8 hr jobs (Feb 19 ~23:00 JST)

Submitted Y(4S) generic charged (not for data challenge): ~150k 5h jobs (Mar 1 ~23:00 JST)

Submitted Y(4S) uubar samples (phase 3, BGx1): ~250k 8h jobs (Mar 2 ~21:00 JST)

Submitted phase 3, BGx1 low multiplicity samples as requested by the bottomonium group (Mar 8 ~00:00 JST)

Submitted MC10 analysis validation samples, both BGx0 and BGx1 (Mar 9, 01:40 JST)

More MC9 skim jobs approved (Mar 12, ~23:00 JST)

Submitted Y(4S) ddbar samples (phase 3, BGx1): ~100k 5h jobs (Mar 13 ~00:00 JST)

Submitted Y(4S) ssbar samples (phase 3, BGx1): ~95k 5h jobs (Apr 3 ~04:00 JST)

Submitted Y(4S) ccbar samples (phase 3, BGx1): ~200k 8h jobs (Apr 4 ~02:30 JST)

Submitted Y(4S) taupair samples (phase 3, BGx1): ~180k 5h jobs (Apr 5 ~01:30 JST)

All scheduled MC9 fabrication jobs have finished (as of April 16, 2018)

First official batch of MC10 jobs submitted - phase 3 Y(4S) mixed samples: BGx1 ~100k 8h jobs and BGx0 ~36k 5h jobs (April 14 ~00:00 JST)

MC10 - phase 3 Y(4S) charged samples: BGx1 ~110k 8h jobs and BGx0 ~38k 5h jobs (April 17 ~01:00 JST)

First (small) batch of MC10 signal jobs submitted (April 20 ~03:30 JST)

MC10 - phase 3 Y(4S) uubar samples: BGx1 ~180k 8h jobs and BGx0 ~64k 5h jobs (April 23 ~05:45 JST)

Second batch of MC10 signal samples submitted (April 23 ~23:00 JST)

Third batch of MC10 signal samples submitted (April 27)

Submitted Phase 2 Y(4S) generic samples (May 1 ~23:30 JST)

Fourth batch of MC10 signal samples (mostly non-4S and/or phase 2) (May 2 ~02:00 JST)

MC10 - phase 3 Y(4S) ddbar samples: BGx1 ~45k 8h jobs and BGx0 ~15k 5h jobs (May 5 ~03:00 JST)

MC10 - phase 3 Y(4S) ssbar samples: BGx1 ~45k 8h jobs and BGx0 ~15k 5h jobs (May 9 ~02:00 JST)

Fifth batch of MC10 signal samples (mostly non-4S and/or phase 2) (May 9 ~03:30 JST)

MC10 - phase 3 Y(4S) ccbar samples: BGx1 ~200k 8h jobs and BGx0 ~50k 5h jobs (May 9 ~02:00 JST)

Submitted a large batch of phase 2 signal samples

MC10 - phase 3 Y(4S) taupair samples: BGx1 ~150k 8h jobs and BGx0 ~37k 5h jobs (May 17 ~02:00 JST)

Small signal MC batch (May 17 ~02:00 JST)

Submitted first batch of BGx2 productions: mixed phase 3 ~75k jobs and phase 2 ~2k jobs (May 25 ~20:30 JST)

Submitted many BGx1 signal MC samples (May 26 ~23:00 JST)

Submitted additional BGx1 signal MC samples (May 27 ~22:00 JST)

Submitted first small batch of BGx2 signal samples (June 9 ~00:00 JST)

MC10 - phase 3 Y(4S) uubar samples: BGx1 ~180k jobs (June 9 ~00:00 JST)

Submitted second batch of BGx2 signal samples (June 9 ~15:00 JST)

Submitted DR3 samples (~10 fb-1): 7k jobs (June 9 ~16:00 JST)

MC10 - phase 3 Y(4S) ddbar samples: BGx1 ~45k jobs (June 10 ~11:00 JST)

Submitted BGx2 signal samples (June 12 starting around ~23:00 JST - will stagger the approval to avoid over-stressing the system)

Submitted ~100 fb-1 phase 3 dress rehearsal jobs (June 15)

MC10 - phase 3 Y(4S) ssbar samples: BGx1 ~45k 8h jobs (June 15 ~3:30 JST)

MC10 - phase 3 Y(4S) ccbar samples: BGx1 ~200k 8h jobs (June 17 ~8:00 JST)

MC10 - phase 3 Y(4S) taupair samples: BGx1 ~150k 8h jobs (June 17 ~8:00 JST)

Submitted a few additional MC10 signal samples before upgrade (July 24 ~22:00 JST)

Submitted a few small MC10 productions (Aug 14 ~23:30 JST) - will submit more if these proceed smoothly

Large generic MC11 productions (release-02-00-01) submitted (Aug 16 - current)

A few MC10 signal productions (Sept 13 ~00:00 JST)

Submitting MC11 BGx0 generic samples (Sept 23 ~ 02:00 JST)

Some MC10 signal productions (Sept 26 ~04:00 JST)

Submitting MC11 BGx0 generic samples with early phase 3 geometry (Oct 3 ~07:30 JST)

Submitting new MC11 signal samples (Oct 9 ~13:00 JST)

Submitting MC11 Y(5S) samples (Oct 9 ~21:30 JST)

Many MC11 signal samples submitted (Oct 12 ~00:00 JST)


Most productions have finished. New productions will be started when updated beam background files are ready.

Some small signal MC samples were submitted on Nov 11 ~06:00 JST, Nov 13, Nov 14.

Phase 2 reprocessing with the distributed computing system (prod6b) started Nov 15 ~00:00 JST.

Large phase 3 BGx1 generic MC samples were submitted recently (~Nov 20). These should be relatively short jobs (though there are many of them). They should run for the next few weeks.


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • Date, Issue, Tickets...
  • Memory usage has rapidly increased on b2dchsv04. BIIDCO-1545
  • Network downtime. BIIDCO-1395
  • The 1-min CPU load has rapidly increased and gone over the red line. BIIDCO-1487

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)


DDM (bldirac01.sdcc.bnl.gov)

  • BNL network interruption 2018-Dec-18 14:00-15:00 UTC. BIIDCO-1462
  • DDM ReplicateAndRegister: the number of Queued tasks has been increasing since 2018-07-17. BIIDCO-1175
  • DDM is stalled. BIIDCO-1140
  • 2018-03-01: DDM deletion task seems stuck. BIIDCO-808

Conditions DB ()

  • BNL network interruption 2018-Dec-18 14:00-15:00 UTC. BIIDCO-1462
  • Planning to migrate to BNL servers on May 31. The following IP addresses will be used:

    192.33.128.4
    192.33.128.5 
    192.33.128.6
    192.33.128.9
    192.33.128.10
    192.33.128.11
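
Site admins who maintain firewall or allow-list rules may want to check an address against the list above. The following is only a minimal sketch using the Python standard library. The names CONDITIONS_DB_IPS and is_conditions_db_address are hypothetical and are not part of any Belle II or DIRAC tool.

```python
import ipaddress

# Addresses copied from the migration announcement above.
CONDITIONS_DB_IPS = frozenset(
    ipaddress.ip_address(a)
    for a in (
        "192.33.128.4",
        "192.33.128.5",
        "192.33.128.6",
        "192.33.128.9",
        "192.33.128.10",
        "192.33.128.11",
    )
)


def is_conditions_db_address(addr: str) -> bool:
    """Return True if addr is one of the announced Conditions DB addresses.

    Hypothetical helper for illustration only. Surrounding whitespace is
    ignored; anything that is not a valid IP address returns False.
    """
    try:
        return ipaddress.ip_address(addr.strip()) in CONDITIONS_DB_IPS
    except ValueError:
        return False
```

Note that the list is not contiguous (.7 and .8 are absent), so an exact-membership check is used here rather than a subnet match.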

Monitor

  • Issue accessing the DIRAC Web Portal. BIIDCO-1247

LFC

File Transfers and Replication Status

See also DDM for related issues

FTS

Any problems in the FTS service or FTS monitoring are to be recorded here. Site/SE-specific issues are to be recorded under each site/SE.

Note that the FTS dashboard we use is an "old" instance and is not well maintained. Belle II members in general do not have access to the "new" monitoring. When the dashboard is down, shifters only need to notify the expert and skip the corresponding part of their work. Because access to the new monitoring page is limited, the expert should check it instead.

Replication Status

  • BNL network interruption 2018-Dec-18 14:00-15:00 UTC. BIIDCO-1462
  • Replication and DDM plots have not been updated since 2018-11-25 7:00 UTC. BIIDCO-1460
    Related to BIIDCO-1458
  • 2018-10-11: Sharp drop in 'done' jobs and increase in 'waiting'. BIIDCO-1365
  • 2018-09-29: The numbers have been almost zero. BIIDCO-1339
  • 2018-09-22: The number of "Done Jobs" has been lower than the number of "Scheduled Jobs" for the last 6 hours or more.
  • 2018-07-02: No 'Done' transfers, several 'Scheduled', and a rapid increase of 'Waiting' replications. BIIDCO-1125

Job Status Plot


Job Summary


SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs).

Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • BNL network interruption 2018-Dec-18 14:00-15:00 UTC. BIIDCO-1462
  • SRM_AUTHORIZATION_FAILURE for users. BIIDCO-1303
  • UNAVAILABLE files. BIIDCO-1302

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz) 

  • Free disk space is less than 1 TB at CESNET. BIIDCO-1414

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • Replication status: Increasing 'Scheduled' with zero 'done' since 2018-12-05 13:00 UTC. BIIDCO-1473
  • Continuous timeout failures between NTU-CC-TMP-SE and CNAF-TMP-SE. BIIDCO-1310

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)  

  • SE Health check by DDM : download, upload do not work since 2018-12-15 21:46:02 UTC https://agira.desy.de/browse/BIIDCO-1490
  • SE Health check by DDM : download, upload do not work since 2018-12-16 07:08:04 UTC.
  • Date, Issue, Tickets...

Primary SE: KEK2-TMP-SE (kek2-se03.cc.kek.jp)

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • KIT SE giving occasional timeouts. BIIDCO-428

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • Replication status: Zero 'done' with non-zero 'queued' (2018-08-24 06:30 UTC). BIIDCO-1233

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • SE Health check by DDM : upload does not work since 2018-11-14 19:37:18 UTC. BIIDCO-1435
  • SE Health check by DDM : checksum, download, upload do not work since 2018-09-20 05:24:11 UTC. BIIDCO-1306
  • Disk is full. BIIDCO-858

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2018-10-31 16:09:44 UTC. BIIDCO-1407


Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

  • Failed file transfers observed after 7:00 UTC on 22/11/2018; ticket updated. BIIDCO-1340
  • Low transfer efficiency observed at 21:00 UTC on 23/10/2018; added to the ticket. BIIDCO-1340
  • Low transfer efficiency observed again after 9:00 UTC on 10/10/18; ticket updated. BIIDCO-1340
  • The problem raised by https://agira.desy.de/browse/BIIDCO-1340 seems to have been solved.
  • Low transfer efficiency. BIIDCO-1340
  • FTS authentication credential error. BIIDCO-1319

Frascati-TMP-SE (atlasse.lnf.infn.it)

HEPHY-TMP-SE (hephyse.oeaw.ac.at)


IPHC-TMP-SE (sbgse1.in2p3.fr)

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

  • Transfer rate is zero. BIIDCO-896
  • Melbourne-DATA-SE banned for write. BIIDCO-927

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • McGill-TMP-SE will be decommissioned in early 2018. BIIDCO-516

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)

NTU-TMP-SE, NTU-CC-TMP-SE (bgrid3.phys.ntu.edu.tw, belle2grid3.cc.ntu.edu.tw)

Pisa-TMP-SE (stormfe1.pi.infn.it)

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues.

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

Torino-TMP-SE (se-srm-00.to.infn.it)

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

UMiss-TMP-SE (umiss005.hep.olemiss.edu)


UVic-TMP-SE (charon01.westgrid.ca)

  • File transfer failures: file transfer efficiency from UVic-DATA-SE has been too low since about 2018-12-18 1:00 UTC. BIIDCO-1491
  • File transfer failures from source UVic-TMP-SE observed on 26 Oct 2018. BIIDCO-1397
  • FTS connection timeout from UVic to KEK (kek2-se03.cc.kek.jp). BIIDCO-1314
    Solved and verified on 2018-10-25. GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=137332 was submitted 2018-09-22 04:28 UTC.



Sites

Sites Common Issue

  • DIRAC SSH sites are not being filled with jobs for MC11. BIIDCO-1231
  • Several sites: Pilot submission failures/short pilots (4x) and Software could not be installed (2x). Common JIRA ticket issued: BIIDCO-1443
    → See the entries for individual sites below that refer to the same JIRA ticket (BIIDCO-1443)
  • Several sites: Short Pilot Jobs (17x). BIIDCO-1484

ARC.DESY.de

  • Job status check: Input Data Resolution issues (still 100% of the jobs) on 2018/12/22 at 8:00 UTC.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2018/12/16.(details)
  • All jobs are in "Input data resolution" status since 17:00:00 UTC on 2018/12/18. BIIDCO-1541
  • 100% of jobs fail at ARC.DESY.de. BIIDCO-1504, BIIDCO-1486
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details) BIIDCO-1518, BIIDCO-1486
  • Job submission check : Pilot submission failure has been found since 05:42:00 UTC on 2018/10/25. (details)
  • Health checker info. : "Aborted pilot jobs" has been found since 16:20:00 UTC on 2018/10/23. BIIDCO-1391
  • Reconfiguration for the site queues: BIIDCO-1392

ARC.DESY-test.de

A test queue for the new CE. BIIDCO-1469

  • Please report any issues.
  • Date, Issue, Ticket

ARC.KIT.de

  • Job status check: Application finished with errors (5% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/11/19.
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2018/10/20. BIIDCO-1384
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/10/06.(details)

ARC.LMU.de

  • This is a test site. There is no need to report any issues.

ARC.LMU2.de

  • 2018/08/13 Downtime: Start time: 2018-08-12 13:00 (UTC), End time: 2018-08-13 15:00 (UTC). BIIDCO-1212
  • Banned because there is currently no resource behind the CE. BIIDCO-239

ARC.Melbourne.au


ARC.MPPMU.de

  • Health checker info. : "Failed pilot jobs" has been found at 09:20:00 UTC on 2018/10/25.(details) BIIDCO-1537
  • Job submission check : Pilot submission failure has been found since 13:26:00 UTC on 2018/10/21. BIIDCO-1386
  • BIIDCO-128

ARC.SIGNET.si

  • Job status check: Application finished with errors (5% of the jobs) at 11:15 UTC on 2018/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/12/20.(details)
  • Job submission check : Pilot submission failure has been found at 13:26:00 UTC on 2018/12/19. BIIDCO-1547
  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03. BIIDCO-1420
  • "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/29. BIIDCO-1519
  • Health checker info. : "Aborted pilot jobs" has been found since 20:20:00 UTC on 2018/10/20. BIIDCO-1383
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/03.(details) BIIDCO-1350
  • Health checker info. : "Belle II software could not be installed on " has been found since 17:20:00 UTC on 2018/09/23. BIIDCO-1321
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/09/14.(details) BIIDCO-1288
  • Job submission check : Pilot submission failure has been found at 05:29:00 UTC on 2018/09/15. (details) BIIDCO-1289
  • Job submission check : Pilot submission failure has been found since 01:31:00 UTC on 2018/09/08. (details) BIIDCO-1128
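
The entries above follow a recurring shape: a message, a timestamp written as "HH:MM:SS UTC on YYYY/MM/DD", and zero or more BIIDCO ticket IDs. As a rough sketch only, such a line could be split into its timestamp and ticket IDs as shown here. The function name parse_entry and both regular expressions are illustrative assumptions, not part of the health checker or any shifter tool.

```python
import re
from datetime import datetime, timezone

# Assumed patterns, inferred from the entries on this page.
TICKET_RE = re.compile(r"\bBIIDCO-\d+\b")
STAMP_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) UTC on (\d{4}/\d{2}/\d{2})")


def parse_entry(line):
    """Return (timestamp, tickets) for one status line.

    timestamp is a timezone-aware UTC datetime, or None when the line has
    no "HH:MM:SS UTC on YYYY/MM/DD" stamp. tickets is a list of BIIDCO IDs
    in order of first appearance, without duplicates.
    """
    tickets = list(dict.fromkeys(TICKET_RE.findall(line)))
    match = STAMP_RE.search(line)
    when = None
    if match:
        when = datetime.strptime(
            f"{match.group(2)} {match.group(1)}", "%Y/%m/%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
    return when, tickets
```

Entries that give the time in JST or in another date format (for example "on 2018/12/21 at 8:48 UTC") are not matched by this pattern and return None for the timestamp.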

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08. BIIDCO-1520, BIIDCO-1534
  • Job status check: "application finished with errors" (100% currently) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21. (details) BIIDCO-1312
  • Many MCProduction jobs failed at the file upload stage for fail-over SEs (2017-12-24). BIIDCO-647
  • The number of jobs is limited. BIIDCO-289
  • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs(DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • Job status check: Application finished with errors (27% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Job submission check : Pilot submission failure has been found since 17:26:00 UTC on 2018/10/21. BIIDCO-1387
  • Health checker info. : "Failed to install DIRAC on " has been found at 22:20:00 UTC on 2018/09/15

DIRAC.BINP-VM.ru

  • Job status check: Application finished with errors (34% of the jobs in last 24 hours) and Stalled (8%) on 2018/12/21 at 8:48 UTC.
  • Job status check: "Application Finished with Errors " (episodically, 10% in total), on 2018/12/20.
  • Job Status Plots "Application Finished with Errors" (100%) on 9/10/18. https://agira.desy.de/browse/BIIDCO-1358
  • Job status plots, "Application Finished With Errors" (2018-02-11 but lasting for at least a month). BIIDCO-749

DIRAC.CINVESTAV.mx

  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/12/06. BIIDCO-1476, BIIDCO-1524
  • Job status plots, "Application Finished With Errors" & "Watchdog identified this job as Stalled" (2018-02-12). BIIDCO-755

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • Health checker info. : "Aborted pilot jobs" has been found since 21:20:00 UTC on 2018/12/22.
  • Job status check: Application finished with errors (95% of the jobs over the last 24h) at 8:00 UTC on 2018/12/22.
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/12/22.(details)
  • Job status check: Application finished with errors (100% of the jobs) since 6:00 UTC on 2018/12/21.
  • Job status check: Input Data Resolution issues (100% of the jobs) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/21.(details)
  • All jobs are in "Input data resolution" status (2018/12/20).
  • 100% of jobs fail at DIRAC.IITG.in. BIIDCO-1505
  • Health checker info. : "Aborted pilot jobs" has been found at 14:20:00 UTC on 2018/12/15.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/14. BIIDCO-1521
  • Job submission check : Pilot submission failure has been found at 22:23:00 UTC on 2018/12/05. BIIDCO-1474
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/02. BIIDCO-1409
  • Job submission check : Pilot submission failure has been found at 13:24:00 UTC on 2018/09/26. (details)
  • Health checker info. : "Aborted pilot jobs" has been found since 14:20:00 UTC on 2018/04/22. BIIDCO-977

DIRAC.IITH.in

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/14.
  • Job status check: "input Data Resolution" issues (36%) on 2018/10/26.
  • BIIDCO-1378
  • Job submission check : Pilot submission failure has been found since 11:27:00 UTC on 2018/10/06. (details)
  • Job submission check : Pilot submission failure has been found since 11:28:00 UTC on 2018/10/03. BIIDCO-1349

DIRAC.LMU.de

  • Not in use in MC production. BIIDCO-26
  • Banned for now.

DIRAC.MIPT.ru

  • 100% of jobs fail. BIIDCO-1506
  • Health checker info. : "Belle II software could not be installed on " has been found since 08:20:00 UTC on 2018/12/14.
  • Health checker info. : "Belle II software could not be installed on " has been found at 22:20:00 UTC on 2018/12/05.
  • Health checker info. : "Belle II software could not be installed on " has been found since 01:20:00 UTC on 2018/11/24.
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/23.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 21:20:00 UTC on 2018/11/21.
  • Health checker info. : "Belle II software could not be installed on " has been found at 14:20:00 UTC on 2018/11/20. BIIDCO-1443
  • Job Status Plots "Application Finished with Errors" (100%) on 9/10/18. https://agira.desy.de/browse/BIIDCO-1358
  • Health checker info. : "Aborted pilot jobs" has been found at 14:20:00 UTC on 2018/10/06.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 07:20:00 UTC on 2018/09/21.(details) BIIDCO-747
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/09/13.(details)
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/07/23.(details)
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2018/07/19.
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2018/07/18.
  • Job status plots, "Application Finished With Errors" has been found at about 04:00:00 JST on 2018/07/06. (details)
  • Health checker info. : "Aborted pilot jobs" has been found at 12:20:00 UTC on 2018/02/11. BIIDCO-747

DIRAC.Nagoya.jp

  • Job status check: Application finished with errors (5.6% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details)
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/11/15.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/09/30.(details)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/08/17.(details) BIIDCO-1227

DIRAC.Nara-WU.jp

  • Job status check: Application finished with errors (11% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Under commissioning since 2018-11-13. BIIDCO-1432

DIRAC.NDU.jp

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/12/12. BIIDCO-1525

DIRAC.Niigata.jp

  • BIIDCO-1510
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/12/12. BIIDCO-1526
  • Job submission check : Pilot submission failure has been found since 12:36:00 UTC on 2018/10/17. BIIDCO-1376
  • Health checker info. : "Aborted pilot jobs" has been found at 14:20:00 UTC on 2018/10/06.(details)

DIRAC.Osaka-CU.jp

  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04. (details)
  • Pilot submission failure has been found since 18:32:00 UTC on 2018/11/24. BIIDCO-1434
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Ask site admin to check the status 2018-03-17 10:00 JST. (DB access failure again from DIRAC.Osaka-CU.jp to PNNL from 2018-03-16 11:00 UTC)
    BIIDCO-290
  • MCProduction = 5 BIIDCO-312

DIRAC.PNNL.us

  • Site to be decommissioned. BIIDCO-919

DIRAC.PNNL2.us

  • Site to be decommissioned. BIIDCO-920

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/12/14. BIIDCO-1497, BIIDCO-1527
  • Health checker info. : "Not enough disk space on " has been found since 05:20:00 UTC on 2018/10/25. BIIDCO-1394
  • Job Status : "Job has exceeded wall clock time" : Pink Colour : (100%) on 9/10/18. BIIDCO-1358
  • Health checker info. : "Aborted pilot jobs" has been found since 12:20:00 UTC on 2018/09/08.(details)

DIRAC.SSU.kr

  • BIIDCO-1543
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/01. BIIDCO-1415

DIRAC.TIFR.in

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/10/22.
  • Job status plots, "Application Finished With Errors" has been found at about 00:00:00 JST on 2018/07/06. (details) BIIDCO-1132
  • Health checker info. : "Short pilot jobs" -- Already reported: BIIDCO-971
  • RunningLimit is set for MCProduction=1. BIIDCO-1006
  • Job stalled at input data resolution. BIIDCO-714

DIRAC.TMU.jp

  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/12/14. BIIDCO-1529
  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02. BIIDCO-1409, BIIDCO-1522
  • Job status check: "Application finished with errors" (60%) on 2018/10/26.
  • Health checker info. : "Belle II software could not be installed on " has been found since 18:20:00 UTC on 2018/10/17.
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2018/10/15. BIIDCO-1373
  • Job submission check : Pilot submission failure has been found since 10:24:00 UTC on 2018/09/30. (details)

DIRAC.Tokyo.jp

  • Date, Issue, Tickets..

DIRAC.UAS.mx

  • 100% of jobs fail with errors. BIIDCO-1508
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17. BIIDCO-1508
  • Health checker info. : "Belle II software could not be installed on " has been found since 16:20:00 UTC on 2018/11/14.
  • Job submission check : Pilot submission failure has been found since 01:26:00 UTC on 2018/09/21. (details)
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/09/17. Added to BIIDCO-1286
  • Job submission check : Pilot submission failure has been found since 15:24:00 UTC on 2018/09/16 (emailed comp-dc-operations, create JIRA ticket when able)

DIRAC.UVic.ca

  • GGUS ticket : "CA-VICTORIA-WESTGRID-T2 : FTS connection timeout to srm://kek2-se03.cc.kek.jp" (137332) was submitted at 04:28:09 UTC on 2018/09/22.
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2018/10/07.(details)
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2018/08/16.(details)

DIRAC.UVic-local.ca

  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/09/29.(details)

DIRAC.Yamagata.jp

  • Job status check: Application finished with errors (13% of the jobs at 11:15 UTC, but 100% in the last hours) on 2018/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2018/12/12.(details)
  • Job submission check : Pilot submission failure has been found since 01:27:00 UTC on 2018/09/16. (details) BIIDCO-1290
  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2018/05/21.(details)

DIRAC.Yonsei.kr

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08.
    BIIDCO-1416

LCG.CESNET.cz

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Job submission check : Pilot submission failure has been found at 07:21:00 UTC on 2018/11/19.
  • Job submission check : Pilot submission failure has been found since 18:22:00 UTC on 2018/11/16.

  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/11/02. BIIDCO-1409 BIIDCO-1523

  • Job submission check : Pilot submission failure has been found since 19:40:00 UTC on 2018/05/23. (details)
  • Some intervention is needed to run Merge jobs BIIDCO-771
  • Job submission check : Pilot submission failure has been found since 17:34:00 UTC on 2018/05/16. (details)

LCG.CNAF.it

  • Health checker info. : "Aborted pilot jobs" has been found since 04:20:00 UTC on 2018/12/13. BIIDCO-1488
  • "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/21. BIIDCO-1455

LCG.Cosenza.it

  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/10/30.
  • Downtime 2018-10-25 13:00 (UTC) - 2018-10-26 13:00 (UTC) and 2018-10-23 13:00 (UTC) - 2018-10-25 13:00 (UTC)
    BIIDCO-1393
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2018/10/08.(details)  

LCG.CYFRONET.pl

  • Job status check: Stalled (9% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13.(details)
    BIIDCO-1246
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Aborted pilot jobs" has been found at 18:20:00 UTC on 2018/09/13.(details)
  • Downtime (Decommissioning cream.grid.cyf-kr.edu.pl and cream02.grid.cyf-kr.edu.pl): Start time: 2018-02-27 23:00 (UTC), End time: 2018-12-31 00:00 (UTC) BIIDCO-694

LCG.DESY.de

  • The site is to be retired BIIDCO-1240 – No more jobs to be submitted.
  • Downtime, Start time: 2018-09-01 00:00 (UTC), End time: 2018-09-30 23:59 (UTC) BIIDCO-1276

LCG.Frascati.it

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/07/10.(details) BIIDCO-1153
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/06/30.(details)

LCG.HEPHY.at

  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/12/12. BIIDCO-1532
  • Health checker info. : "BLAH ERROR" has been found since 13:20:00 UTC on 2018/10/09.(details)
  • Job submission check : Pilot submission failure has been found since 16:25:00 UTC on 2018/06/21. (details) BIIDCO-1107

  • MCProduction = 680 BIIDCO-281

LCG.IPHC.fr

  • Health checker info. : "Failed pilot jobs" has been found at 00:20:00 UTC on 2018/06/18.(details)

LCG.KEK.jp

  • Job submission check : Pilot submission failure has been found since 05:25:00 UTC on 2018/12/20. BIIDCO-1548
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2018/11/12.(details) BIIDCO-1431

  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/10.(details)
  • Health checker info. : "Belle II software could not be installed on cb512.cc.kek.jp" has been found since 21:20:00 UTC on 2018/10/01.
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2018/09/24.(details)
  • Performance degraded with "Input data resolution" status since 2018-07-24 around 20:00 UTC BIIDCO-1191

LCG.KEK2.jp

  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21. BIIDCO-1559

  • Job submission check : Pilot submission failure has been found at 06:25:00 UTC on 2018/12/20. BIIDCO-1548
  • All jobs have been in "Input data resolution" status since 12:00 UTC on 2018/12/18. BIIDCO-1542
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/11/23.(details)
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/11/22.(details) BIIDCO-1450
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/10.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2018/09/28.(details)

LCG.KISTI.kr

  • Job slots are disabled for SE maintenance from 2018-10-19 to 2018-10-23 BIIDCO-1380
  • Health checker info. : "BLAH ERROR" has been found since 06:20:00 UTC on 2018/10/19.(details)

  • "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details)
  • The BLAH error seems to happen when jobs exceed the allocated number of queues; not a problem (site-specific feature)
    BIIDCO-1259
  • MCProduction = 280 BIIDCO-280
  • A large number of Merge jobs in waiting status BIIDCO-773

LCG.KMI.jp

  • Job status check: Application finished with errors (7% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/12/13.(details) BIIDCO-1533
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 21:20:00 UTC on 2018/11/22.
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 05:20:00 UTC on 2018/11/22.
  • Job submission check : Pilot submission failure has been found since 21:24:00 UTC on 2018/10/02. (details)

LCG.LAL.fr

Site under commissioning. Issues to be reported.

LCG.Legnaro.it

  • Downtime: 2018-10-16 06:30 - 2018-10-16 17:00, SE software update BIIDCO-1372
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/03/16.(details)
  • Downtime: 2018-10-29 07:00 - 2018-10-29 11:00 BIIDCO-1400

LCG.Napoli.it

  • Job submission check : Pilot submission failure has been found since 07:21:00 UTC on 2018/11/10. BIIDCO-1267
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/11/04.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/09.(details) BIIDCO-1398

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/10/06.(details)
  • Scheduled downtime from 2018-10-02 16:00 (UTC) to 2018-10-08 18:00 (UTC) BIIDCO-1348
  • Health checker info. : "Failed pilot jobs" has been found since 16:20:00 UTC on 2018/09/27.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 08:20:00 UTC on 2018/09/19.(details) BIIDCO-825
  • Job submission check : Pilot submission failure has been found at 05:32:00 UTC on 2018/09/11. (details) BIIDCO-1267
  • Job submission check : Pilot submission failure has been found since 01:21:00 UTC on 2018/11/07.
  • Stalled jobs BIIDCO-1255
  • "Failed pilot jobs" has been found at 14:20:00 UTC on 2018/03/17. BIIDCO-825

LCG.NTU.tw

  • Health checker info. : "Failed pilot jobs" has been found since 16:20:00 UTC on 2018/11/20.(details) BIIDCO-1453
  • Job submission check : Pilot submission failure has been found since 12:25:00 UTC on 2018/11/20. (details) BIIDCO-1443
  • Health checker info. : "CRL has expired" has been found since 08:20:00 UTC on 2018/11/04. BIIDCO-1430
    Solved and verified : GGUS ticket : https://ggus.eu/index.php?mode=ticket_info&ticket_id=138235
  • Job submission check : Pilot submission failure has been found since 15:23:00 UTC on 2018/11/03. BIIDCO-1377
    Solved and verified : GGUS ticket : https://ggus.eu/?mode=ticket_info&ticket_id=138290 was submitted at 2018-11-14 03:28 UTC
  • Job submission check : Pilot submission failure has been found since 08:28:00 UTC on 2018/10/15. BIIDCO-1377
  • Solved and verified 2018-11-01: GGUS ticket : https://ggus.eu/index.php?mode=ticket_info&ticket_id=137827 was submitted at 2018-10-18 14:39 UTC
  • Job submission check : Pilot submission failure has been found since 02:26:00 UTC on 2018/10/07. (details)
  • GGUS ticket : "TW-NTU-HEP : FTS transfer timeout to from/to belle2grid3.cc.ntu.edu.tw"(137334) has been submitted at 05:10:14 UTC on 2018/09/22.
  • Health checker info. : "Failed to install DIRAC on node29" has been found since 16:20:00 UTC on 2018/09/24. BIIDCO-1324
  • Health checker info. : "BLAH ERROR" has been found at 18:20:00 UTC on 2018/09/15.(details)
  • Health checker info. : "CRL has expired" has been found since 11:20:00 UTC on 2018/08/16. Created JIRA ticket BIIDCO-1217

LCG.Pisa.it

  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2018/12/22.(details)
  • Job status check: Application finished with errors (5% of the jobs in last 24 hours) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2018/12/21.(details)
  • 100% of jobs fail at LCG.Pisa.it BIIDCO-1509
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2018/12/04.
  • "Failed pilot jobs" has been found at 07:20:00 UTC on 2018/11/19.
  • "Short pilot jobs" has been found at 07:20:00 UTC on 2018/11/19.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/11/15. BIIDCO-1157
  • Health checker info. : "Short pilot jobs" has been found since 16:20:00 UTC on 2018/11/14. BIIDCO-1157

  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2018/11/11.
  • Job submission check : Pilot submission failure has been found since 10:24:00 UTC on 2018/11/11. BIIDCO-1315
  • GGUS ticket :
    1. "INFN-PISA: possible issue in CA certificate directory"(136751) has been submitted at 05:26:08 UTC on 2018/08/17.
    2. "INFN-PISA: Disk space on WNs"(136750) has been submitted at 04:17:17 UTC on 2018/08/17.
    3. "INFN-PISA: CVMFS availability on WNs"(136749) has been submitted at 03:20:57 UTC on 2018/08/17.
    4. "INFN-PISA : Pilot failed at gridce3.pi.infn.it"(130815) has been submitted at 10:11:45 UTC on 2017/09/28.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2018/11/06.
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/10/31. BIIDCO-1157
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/10/29.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2018/10/27.(details)
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2018/10/25.(details)
  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2018/10/24. BIIDCO-1157
  • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2018/10/23.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2018/10/22.
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2018/10/21.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/10/20.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2018/10/09.(details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/10/02.(details)
  • "Failed pilot jobs" has been found since 21:20:00 UTC on 2018/09/22.(details)
  • "Short pilot jobs" has been found since 02:20:00 UTC on 2018/09/21.(details) BIIDCO-1157
  • Possible issue in CA certificate directory on WN se1wn26.pi.infn.it BIIDCO-1220
  • "Not enough disk space on <various servers>" has been found since <various times> UTC from 2018/08/12 onwards. BIIDCO-1211
  • Failed to install DIRAC on ... BIIDCO-1152
  • "BLAH ERROR" and "Aborted pilot jobs" have been found since 03:20:00 UTC on 2018/06/20.(details) BIIDCO-1106; related GGUS ticket 130815, "INFN-PISA : Pilot failed at gridce3.pi.infn.it"

LCG.Roma3.it

  • Job status check: Application finished with errors (25% of the jobs in last 24 hours) and Stalled (36%) on 2018/12/22 at 8:00 UTC.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21.
  • Job status check: Application finished with errors (10% of the jobs in last 24 hours) and Stalled (76%) on 2018/12/21 at 8:48 UTC.
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Stalled jobs on 2018/12/20.
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2018/12/08. BIIDCO-1538

  • Health checker info. : "Aborted pilot jobs" has been found at 16:20:00 UTC on 2018/10/09.(details)
  • "BLAH ERROR" has been found at 04:20:00 UTC on 2018/06/20. BIIDCO-1106
  • Roma3 commissioning BIIDCO-111 (NOTE: This ticket seems obsolete; it should be closed and removed from the operation status)

LCG.TAU.il

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2018/11/01.
  • Downtime - Start time: 2018-09-26 01:00 (UTC), End time: 2018-09-28 20:00 (UTC) BIIDCO-1328
  • Health checker info. : "Failed pilot jobs" has been found at 18:20:00 UTC on 2018/09/14.(details)

LCG.Torino.it

  • Health checker info. : "BLAH ERROR" has been found since 21:20:00 UTC on 2018/12/06.
  • Health checker info. : "BLAH ERROR" has been found since 03:20:00 UTC on 2018/11/23. BIIDCO-1451
  • Health checker info. : "BLAH ERROR" has been found at 23:20:00 UTC on 2018/11/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/11/11.
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2018/11/07.
  • Job submission check : Pilot submission failure has been found at 22:22:00 UTC on 2018/11/04.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/11/04.
  • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2018/11/01. BIIDCO-1417
  • Health checker info. : "Failed pilot jobs" has been found since 17:20:00 UTC on 2018/10/31. BIIDCO-252
  • Health checker info. : "BLAH ERROR" has been found since 21:20:00 UTC on 2018/10/22.

LCG.ULAKBIM.tr

OSG.BNL.us

  • Health checker info. : "Short pilot jobs" has been found since 15:20:00 UTC on 2018/12/21.(details)
  • Health checker info. :
    1. "Short pilot jobs" has been found since 02:20:00 UTC on 2018/12/21.(details)
    2. "Aborted pilot jobs" has been found at 07:20:00 UTC on 2018/12/21.(details)
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/12/18.
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2018/12/14.
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2018/12/11.
  • BNL network interruption 2018-Dec-18 14:00-15:00 UTC BIIDCO-1462
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/07/13.(details) BIIDCO-1164
    • Recurring issue 2018/11/23 through 09/30
  • Health checker info. : "Aborted pilot jobs" has been found since 09:20:00 UTC on 2018/09/19.(details) BIIDCO-950
    • Recurring issue 2018/11/04 through 09/23
  • Production jobs: UNAVAILABLE files BIIDCO-1302
  • User jobs: SRM_AUTHORIZATION_FAILURE BIIDCO-1303
  • Number of concurrent MCProduction jobs restricted BIIDCO-1256
  • MCProduction jobs are mostly stalled BIIDCO-1253

OSG.CORI.us

  • The OSG.CORI.us resource has been removed because the CY18 allocation was not approved.

OSG.UMiss.us

  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2018/12/22.(details) BIIDCO-1550
    GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=138979 was submitted at 2018-12-22 14:45 UTC.
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/12/14.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2018/11/02. BIIDCO-1142

SSH.KMI.jp

  • Job status check: Application finished with errors (12% of the jobs in last 24 hours) on 2018/12/22 at 11:30 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/08/13.

VCYCLE.Napoli.it, VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

  • Opportunistic sites (an empty plot is not a problem)


Links

