CESNET-TMP-SE





Production Plans

Data Production Status

  • Raw data processing
    • Proc11 (2019a/b/c) completed
    • Bucket 9-12 (prompt processing of 2020a/b) completed
    • Prompt processing of the remaining exp12 data is also planned (Bucket13, Bucket14).
  • MC13 production

    • MC13a (Run-independent MC) production: Keep producing both generic samples and signal samples

    • MC13b (Run-dependent MC) production: ongoing
  • Skim
    • SkimP10x1 (Proc10 skims): ongoing
    • SkimM13ax1 (MC13 skims): ongoing

Production Status

Full resource usage

Data production summary page : Data Production Status

Data (re)processing:

  • Currently no data-processing jobs

MC production:

  • An additional production for 1/ab of nominal phase 3 generic MC13a is ongoing.

Analysis skimming:

  • Skims of the ICHEP dataset are currently in progress.


Central Services

Dirac (dirac.cc.kek.jp, b2dchsv01-b2dchsv06.cc.kek.jp, b2dchsv08.cc.kek.jp)

  • Date, Issue, Tickets...

DB Production (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp)

  • Date, Issue, Tickets...

"Web" servers

  • Date, Issue, Tickets...


DDM (bldirac01.sdcc.bnl.gov)

  • 2018-03-01 DDM deletion task seems stuck BIIDCO-808 - Getting issue details... STATUS

Conditions DB ()

Monitor

LFC

  • Date, Issue, Tickets...

File Transfers and Replication Status

  • See also Computing OperationStatus#DDM for related issues.
  • 2020-06-15 UTC File Transfer failures: Pisa-DATA-SE BIIDCO-2488 - Getting issue details... STATUS
  • 2020-04-03 00:15 UTC File Transfer failures: Pisa-DATA-SE  BIIDCO-2349 - Getting issue details... STATUS  

FTS

Any problems with the FTS service or FTS monitoring are to be recorded here. Site/SE-specific issues are to be recorded under each Site/SE.

  • 2020/08/08 Sat 03:02 UTC File transfer failure from KIT-TMP-SE to BNL-DATA-SE, KEK-DISK-DATA-SE, Napoli-TMP-SE and SIGNET-DATA-SE BIIDCO-2603 - Getting issue details... STATUS
  •   FTS in DIRAC Configuration set to utilize the three FTS servers (2 at KEK and 1 at BNL)   BIIDCO-2432 - Getting issue details... STATUS
  • 2020-01-02 13:20 UTC File transfer failure from KMI-TMP-SE and from KEK-Disk-TMP-SE to LAL-DATA-SE BIIDCO-2219 - Getting issue details... STATUS
  • 2020-01-02 09:15 UTC File transfer failure from KMI-TMP-SE to LAL-DATA-SE BIIDCO-2219 - Getting issue details... STATUS

Replication Status

  • Date, Issue, Tickets...

Job Status Plot

  • No job status plots for 13 sites while MC13a production is ongoing 2020-05-18 BIIDCO-2277 - Getting issue details... STATUS
  • No job status plots for 15 sites while MC13a production is ongoing 2020-02-07 BIIDCO-2277 - Getting issue details... STATUS

  • No job status figure with a range of 1 day while the MC13 production is ongoing.

Job Summary

  • Date, Issue, Tickets...



SEs

SE Common Issues

  • Issues with individual SEs should be recorded below (Primary SEs or Other SEs)

Raw data SEs

Raw data SE: KEK-RAW-SE (srm://kek2-se02.cc.kek.jp:8444/srm/managerv2?SFN=/belle/RAW)

  • 2019-07-24 15:30 UTC: all transfers failed (0/72) between KEK-RAW-SE and BNL-TMP-SE
  • Many replications from KEK-RAW-SE to BNL-TAPE-SE failed at 11:30 on 2020-05-25  BIIDCO-2445 - Getting issue details... STATUS

Raw data SE: BNL-TAPE-SE (srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/tape

Primary SEs

Primary SE: BNL-TMP-SE (dcblsrm.sdcc.bnl.gov)

  • SE Health Check by DDM: Failure on download has been observed since 2020-06-15 20:33:33 (3 hours)
  • SE Health Check by DDM: Failure on download has been observed since 2020-02-07 09:19:40 (5 hours)
  • No Replication Trend Plot for BNL-TMP-SE 2020-01-02 09:30 UTC  BIIDCO-2220 - Getting issue details... STATUS
  • SE Health check by DDM : download has not worked since 2019-05-16 07:11:21 UTC.

  •  UNAVAILABLE files BIIDCO-1302 - Getting issue details... STATUS
  • SE Health check by DDM : download has not worked since 2019-05-15 21:03:25 UTC

Primary SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz) 

  • Date, Issue, Tickets...

Primary SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • SE Health Check by DDM: Failures on ls and upload have been observed since 2020-07-21 08:18:39 (69 hours).
    All file transfers have failed. BIIDCO-2577 - Getting issue details... STATUS  
  • BIIDCO-2576 - Getting issue details... STATUS
  • SE Health Check by DDM: Failure on download has been observed since 2019-05-13 08:27:21 UTC and since 2020-02-19 15:44:39 (7 hours)
  • File transfer failures with this SE as source have been observed since 2019-12-23 02:00 UTC
  • SE Health check by DDM : remove file, remove directory, download, upload and ls have not worked since 2019-04-25 23:13:00 UTC.
  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE. BIIDCO-1637 - Getting issue details... STATUS

Primary SE: DESY-TMP-SE (dcache-se-desy.desy.de)

Primary SE: KEK-DISK-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/disk/belle/TMP)

  • 2020-01-02 13:20 UTC File transfer failure from KMI-TMP-SE and from KEK-Disk-TMP-SE to LAL-DATA-SE BIIDCO-2219 - Getting issue details... STATUS

Primary SE: KEK2-TMP-SE (srm://kek2-se03.cc.kek.jp:8444/srm/managerv2?SFN=/belle/TMP)

  • Date, Issue, Tickets...

Primary SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

  • No new assignment of MC production data blocks to this destination BIIDCO-848 - Getting issue details... STATUS

  • SE Health check by DDM : download and upload have not worked since 2019-09-22 23:38:00 UTC.

Primary SE: KIT-TMP-SE (dcachesrm-kit.gridka.de)

  • Failure on upload has been observed since 2020-08-03 07:11 UTC BIIDCO-2595 - Getting issue details... STATUS

Primary SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • 2020-01-02 09:15 UTC File transfer failure from KMI-TMP-SE to LAL-DATA-SE BIIDCO-2219 - Getting issue details... STATUS

Primary SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • SE Health Check by DDM: Failure on ls has been observed since 2020-06-22 23:18:22 (3 hours)

  • BIIDCO-2460 - Getting issue details... STATUS

Primary SE: SIGNET-TMP-SE (dcache.ijs.si)

  • SE Health Check by DDM: Failure on upload has been observed since 2020-07-22 15:43:16 (15 hours) BIIDCO-2575 - Getting issue details... STATUS
  •   BIIDCO-2524 - Getting issue details... STATUS
  • unbanned 2020-06-13
  • Operation→Defaults→ResourceStatus→Policies→Ban_SE→matchParams 2020-06-12 GGUS-147427

Other SEs

Adelaide-TMP-SE (coepp-dpm-01.ersa.edu.au)

  •  Adelaide SE is banned BIIDCO-2184 - Getting issue details... STATUS

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • Date, Issue, Tickets...
  • CYFRONET is banned for write  BIIDCO-2392 - Getting issue details... STATUS
  • CYFRONET SE to be replaced BIIDCO-2391 - Getting issue details... STATUS

CINVESTAV-TMP-SE (jaguar-se.fis.cinvestav.mx)

Frascati-TMP-SE (atlasse.lnf.infn.it)

  • Date, Issue, Tickets...

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Date, Issue, Tickets...

  •   HEPHY-TMP-SE banned BIIDCO-2393 - Getting issue details... STATUS

IPHC-TMP-SE (sbgse1.in2p3.fr)

  • Date, Issue, Tickets...

LAL-TMP-SE (grid05.lal.in2p3.fr)

  • Date, Issue, Tickets...

Melbourne-TMP-SE (b2se.mel.coepp.org.au)

McGill-TMP-SE  (storm02.clumeq.mcgill.ca)

  • BIIDCO-516 - Getting issue details... STATUS McGill-TMP-SE will be decommissioned in early 2018.

MPPMU-TMP-SE (grid-srm.rzg.mpg.de)


NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

  •  NTU-TMP-SE banned for write  BIIDCO-1993 - Getting issue details... STATUS

NTU-CC-TMP-SE (belle2grid3.cc.ntu.edu.tw)

  • 2019/08/23 File transfer failure to NTU-CC-DATA-SE BIIDCO-1977 - Getting issue details... STATUS
  • 2019/01/27 File transfer failures from CNAF-TMP-SE to NTUCC-DATA-SE. BIIDCO-1637 - Getting issue details... STATUS   BIIDCO-1892 - Getting issue details... STATUS
  • File transfer failure and cancellation to NTUCC-DATA-SE happened 2018-12-22 BIIDCO-1551 - Getting issue details... STATUS
  • NTUCC-TMP-SE banned for write  BIIDCO-1333 - Getting issue details... STATUS

Pisa-TMP-SE (stormfe1.pi.infn.it)

  • 2020-03-28 17:40 UTC - Failed transfers on some connections involving Pisa-TMP-SE as source

PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Being decommissioned. No need to report any issues.  BIIDCO-838 - Getting issue details... STATUS

Roma3-TMP-SE (storm-01.roma3.infn.it)

  •  Date, Issue, Tickets...

TAU-TMP-SE (tau-se.hep.tau.ac.il)

Torino-TMP-SE (se-srm-00.to.infn.it)

  • Date, Issue, Tickets...

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

  • File transfer failures with this SE as destination  BIIDCO-2253 - Getting issue details... STATUS

UMiss-TMP-SE (umiss005.hep.olemiss.edu)

  • Date, Issue, Tickets...

UVic-TMP-SE (charon01.westgrid.ca)



Sites

Sites Common Issue

  • Date, site-wide issue, tickets...

ARC.DESY.de

  • MCProduction restricted BIIDCO-2503 - Getting issue details... STATUS

ARC.DESY-test.de

  • A test queue for the new CE. BIIDCO-1469 - Getting issue details... STATUS

ARC.KIT.de

  • "Aborted Pilot" has been observed since 2020-07-09 20:07 UTC (for 2 hours)
  • Downtime 2020-07-03 14:00 - 2020-07-07 14:00  BIIDCO-2490 - Getting issue details... STATUS
  • Downtime 2020-06-15 00:00 - 2020-06-26 00:00 BIIDCO-2487 - Getting issue details... STATUS
  •   MCProduction restricted BIIDCO-2504 - Getting issue details... STATUS
  • Downtime 2020-06-16 07:00 - 2020-06-16 10:00 BIIDCO-2490 - Getting issue details... STATUS

  • "Short Pilot" has been observed since 2020-06-02 22:47 UTC . BIIDCO-2459 - Getting issue details... STATUS
  • Downtime 2020-06-15 00:00 - 2020-06-21 00:00 BIIDCO-2487 - Getting issue details... STATUS

ARC.KIT-TARDIS.de

  • renamed from Test.KIT.de BIIDCO-2323 - Getting issue details... STATUS

ARC.LMU.de

  • This is a test site. No need to report any issues.

ARC.LMU2.de

  • Banned because there is currently no resource behind the CE BIIDCO-239 - Getting issue details... STATUS

ARC.Melbourne.au

  • "Failed Payload Job" has been observed since 2020-05-06 17:43 UTC (for 5 hours)  BIIDCO-2399 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 2 hours)

ARC.MPPMU.de

  • Downtime 2020-07-28 00:00 - 2020-07-29 00:00 BIIDCO-2585 - Getting issue details... STATUS

  • "Failed Payload Job" has been observed since 2020-06-21 04:47 UTC (for 1 hours)
  • Downtime 2020-06-16 11:00 - 2020-06-16 17:00 BIIDCO-2491 - Getting issue details... STATUS

  • BIIDCO-128 - Getting issue details... STATUS

ARC.SIGNET.si

  • "Short Pilot" has been observed since 2020-08-10 07:35 UTC (for 15 hours) BIIDCO-1368 - Getting issue details... STATUS
  • "Pilot Submission Failure" has been observed since 2020-07-06 02:07 UTC (for 5 hours)

  • "Pilot Submission Failure" has been observed since 2020-06-22 00:07 UTC (for 246 hours) BIIDCO-2525 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-06-15 07:47 UTC (for 8 hours)
  • "Failed Payload Job" has been observed since 2020-06-02 09:47 UTC (for 5 hours)
  • 23% jobs finished with error, 2020-05-09 07:00 UTC
  • "Short Pilot" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • Job status check: Application finished with errors (5% of the jobs) at 11:15 UTC on 2018/12/21.

  • "Failed to install DIRAC on " has been found since 20:20:00 UTC on 2018/11/03. BIIDCO-1420 - Getting issue details... STATUS

CLOUD.CC1_Krakow.pl

  • Not used in production yet. Seeing no jobs (no plot) is not a problem

CLOUD.DESY.de

  • Newly commissioned site. Problems should be reported (in a ticket separate from BIIDCO-2270).
  •   Being configured (BIIDCO-2270). No report necessary. BIIDCO-2270 - Getting issue details... STATUS

DIRAC.Beihang.cn

  • Site is banned.
  • "Failed Payload Job" has been observed since 2019-04-19 11:15 UTC   BIIDCO-1812 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/04/18. BIIDCO-1807 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2019/04/17.
  • "Application finished with errors" (100% currently) on 2019/04/10 00:15 UTC. Problem reported since (at least) 2019/04/07 07:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2018/12/08. BIIDCO-1534 - Getting issue details... STATUS
  • Job status check: "application finished with errors" (100% currently) on 2018/10/26.
  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2018/09/21. (details) BIIDCO-1312 - Getting issue details... STATUS
  • The number of jobs is limited. BIIDCO-289 - Getting issue details... STATUS
  • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs(DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)  BIIDCO-43 - Getting issue details... STATUS
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC)  BIIDCO-38 - Getting issue details... STATUS

DIRAC.BINP.ru

  • Date, Issue, Tickets...

DIRAC.BINP-VM.ru

  • Job status check: "DIRAC.BINP-VM.ru received kill signal" since 2020-05-25 14:00 UTC

  • Job status plots, "Application Finished with errors" 2020-04-21 10:00 to 2020-04-22 10:00 UTC BIIDCO-749 - Getting issue details... STATUS

DIRAC.CINVESTAV.mx

  • "Failed Payload Job" has been observed since 2020-05-18 12:43 UTC (for 1 hours) 

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • "Failed Payload Job" has been observed since 2020-07-25 19:15 UTC (for 4 hours) BIIDCO-2513 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-07-24 08:15 UTC (for 14 hours) BIIDCO-2513 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-07-22 08:15 UTC (for 38 hours) BIIDCO-2513 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-07-18 23:15 UTC (for 32 hours) BIIDCO-2513 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-06-28 04:07 UTC BIIDCO-2513 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-06-26 04:07 UTC (for 2 hours)
  • "Failed Payload Job" has been observed since 2020-06-20 22:47 UTC (for 7 hours) BIIDCO-2513 - Getting issue details... STATUS
  • "Aborted Pilot" has been observed since 2020-06-16 09:47 UTC (for 5 hours)
  • "Aborted Pilot" has been observed since 2020-03-19 13:24 UTC (for 1 hours) BIIDCO-2070 - Getting issue details... STATUS
  • Job status check: "Application finished with errors" on 2019/07/10 at 00:00 UTC (screenshot)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2019/05/16. BIIDCO-1686 - Getting issue details... STATUS BIIDCO-1768 - Getting issue details... STATUS
  • Job status check: many "Application finished with errors" (overall 66% during past 24 hours) on 2019/05/15 at 7:00 UTC.
  • Job status check: many "Application finished with errors" on 2019/05/14 at 7:00 UTC.
  • Job status plots, 100% "Application Finished With Errors", 10:00:00 UTC on 2019/04/08. Still unchanged as of 2019/04/26.  BIIDCO-1823 - Getting issue details... STATUS

DIRAC.IITH.in

  • "Pilot Submission Failure" has been observed since 2020-03-02 01:05 UTC (for 5 hours).

  • "Pilot Submission Failure" has been observed since 2020-03-01 13:05 UTC (for 9 hours)

  • "Pilot Submission Failure" has been observed since 2020-02-18 03:59 UTC (for 19 hours) 

  • "Pilot Submission Failure" has been observed since 2020-02-17 15:59 UTC (for 7 hours) 

  • "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/03.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/03/29.(details) BIIDCO-1768 - Getting issue details... STATUS

DIRAC.LMU.de

  • Not in use in MC production BIIDCO-26 - Getting issue details... STATUS
  • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Aborted pilot jobs" has been found since 13:20:00 UTC on 2019/04/20. BIIDCO-1816 - Getting issue details... STATUS

  • Health checker info. : "Aborted pilot jobs" has been found since 11:20:00 UTC on 2019/04/06, since 05:20:00 UTC on 2019/04/12, and since 20:20:00 UTC on 2019/04/17.

DIRAC.Nagoya.jp

  • "Short Pilot" has been observed since 2020-06-06 21:47 UTC  BIIDCO-2467 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-01-04 23:34 UTC (for 7 hours)
  • "Short Pilot" has been observed since 2019-11-19 05:35 UTC (for 1 hours)
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2019/10/09.

DIRAC.Nara-WU.jp

  • "Pilot Submission Failure" has been observed since 2020-06-12 09:47 UTC (for 5 hours) BIIDCO-2476 - Getting issue details... STATUS
  • "Pilot Submission Failure" has been observed since 2020-06-11 12:47 UTC (for 10 hours) BIIDCO-2476 - Getting issue details... STATUS
  • "Pilot Submission Failure" has been observed since 2020-06-10 01:47 UTC (for 5 hours)
  • Under commissioning from 2018-11-13 BIIDCO-1432 - Getting issue details... STATUS

DIRAC.NDU.jp

  • "Short Pilot" has been observed since 2020-08-10 21:35 UTC (for 1 hours) BIIDCO-1368 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-06-28 12:07 UTC  BIIDCO-2531 - Getting issue details... STATUS

  • "Failed Payload Job" has been observed since 2020-06-21 04:47 UTC (for 1 hours)

DIRAC.Niigata.jp

  • "Failed Payload Job" has been observed since 2020-06-21 04:47 UTC (for 1 hours)
  • "Pilot Submission Failure" has been observed since 2020-06-15 14:47 UTC (for 1 hours) BIIDCO-2598 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-04-15 20:37 UTC   BIIDCO-2348 - Getting issue details... STATUS

DIRAC.Niigata2.jp

  • "Failed Payload Job" has been observed since 2020-07-25 20:15 UTC (for 3 hours) BIIDCO-2348 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-07-24 21:15 UTC (for 1 hours) BIIDCO-2348 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-07-23 18:15 UTC (for 4 hours) BIIDCO-2348 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-07-20 21:15 UTC (for 9 hours) BIIDCO-2348 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-04-02 20:30 UTC (for 3 hours) (details).  BIIDCO-2348 - Getting issue details... STATUS
  • "Application Finished with Errors" with 38.7% from 2020-04-21 10:00 UTC to 2020-04-22 02:00 UTC  BIIDCO-2381 - Getting issue details... STATUS

DIRAC.Osaka-CU.jp

  • Site is banned
  • Job submission check : Pilot submission failure has been found since 07:23:00 UTC on 2018/12/04. (details) BIIDCO-1434 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2018/03/17.
    → Ask site admin to check the status 2018-03-17 10:00 JST. (DB access failure again from DIRAC.Osaka-CU.jp to PNNL from 2018-03-16 11:00 UTC)
    BIIDCO-290 - Getting issue details... STATUS

DIRAC.PAU.in

  • "Pilot Submission Failure" has been observed since 2020-01-22 20:53 UTC (for 2 hours)

DIRAC.PNNL.us

      • Site to be decommissioned BIIDCO-919 - Getting issue details... STATUS

DIRAC.PNNL2.us

      • Site to be decommissioned BIIDCO-920 - Getting issue details... STATUS

DIRAC.PNNL-CASCADE.us

      • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

      • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • "Failed Payload Job" has been observed since 2020-06-28 12:07 UTC BIIDCO-2534 - Getting issue details... STATUS

DIRAC.SSU.kr

  • "Short Pilot" has been observed since 2020-07-24 19:15 UTC (for 3 hours) BIIDCO-2548 - Getting issue details... STATUS
  • BIIDCO-2541 - Getting issue details... STATUS
  • Will be in downtime from July 4 (Friday) 12:00 until July 11 (Friday) 17:00
  • "Short Pilot" has been observed since 2020-07-09 17:07 UTC (for 6 hours)
  • "Short Pilot" has been observed since 2020-07-06 04:07 UTC (for 4 hours)

  • "Short Pilot" has been observed since 2020-07-04 23:07 UTC (for 7 hours) BIIDCO-2548 - Getting issue details... STATUS
  • DIRAC.SSU.kr "Failed Payload Job" has been observed since 2020-06-08 02:47 UTC.  BIIDCO-2470 - Getting issue details... STATUS

DIRAC.TIFR.in

  • "Pilot Submission Failure" has been observed since 2020-08-10 07:35 UTC (for 15 hours) BIIDCO-2536 - Getting issue details... STATUS
  • "Pilot Submission Failure" has been observed since 2020-07-25 20:15 UTC (for 3 hours) BIIDCO-2536 - Getting issue details... STATUS
  • "Pilot Submission Failure" has been observed since 2020-07-24 21:15 UTC (for 1 hours) BIIDCO-2536 - Getting issue details... STATUS
  • "Pilot Submission Failure" has been observed since 2020-07-23 15:15 UTC (for 7 hours) BIIDCO-2536 - Getting issue details... STATUS
  • TIFR site is down due to hardware failure  BIIDCO-2536 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/10/06.
  • 2018/07/06. (details) BIIDCO-1132 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" -- Already reported:  BIIDCO-971 - Getting issue details... STATUS
  •  RunningLimit is set for MCProduction=1 BIIDCO-1006 - Getting issue details... STATUS

DIRAC.TMU.jp

  • "Short Pilot" has been observed since 2020-08-10 19:35 UTC (for 3 hours) BIIDCO-1368 - Getting issue details... STATUS
  • "Failed Payload Job" has been observed since 2020-06-28 13:07 UTC BIIDCO-2533 - Getting issue details... STATUS

  • Health checker info. : "Short pilot jobs" has been found since 10:20:00 UTC on 2018/11/02 BIIDCO-1522 - Getting issue details... STATUS

DIRAC.Tokyo.jp

  • Decommissioned
  • Date, Issue, Tickets...

DIRAC.UAS.mx

  • "Short Pilot" has been observed since 2020-07-04 04:07 UTC (for 26 hours) BIIDCO-2549 - Getting issue details... STATUS
  • 2020-04-03 19:00 UTC  "Job Status Plot" shows 100% of jobs finished with errors
  • 2020-03-27 19:53 UTC  "Job Status Plot" shows 100% of jobs finished with errors
  • Health checker info. : "Belle II software could not be installed on " has been found since 15:20:00 UTC on 2019/04/25
  • Job submission check : Pilot submission failure has been found since 00:21:00 UTC on 2019/04/04. (details) BIIDCO-1772 - Getting issue details... STATUS
  • Health checker info. : "Belle II software could not be installed on " has been found since 01:20:00 UTC on 2019/02/20.
  • Job submission check: 100% failed with errors from 22:00 2019/01/08 till 04:00 2019/01/09 (UTC)
  • Health checker info. : "Belle II software could not be installed on " has been found since 04:20:00 UTC on 2018/12/17.  BIIDCO-1508 - Getting issue details... STATUS
  • Health checker info. : "Belle II software could not be installed on " has been found since 16:20:00 UTC on 2018/11/14.
  • Job submission check : Pilot submission failure has been found since 01:26:00 UTC on 2018/09/21. (details)

DIRAC.UVic.ca

  • User jobs enabled BIIDCO-2501 - Getting issue details... STATUS

DIRAC.UVic-local.ca

  • User jobs failed on the site: BIIDCO-1975 - Getting issue details... STATUS

DIRAC.Yamagata.jp

  • "Short Pilot" has been observed since 2020-07-06 04:07 UTC (for 4 hours)

  • "Failed Payload Job" has been observed since 2020-06-28 12:07 UTC BIIDCO-2512 - Getting issue details... STATUS

  • "Short Pilot" has been observed since 2020-06-20 23:47 UTC (for 6 hours) BIIDCO-2512 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2019/03/13.(details) BIIDCO-1761 - Getting issue details... STATUS

DIRAC.Yonsei.kr

  • "Short Pilot" has been observed since 2020-08-10 21:35 UTC (for 1 hours) BIIDCO-1368 - Getting issue details... STATUS
  • "Short Pilot" has been observed since 2020-07-26 14:15 UTC (for 9 hours) BIIDCO-2584 - Getting issue details... STATUS
  • "Short Pilot" has been observed since 2020-07-09 17:07 UTC (for 5 hours)
  • "Short Pilot" has been observed since 2020-06-21 03:47 UTC (for 2 hours)
  • "Short Pilot" has been observed since 2020-05-10 05:43 UTC
  • "Short Pilot" has been observed since 2020-03-26 21:30 UTC (for 9 hours)
  • "Short Pilot" has been observed since 2020-01-05 05:34 UTC (for 1 hours)

DIRAC.LocalTest.jp

      • Date, Issue, Tickets...

LCG.CESNET.cz

  • "Short Pilot" has been observed since 2020-06-28 04:07 UTC

  • "Failed Payload Job" has been observed since 2020-06-14 20:47 UTC (for 2 hours) (details). BIIDCO-2486 - Getting issue details... STATUS
  • "Failed Pilot" has been observed since 2020-05-10 00:43 UTC
  • "Failed Payload Job" has been observed since 2020-05-10 01:43 UTC
  • "Failed Pilot" has been observed since 2020-05-07 11:43 UTC
  •   Need some intervention to run Merge jobs BIIDCO-771 - Getting issue details... STATUS

LCG.COSENZA.IT

      • "Short Pilot" has been observed since 2020-07-22 23:15 UTC (for 8 hours)
      • "Failed Payload Job" has been observed since 2020-06-02 10:47 UTC (for 4 hours)
      • "Failed Payload Job" has been observed since 2020-01-05 04:34 UTC (for 3 hours)
      • "Short Pilot" has been observed since 2019-11-22 04:35 UTC (for 3 hours)
      • "Failed Payload Job" has been observed since 2019-11-21 21:35 UTC (for 10 hours)
      • "Short Pilot" has been observed since 2019-11-11 10:30 UTC (for 4 hours)

LCG.CNAF.it 

      • Downtime 2020-07-20 14:00 - 2020-07-22 10:00 (UTC) BIIDCO-2472 - Getting issue details... STATUS

      • "Short Pilot" has been observed since 2020-07-02 01:07 UTC (for 5 hours) BIIDCO-2424 - Getting issue details... STATUS
      • "Short Pilot" has been observed since 2020-06-21 04:47 UTC (for 1 hours)
      •   MCProduction jobs restricted BIIDCO-2506 - Getting issue details... STATUS
      • Downtime 2020-06-08 06:00 - 2020-06-22 15:00 (UTC)  BIIDCO-2472 - Getting issue details... STATUS
      • "Short Pilot" has been observed since 2020-06-02 10:47 UTC (for 4 hours)
      • "Aborted Pilot" has been observed since 2020-05-21 17:44 UTC (for 5 hours)
      • "Short Pilot" has been observed since 2020-05-20 02:43 UTC (for 5 hours). BIIDCO-2424 - Getting issue details... STATUS
      • "Failed Payload Job" has been observed since 2020-01-04 21:34 UTC (for 10 hours)
      • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2019/10/03.(details)

LCG.CYFRONET.pl

  • "Failed Payload Job" has been observed since 2020-06-28 08:07 UTC BIIDCO-2532 - Getting issue details... STATUS

  • "Failed Pilot" has been observed since 2020-06-15 19:47 UTC (for 3 hours) (details).
  • Downtime 2020-05-22 08:00 (UTC)  - 2020-06-19 08:00 (UTC)  BIIDCO-2437 - Getting issue details... STATUS
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2018/12/13. BIIDCO-1246 - Getting issue details... STATUS

LCG.DESY.de

      • The site to be retired  BIIDCO-1240 - Getting issue details... STATUS  – No more jobs to be submitted.

LCG.Frascati.it

      • "BLAH Error" has been observed since 2020-08-10 22:34:18 UTC (for 5 hours) BIIDCO-2571 - Getting issue details... STATUS
      • "BLAH Error" has been observed since 2020-07-25 23:07:32 UTC (for 43 hours) BIIDCO-2571 - Getting issue details... STATUS
      • "BLAH Error" has been observed since 2020-07-23 23:07:30 UTC (for 18 hours) BIIDCO-2571 - Getting issue details... STATUS
      • "BLAH Error" has been observed since 2020-07-18 16:07:24 UTC (for 24 hours) BIIDCO-2571 - Getting issue details... STATUS
      • "Failed Pilot" has been observed since 2020-06-03 10:47 UTC (for 4 hours)
      • "Short Pilot" has been observed since 2020-06-02 12:47 UTC (for 2 hours)
      • "Failed Payload Job" has been observed since 2020-06-02 09:47 UTC (for 5 hours)
      •  Site has been banned due to a hardware problem since 2019-07-05

      • Health checker info. : "BLAH ERROR" has been found since 15:20:00 UTC on 2019/05/21.(details)

LCG.HEPHY.at

      • Downtime 2020-08-05 07:00 to 2020-08-12 07:00 (UTC) BIIDCO-2578 - Getting issue details... STATUS
      • Downtime: 2020-07-21 07:00 (UTC) to 2020-08-05 07:00 (UTC) BIIDCO-2578 - Getting issue details... STATUS
      • HEPHY - Migration to the new site BIIDCO-2562 - Getting issue details... STATUS

      • "Short Pilot" has been observed since 2020-07-06 05:07 UTC (for 1 hours)
      • Downtime: 2020-06-03 12:00 (UTC) - 2020-06-10 12:00 (UTC) BIIDCO-2461 - Getting issue details... STATUS
      • "Failed Payload Job" has been observed since 2020-01-05 05:34 UTC (for 2 hours)
      • Health checker info. : "Failed pilot jobs" has been found at 13:20:00 UTC on 2019/10/03.(details)
      • Health checker info. : "Failed pilot jobs" has been found at 15:20:00 UTC on 2019/05/22.(details)
      • Health checker info. : "Short pilot jobs" has been found at 15:20:00 UTC on 2019/04/12.
      • Health checker info. : "Failed pilot jobs" has been found at 02:20:00 UTC on 2019/01/30.(details) and at 02:20:00 UTC on 2019/01/31.(details)
      • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2018/12/27.

LCG.IPHC.fr

      • Downtime: 2020-07-21 07:00 (UTC) - 2020-07-28 07:00 (UTC)

LCG.KEK.jp

  •  'heavy' queue closed in preparation for the KEKCC renewal BIIDCO-2566 - Getting issue details... STATUS
  • Pilot submission failure happened since 2020-07-09 02:00 UTC BIIDCO-2558 - Getting issue details... STATUS
  • MCProduction restricted BIIDCO-2505 - Getting issue details... STATUS
  • No plot for raw data processing BIIDCO-2449 - Getting issue details... STATUS. Update: this is actually expected, see BIIDCO-2400 - Getting issue details... STATUS
  • Job status: Large number of jobs finished with errors (61.0%) in the last 24-hour period, from approx. 2020-01-01 00:00 - 02:00 UTC
  • SiteDirector "Failed to check the availability"  BIIDCO-1934 - Getting issue details... STATUS

LCG.KEK2.jp

  • Many jobs failed  BIIDCO-2435 - Getting issue details... STATUS  
  • Health checker info. : "Short pilot jobs" has been found at 16:20:00 UTC on 2019/10/09.

  • Still all jobs failing with InputDataResolution on 2019/07/25. BIIDCO-1542 - Getting issue details... STATUS
  • GGUS ticket : "KEK SE: PrepareToGet ETIMEDOUT for a specific file path"(140328) has been submitted at 21:26:29 UTC on 2019/03/21.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2018/12/21. BIIDCO-1559 - Getting issue details... STATUS
  • All jobs have been in "Input data resolution" status since 12:00 UTC on 2018/12/18 BIIDCO-1542 - Getting issue details... STATUS

LCG.KEK-merge.jp

      • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2019/08/26. BIIDCO-1978 - Getting issue details... STATUS
      • "Belle II software could not be installed on cb268.cc.kek.jp" has been found since 14:20:00 UTC on 2019/04/05
      • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2019/04/02
      •   being commissioned...

LCG.KISTI.kr 

      • KISTI-GSDC system in downtime BIIDCO-2556 - Getting issue details... STATUS
      • "BLAH Error" has been observed since 2020-07-06 23:07:01 UTC (for 12 hours)
      • BLAH error seems to happen if jobs exceed the allocated number of queues; this is not a problem (site-specific feature)
        BIIDCO-1259 - Getting issue details... STATUS

LCG.KIT.de

  • "Short Pilot" has been observed since 2020-08-10 16:35 UTC (for 6 hours) BIIDCO-1368 - Getting issue details... STATUS

LCG.KMI.jp

  • "Pilot Submission Failure" has been observed since 2020-07-25 22:15 UTC (for 1 hour) BIIDCO-2559
  • "Pilot Submission Failure" has been observed since 2020-07-24 17:15 UTC (for 5 hours) BIIDCO-2559
  • "Pilot Submission Failure" has been observed since 2020-07-23 20:15 UTC (for 2 hours) BIIDCO-2559
  • "Short Pilot" has been observed since 2020-07-11 21:08 UTC (for 5 hours) BIIDCO-2560
  • "Pilot Submission Failure" has been observed since 2020-07-10 04:08 UTC BIIDCO-2559
  • "Not Enough Disk Space" for "ncream01.hepl.phys.nagoya-u.ac.jp" has been observed since 2020-06-26 06:06:53 UTC (for 1 hour) BIIDCO-2521
  • "Pilot Submission Failure" has been observed since 2020-06-17 18:47 UTC
  • "Pilot Submission Failure" has been observed since 2020-06-17 12:47 UTC (for 2 hours)
  • "Aborted Pilot" has been observed since 2020-05-03 02:37 UTC (for 4 hours) BIIDCO-2395
  • "Failed Payload Job" has been observed since 2020-01-04 22:34 UTC (for 9 hours)
  • "Failed Payload Job" has been observed since 2019-11-19 12:35 UTC (for 2 hours)
  • Health checker info. : "Belle II software could not be installed on pwn22.local" has been found since 21:20:00 UTC on 2018/11/22.
  • Job submission check : Pilot submission failure has been found since 21:24:00 UTC on 2018/10/02. (details)

LCG.LAL.fr

      • "Pilot Submission Failure" has been observed since 2020-06-01 12:51 UTC (for 2 hours)

LCG.Legnaro.it

      • "Failed Payload Job" has been observed since 2020-06-02 12:47 UTC (for 2 hours)

LCG.Napoli.it

  • "Failed Payload Job" has been observed since 2020-07-20 11:15 UTC (for 19 hours) BIIDCO-2573
  • "Pilot Submission Failure" has been observed since 2020-06-20 04:47 UTC (for 1 hour)
  • "Pilot Submission Failure" has been observed since 2020-03-16 22:24 UTC (for 1 hour) (details).
  • "Failed Payload Job" has been observed since 2020-03-05 01:05 UTC (for 5 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 11 hours)
  • "Failed Payload Job" has been observed since 2020-01-04 02:34 UTC (for 4 hours)
  • t2-recas-ce01.na.infn.it shows a pilot submission error; this CE should be banned until September 2019.

  • Stalled jobs BIIDCO-1255

LCG.NTU.tw

  • "Failed Payload Job" has been observed since 2020-08-10 16:35 UTC (for 6 hours) BIIDCO-617
  • "CRL has expired" for "node39-0,node37-0,node38-0" has been observed since 2020-07-20 07:07:26 UTC (for 16 hours) BIIDCO-2499
  • Downtime start time 2020-07-24 17:00 - end time 2020-07-27 12:00 (UTC) BIIDCO-2519
  • "CRL has expired" for "node39-0" has been observed since 2020-06-17 14:46:46 UTC (for 4 hours) BIIDCO-2499
  • "Belle II software could not be installed" for "belle2grid3.cc.ntu.edu.tw" has been observed since 2020-06-28 06:06:54 UTC (for 1 hour) BIIDCO-2527
    Solved and verified: GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=147803 has been submitted at 2020-07-12 06:50
  • "Failed Payload Job" has been observed since 2020-06-02 11:47 UTC (for 3 hours)
  • "Short Pilot" has been observed since 2020-05-18 11:43 UTC (for 2 hours) 
  • "Short Pilot" has been observed since 2020-03-28 14:30 UTC (for 1 hour)

LCG.Pisa.it

LCG.Roma3.it

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2019/04/20. BIIDCO-1538

LCG.TAU.il

  • "Failed Pilot" has been observed since 2020-06-14 21:47 UTC (for 1 hour) (details).

  • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2019/05/24.(details)

LCG.Torino.it

  • "Failed Payload Job" has been observed since 2020-07-17 22:14 UTC (for 18 hours) BIIDCO-2569
    GGUS: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147926 has been submitted.

  • "Pilot Submission Failure" has been observed since 2020-04-21 03:37 UTC (for 3 hours) (details).
    GGUS: https://ggus.eu/?mode=ticket_info&ticket_id=146605 has been submitted.
  • "BLAH Error" has been observed since 2020-02-09 05:53:05 UTC (for 9 hours) BIIDCO-2279
  • "BLAH Error" has been observed since 2020-02-08 14:53:04 UTC (for 1 hour)
  • "Pilot Submission Failure" has been observed since 2019-12-28 18:34 UTC BIIDCO-2215

LCG.ULAKBIM.tr

  • The queue 'belle7' is to be disabled; use only 'belle'. BIIDCO-1896
  • Health checker info. : "Aborted pilot jobs" has been found since 01:20:00 UTC on 2019/08/01.

OSG.BNL.us

  • "Short Pilot" has been observed since 2020-06-15 11:47 UTC (for 4 hours)
  • "Failed Payload Job" has been observed since 2020-06-02 13:47 UTC 
  • "Failed Payload Job" has been observed since 2020-01-04 20:34 UTC (for 11 hours)
  • "Failed Payload Job" has been observed since 2019-12-20 14:28 UTC BIIDCO-2195
    Solved and verified: GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=144665 has been submitted
  • "Pilot Submission Failure" has been observed since 2019-12-05 05:28 UTC 
  • Health checker info. : "Belle II software could not be installed on " has been found since 19:20:00 UTC on 2019/02/14.
  • Job submission check: Jobs fail with errors or input data resolution in the last 24h (6:00 UTC, 2019/01/09) BIIDCO-1596
  • Production jobs: UNAVAILABLE files BIIDCO-1302
  • Number of concurrent MCProduction jobs restricted BIIDCO-1256
  • MCProduction jobs are mostly stalled BIIDCO-1253

OSG.CORI.us

      • The OSG.CORI.us resource has been removed because the CY18 allocation was not approved

OSG.UMiss.us

  • "Failed Payload Job" has been observed since 2020-06-02 09:47 UTC (for 5 hours)
  • "Pilot Submission Failure" has been observed since 2020-05-09 04:43 UTC
  • "Pilot Submission Failure" has been observed since 2020-05-08 09:43 UTC (for 3 hours) 
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2019/06/02. BIIDCO-1856
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2019/05/20.(details)
  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2019/05/14.(details)
  • Health checker info. : "Short pilot jobs" has been found since 22:20:00 UTC on 2019/05/12.(details)
    Updated BIIDCO-1768
  • Job submission check : Pilot submission failure has been found since 12:27:00 UTC on 2019/05/04. (details)
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2019/05/11.(details)
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2019/04/11 and  at 17:20:00 UTC on 2019/04/14.
  • Job status check: application finished with errors for 34.7% of jobs on 2019/04/08.

SSH.KMI.jp

  • Job status plot: input data resolution problems (for 7 hours) since 2019-12-24 00:00 UTC, approximately.
  • "Short Pilot" has been observed since 2019-12-24 05:28 UTC (for 1 hour)
  • Job status check: Application finished with errors (12% of the jobs in last 24 hours) on 2018/12/22 at 11:30 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2018/08/13.

Test.KIT.de

      • Downtime 2020-06-16 07:00 - 2020-06-16 10:00 BIIDCO-2490

      • Test site for the opportunistic resources at KIT. No need to report problems.

Test.ULAKBIM.tr

      • Test site for the SL7 resources at ULAKBIM. No need to report problems.
      • No activities expected currently.

VCYCLE.LAL.fr

      • Under commissioning (BIIDCO-2430)

VCYCLE.Napoli.it

      • "Failed Payload Job" has been observed since 2020-01-05 04:34 UTC (for 3 hours)
      • "Failed Payload Job" has been observed since 2019-11-21 20:35 UTC (for 12 hours)
      • Opportunistic site (Empty plot is not a problem)
      • Ban lifted BIIDCO-1613
      • "Sudo CE Error: sudo execution fails with return code 1" BIIDCO-1612

VCYCLE.HNSC01.it, VCYCLE.HNSC02.it

      • Opportunistic site (Empty plot is not a problem)

Links

