Contents

  • Individual Services and Resources


 

Production Plans

  • MC8 started on February 17, 2017
    • Phase III generic samples
    • Phase III signal samples
    • Phase II signal samples
    • Other?

 

  • MC7 ended February 8, 2017
  • MC7 started on November 1, 2016
    • Phase III signal MC samples with 12th background MC campaign samples
    • Phase III Y(6S) production with 12th background MC campaign samples
    • Phase II Y(6S) production with new, phase II background samples
    • Phase II Y(4S) production with new, phase II background samples
    • Phase III Y(3S) production with 12th background MC campaign samples
    • Additional generic samples at phase III with and without backgrounds
  • MC7 phase 2 started January 9, 2017
    • Phase III Y(4S) generic samples
    • Phase III Y(4S) ccbar sample with new parameters
  • MC8 test production started February 1, 2017
  • In parallel with MC7 production, ~10,000 analysis test jobs were submitted on Feb. 2. These jobs are expected to finish within the next couple of hours (as of 21:00 JST).

Production Status

MC8 

Official production started at 01:30 JST on Feb 17 with a 1 ab⁻¹ sample of phase 3 Y(4S) generic MC

Submitted ~72 signal samples for a total of about 162 million events (Feb 25, ~02:00 JST)

MC7

Official production started as scheduled at 00:00 JST on Nov. 1.

A full list of samples is given on the data production page

A total of 100,000 jobs of 10k events each have been submitted as of Nov. 4

The following productions have been stopped and the corresponding jobs killed due to improper ROOT output (Nov. 10):

  • 517 - nonbsbs phase 2 Y(6S)
  • 518 - bsbs phase 2 Y(6S)

A total of 139,972 jobs of 10k events have been submitted as of Nov. 10

Phase 2 background samples have been distributed. 

  • Distribution to the Grid sites
  • Distribution to the non-Grid sites: BIIDCO-32
  • When they are ready, we will finish production of phase 2 samples. This will complete the official requests for MC7. Thereafter, we will submit additional generic samples and perhaps include some additional requests from the physics group.

Generic phase 2 samples at Y(6S) and Y(4S) as well as some signal samples with backgrounds have been submitted (~23:50 JST on Nov. 11 and ~04:50 JST on Nov. 12). In total 17,776 Y(6S) + 29,608 Y(4S) = 47,384 jobs were submitted.
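
The total quoted above can be cross-checked with a short tally. This is only an illustrative sketch: the variable names are invented for the example, and the job counts are copied from the submission note above.

```python
# Job counts copied from the Nov. 11-12 submission note.
# The dictionary and its keys are illustrative names only.
phase2_jobs = {
    "Y(6S)": 17_776,
    "Y(4S)": 29_608,
}

total_jobs = sum(phase2_jobs.values())
print(f"Total phase 2 jobs submitted: {total_jobs:,}")
# prints "Total phase 2 jobs submitted: 47,384"
```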

All requested MC samples for MC7 have been submitted (Nov 14)

Also submitted Phase III Y(3S) requests, which total 23,850 jobs of 10k events (Nov 14 at 04:30 JST). (Running total is now 215,332 jobs.)

Completed database access tests with 2k and 5k concurrent jobs (Nov 16 at ~10:00 JST)

Submitted Phase III Y(4S) generic samples (1 ab⁻¹), which total 575,300 jobs of 10k events (Nov 16 at ~23:40 JST)

Submitted additional jobs including another 1 ab⁻¹ of phase 3 generic Y(4S) samples, totalling 636,000 jobs split roughly half and half between 10k-event and 5k-event jobs. This is roughly equal to the number of jobs previously submitted, bringing the total number to over 1.2 million jobs. (December 8 at ~00:00 JST)

Submitted additional generic samples (1 ab⁻¹ of mixed and charged events), totalling 160,400 additional jobs. (December 12 at ~04:00 JST)

Submitted additional signal samples (12.7k jobs) and a new ccbar sample with modified parameters (159.4k jobs). (December 21 at ~02:00 JST)

The system will keep processing the jobs already submitted until they are exhausted. No additional samples will be submitted until January 9, after the KEK downtime.

 

New submission started January 9 at 0:00 JST with phase 3 - Y(4S) generic samples:

  • mixed (BGx1: 42,770 jobs, BGx0: 21,380 jobs) (~00:00 JST)
  • charged (BGx1: 45,230 jobs, BGx0: 22,620 jobs) (~06:00 JST)
  • uubar, ddbar, ssbar, ccbar, taupair: 538,060 jobs (~01:30 JST)

Submitted new ccbar samples with new parameters: 159,480 jobs (~22:15 JST on Jan 27)

Test production for MC8, including 10 jobs of 10k events each with BG for generic samples (70 jobs total), started Feb. 1.

Submitted additional MC7 signal samples: 11,200 jobs (~02:00 JST on Feb 5)

 

Central Services

DIRAC

  • Down for upgrade work

Monitor

SEs

SE Common Issues

  • All sites show an increase in "Scheduled" jobs much larger than in "Done" jobs (link to logbook posts 1, 2, 3, 4, 5, direct link to plot). Experts have been notified: 2017-02-26 13:30 UTC, 2017-02-27 03:00 UTC, 2017-02-27 11:20 UTC
  • No transfers are being scheduled while the upgrade of the central DIRAC services is ongoing.

Destination SE: KEK2-TMP-SE (kek2-se01.cc.kek.jp)

  • 2017-02-21 12:00 JST: Replication_status shows "KEK2-TMP-SE:BANNED", while there are replication activities.
    • KEK2-TMP-SE is not banned for write, only for removal. The monitoring needs a fix for this case.

  • Reactivated as a destination SE  
  • Still banned for removal due to the issue in the back-end HSM (BIIDCO-41).

Destination SE: PNNL-TMP-SE (se.hep.pnnl.gov)

  • SE Health check by DDM : remove file, download, upload do not work since 2017-02-27 10:02:52 UTC.
  • PNNL-TMP-SE is banned following a power incident at the site (BIIDCO-109). Banned at 2017-02-17 16:43:22 UTC
    • BIIDCO-113
  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-02-17 13:36:06 UTC
  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-02-03 07:57:41 UTC. - reported to experts
  • SE Health check by DDM : remove file, remove directory, download, ls do not work since 2017-01-25 03:13:46 UTC. - experts notified
  • SE was unhealthy between 2017-01-11 01:00 UTC and 2017-01-11 03:00 UTC.
  • SE Health check by DDM : remove file, remove directory, download, upload do not work since 2017-01-11 03:41:11 UTC.

Destination SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • Zero replication operations noted on 2017-02-25 at ~00:40 JST (direct link to plot, link to logbook entry).
  • Issue raised by the automatic issue detector (notified comp-dc-operations@belle2.org):
    SE Health check by DDM : remove file, remove directory, download, upload do not work since 2017-01-16 17:57:17 UTC.
  • "All pools are full" – The SE is banned for write (BIIDCO-107)
    • ~9 TB freed by now
    • Unbanned for now. It keeps appearing on the issue detector after the unban; experts reminded (2017/01/27 10:00 JST)
  • "To redirect current pending transfers to other sites" BIIDCO-108
    • The number of replication operations has increased because of this

Destination SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

Destination SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

Destination SE: KIT-TMP-SE (gridka-dcache.fzk.de)

Destination SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • SE is read-only for maintenance until 24/02/2017 17:00 UTC
  • SE Health check by DDM : remove file, download, upload do not work since 2017-02-23 09:56:28 UTC - notified to comp-dc-operations@belle2.org (at 11.40 UTC)
  • SE Health check by DDM : remove file, download, upload do not work since 2017-02-16 11:34:51 UTC - reported to comp-dc-operations@belle2.org

Destination SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)

  • 2017-02-21 08:30 JST:
  • Issue raised by the automatic issue detector (notified comp-dc-operations@belle2.org):
    SE Health check by DDM : remove file, download, upload do not work since 2017-01-25 07:07:04 UTC.

Destination SE: SIGNET-TMP-SE (dcache.ijs.si)

  • Destination SE share for SIGNET-TMP-SE has been reduced from 15 to 5. See Computing DestinationSE for more details.

  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-02-06 05:04:46 UTC. - reported to comp-dc-operations@belle2.org

Other SEs

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

  • 2017-02-21 08:30 JST:
  • Solved and verified : GGUS Ticket #125723 CYFRONET SE: dpm.cyf-kr.edu.pl not accessible

McGill SE (storm02.clumeq.mcgill.ca)

Pisa-TMP-SE (stormfe1.pi.infn.it)

Torino-TMP-SE (se-srm-00.to.infn.it)

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • Errors seen in FTS dashboard around 2017-01-29 15:00 UTC. Reported to experts via mailing list. Please report to the mailing list again if error rate increases.

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

Sites

Sites Common Issues


ARC.DESY.de

  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2017/02/26
  • Health checker info. : "Short pilot jobs" has been found at 02:20:00 UTC on 2017/02/26
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/02/22.
  • Job submission check : Pilot submission failure has been found at 05:27:00 UTC on 2017/02/22.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2017/02/19. Notified comp-dc-operations@belle2.org.
  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2017/02/18. Notified comp-dc-operations@belle2.org 18/02/2017 ~16.00 JST
  • Health checker info. : "Short pilot jobs" has been found at 01:20:00 UTC on 2017/02/18
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2017/02/17.(details)
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2017/02/14.(details) Notified  comp-dc-operations@belle2.org 16/02/2017 ~8:00 JST
  • Health checker info. : "Failed pilot jobs" has been found since 04:20:00 UTC on 2017/02/06.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 01:20:00 UTC on 2017/02/06.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 21:20:00 UTC on 2017/02/05.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 16:20:00 UTC on 2017/02/04.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/02/04. - Reported to experts
  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/04.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 04:20:00 UTC on 2017/02/04.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/03.(details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.
  • Health checker info. : "Aborted pilot jobs" has been found at 22:20:00 UTC on 2016/12/19.
  • Health checker info. : "Aborted pilot jobs" has been found at 18:20:00 UTC on 2016/12/18
  • Health checker info. : "Aborted pilot jobs" has been found since 03:20:00 UTC on 2016/12/18.(details)
     
  • Health checker info. : "Aborted pilot jobs" has been found at 20:20:00 UTC on 2016/12/17.
  • Health checker info. : "Aborted pilot jobs" has been found at 15:20:00 UTC on 2016/11/27.(details)
  • Solved and verified: job submission to grid-arcce[0-1].desy.de failed (https://ggus.eu/index.php?mode=ticket_info&ticket_id=124740)
  • Job submission check : Pilot submission failure has been found since 13:25:00 UTC on 2016/12/03
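
Entries of the "Health checker info." form recur throughout this page with UTC timestamps, while notifications are often logged in JST. The sketch below shows one possible way to turn such an entry into a structured record and convert its time to JST. The regular expression and the function name `parse_entry` are assumptions made for this example, not part of any monitoring tool; the pattern only covers the entry format as it appears on this page.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for entries as they appear on this page, e.g.
#   Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2017/02/26
ENTRY_RE = re.compile(
    r'Health checker info\. : "(?P<issue>[^"]+)" has been found '
    r"(?P<kind>at|since) (?P<time>\d{2}:\d{2}:\d{2}) UTC on (?P<date>\d{4}/\d{2}/\d{2})"
)

JST = timezone(timedelta(hours=9))  # Japan Standard Time is UTC+9 (no DST)


def parse_entry(line):
    """Return (issue, kind, utc_datetime) for a matching line, else None."""
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y/%m/%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return match["issue"], match["kind"], stamp


line = 'Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2017/02/26'
issue, kind, utc_time = parse_entry(line)
print(issue, kind, utc_time.astimezone(JST).strftime("%Y-%m-%d %H:%M JST"))
# prints "Short pilot jobs at 2017-02-26 13:20 JST"
```

Lines that do not follow this format (for example "Banned.") simply return None, so free-text notes can be left untouched.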

ARC.KIT.de

  • Job submission check : Pilot submission failure has been found since 04:25:00 UTC on 2017/02/24. (details) - Notified experts.
  • Health checker info. : "Short pilot jobs" has been found at 10:20:00 UTC on 2017/02/22
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/02/22.
  • Health checker info. : "Short pilot jobs" has been found at 21:20:00 UTC on 2017/02/21. Notified comp-dc-operations@belle2.org 2017-02-22 ~08:00 JST
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2017/02/20.(details) – notified experts 2017-02-20 ~08:00 JST
  • Health checker info. : "Short pilot jobs" has been found since 17:20:00 UTC on 2017/02/20.(details)

  • Job submission check : Pilot submission failure has been found at 14:28:00 UTC on 2017/02/20. (details)
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/02/19. Notified comp-dc-operations@belle2.org.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/02/18. Notified comp-dc-operations@belle2.org 18/02/2017 ~16.00 JST
  • Health checker info. : "Short pilot jobs" has been found at 02:20:00 UTC on 2017/02/18
  • Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2017/02/18
  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2017/02/17.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found since 05:20:00 UTC on 2017/02/09.
  • Job submission check : Pilot submission failure has been found since 20:26:00 UTC on 2017/02/23. (details)

ARC.LMU.de

  • This is a test site. Issues do not need to be reported.

ARC.LMU2.de

  • Health checker info. : "Aborted pilot jobs" has been found since 05:20:00 UTC on 2017/02/26.
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/02/27.

ARC.MPPMU.de

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2017/01/31.(details)
  • Stalled jobs at ARC.MPPMU.de, reported to comp-dc-operations@belle2.org (plot: Jobs by Final Minor Status)
  • Health checker info. : "Aborted pilot jobs" has been found since 15:20:00 UTC on 2017/01/29.(details) - experts notified
  • Health checker info. : "Aborted pilot jobs" has been found since 23:20:00 UTC on 2017/01/28.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 18:20:00 UTC on 2017/01/27.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 15:20:00 UTC on 2017/01/27.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 12:20:00 UTC on 2017/01/27.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 08:20:00 UTC on 2017/01/27.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 23:20:00 UTC on 2017/01/26.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 14:20:00 UTC on 2017/01/26.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/01/26.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 13:20:00 UTC on 2017/01/25.(details)

  • Health checker info. : "Aborted pilot jobs" has been found since 18:20:00 UTC on 2017/01/24.(details) - expert notified
  • Health checker info. : "Aborted pilot jobs" has been found at 10:20:00 UTC on 2017/01/22.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 23:20:00 UTC on 2017/01/21.(details)
  • Significant numbers of "Stalled" and "Finished w Error" jobs in the Job Status Plots (2017-01-18 16:00 to 2017-01-19 16:00)
  • Solved and verified: GGUS ticket on the disk space issue, submitted on 2016/11/21 00:00 UTC (https://www.ggus.org/?mode=ticket_info&ticket_id=125097).
  • Health checker info. : "Aborted pilot jobs" has been found since 22:20:00 UTC on 2016/11/12.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 22:20:00 UTC on 2016/11/15.(details)
  • Job submission check : Pilot submission failure has been found since 01:27:00 UTC on 2016/11/27. (details)

  • Job submission check : Pilot submission failure has been found at 05:23:00 UTC on 2016/11/28. (details)

ARC.SIGNET.si

  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2017/02/20.
  • Job submission check : Pilot submission failure has been found at 14:23:00 UTC on 2017/02/07. (details)
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2016/12/20.
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2016/11/28.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 13:20:00 UTC on 2017/01/09.(details)
  • Job submission check : Pilot submission failure has been found since 03:23:00 UTC on 2017/02/07. - Notified experts

CLOUD.CC1_Krakow.pl

  •  8. Job status plots:  No data for this selection for the plot: Running jobs by FinalMinorStatus

DIRAC.Beihang.cn

  • Banned.
  • Upload to CNAF SE is problematic (packets cannot reach the site).
    • Solved: ggus:124942
    • CNAF-TMP-SE added to OutputSE for verification of the solution.
  • Short pilot jobs at 16:20:00 UTC on 2016/11/05.
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 
    • BIIDCO-38
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2016/11/12.(details)
    • reported to comp-dc-operations@belle2.org 2016/11/13 00:40:00 JST
    • The site suddenly lost access to the KEK DIRAC servers. This produces many short pilot jobs and "input data resolution" jobs. (15/Nov/2016)
      The site maintainer has already been informed.
  • All upload attempts are failing against all configured SEs: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), fail-over SEs (DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
    • Banned for now. BIIDCO-43

DIRAC.BINP.ru

  • Job submission check : Pilot submission failure has been found since 04:27:00 UTC on 2017/02/05. (details) - experts notified
     
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2016/12/18.
  •  Health checker info. : "Short pilot jobs" has been found at 00:20:00 UTC on 2016/11/01
  • Job submission check : Pilot submission failure has been found at 04:27:00 UTC on 2017/02/05. (details)

DIRAC.CINVESTAV.mx

  • Job submission check : Pilot submission failure has been found since 01:25:00 UTC on 2017/02/25. (details) BIIDCD-399
  • Unexpected downtime at 16:10:00 (UTC) on 2017/02/23. BIIDCO-121
  • Job submission check : Pilot submission failure has been found at 14:23:00 UTC on 2017/01/27. (details)
  • Job submission check : Pilot submission failure has been found at 12:22:00 UTC on 2017/01/27. (details)
  • Job submission check : Pilot submission failure has been found at 16:22:00 UTC on 2017/01/26. (details)
  • Downtime from 01/24 22:00 to 01/27 23:00 (UTC). BIIDCO-110

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

DIRAC.LMU.de

  • Not in use in MC production (BIIDCO-26).
  • Banned for now.

DIRAC.MIPT.ru

  • Job submission check : Pilot submission failure has been found since 20:21:00 UTC on 2017/02/14. (details) Notified  comp-dc-operations@belle2.org 16/02/2017 ~8:00 JST
  • Health checker info. : "Aborted pilot jobs" has been found at 23:20:00 UTC on 2017/01/15.(details)
  • Configured to read beam BG file from local at 2016 11:30 (UTC).
  • Unbanned at 11/26 10:30 (UTC).
  • Banned again on 11/25 4:05 (UTC).
  • Unbanned at 11/25 2:43 (UTC).
  • Banned for storage maintenance on 11/21 15:00 (UTC).
  • Health checker info. : "Aborted pilot jobs" has been found since 14:20:00 UTC on 2016/11/20.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 15:20:00 UTC on 2016/11/16.(details) These aborted pilot jobs disappeared a few hours later.
    • BIIDCO-46

DIRAC.Nagoya.jp

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.
  • Unbanned on 11/21 13:00 (UTC)
  • Banned on 11/19 00:00 (UTC)

DIRAC.Nara-WU.jp

  • Decommissioned site: since it still uses SL5, the DIRAC pilot cannot be executed there.

DIRAC.NDU.jp

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.

DIRAC.Niigata.jp

  • Job submission check : Pilot submission failure has been found since 09:24:00 UTC on 2017/01/25. (details)
  • Job submission check : Pilot submission failure has been found since 06:24:00 UTC on 2017/01/23. (details) - experts notified

DIRAC.Osaka-CU.jp

  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2016/12/21.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2016/12/20.
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2016/12/18.(details)
  • Login to the head node from Nagoya DIRAC fails. Site admin notified.

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL.us

  • BIIDCO-113
  • Job submission check : Pilot submission failure has been found since 20:26:00 UTC on 2017/01/30. (details)
  • Health checker info. : "Aborted pilot jobs" has been found since 04:20:00 UTC on 2016/11/23.(details)

DIRAC.PNNL2.us

  • BIIDCO-113
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2016/12/21.
  • Health checker info. : "Aborted pilot jobs" has been found since 15:20:00 UTC on 2016/12/04.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 08:20:00 UTC on 2016/12/03.(details)
  • BIIDCO-65
  • Health checker info. : "Aborted pilot jobs" has been found since 02:20:00 UTC on 2016/11/30.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 08:20:00 UTC on 2016/11/29.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 06:20:00 UTC on 2016/12/01.(details)

DIRAC.RCNP.jp

  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2017/02/17. Experts notified.
    • Cannot access the LFC. Under investigation (2017/02/23)
    • Fixed by installing libtool-ltdl (2017/02/24).
  • As yesterday, a large fraction of jobs have error status. Reported to comp-dc-operations@belle2.org at ~00:00 JST. (link to logbook)
  • A large fraction of jobs finished with error status. Experts not notified because the absolute number is small.
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2017/02/17.(details)- Notified experts
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2017/02/16.(details)
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2016/12/21.
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/01/11.(details)

DIRAC.SSU.kr

  • 'Failed to install DIRAC' error reported to the site (BIIDCO-118).
  • Health checker info. : "Failed to install DIRAC on " has been found since 00:20:00 UTC on 2017/02/20.
  • Health checker info. : "Failed to install DIRAC on " has been found since 18:20:00 UTC on 2017/02/19. Notified comp-dc-operations@belle2.org.
  • Health checker info. : "Failed to install DIRAC on " has been found since 05:20:00 UTC on 2017/02/18
  • Health checker info. : "Failed to install DIRAC on " has been found since 23:20:00 UTC on 2017/02/17
  • Health checker info. : "Failed to install DIRAC on " has been found at 06:20:00 UTC on 2017/02/17.
  • Health checker info. : "Failed to install DIRAC on " has been found at 00:20:00 UTC on 2017/02/17. Emailed comp-dc-operations@belle2.org about multiple issues in the past 2 days
  • Health checker info. : "Failed to install DIRAC on " has been found since 19:20:00 UTC on 2017/02/16
  • Health checker info. : "Failed to install DIRAC on " has been found since 03:20:00 UTC on 2017/02/16.
  • Health checker info. : "Failed to install DIRAC on " has been found since 21:20:00 UTC on 2017/02/15. Note: issue lasted ~4.5 hours

DIRAC.TIFR.in

  • Health checker info. : "Short pilot jobs" has been found since 21:20:00 UTC on 2016/12/26.
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2016/12/25.
  • Application finished with errors at 08:12:00 UTC on 2016/12/25

 

  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2016/12/25.
  • Health checker info. : "Short pilot jobs" has been found since 19:20:00 UTC on 2016/12/20.
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2016/12/17
  • The CVMFS issue is being fixed, but the fix is not yet complete. Since it is partly fixed, the site has started accepting DIRAC jobs, but most of them fail. (17/Dec/2016)
  • CVMFS is not working normally. A request to reload CVMFS has already been sent. (01/Nov/2016)
  • Site maintainer will fix this CVMFS issue on 07/Nov/2016. (04/Nov/2016)

DIRAC.TMU.jp

  • Banned.
  • Pilot submission failure observed. pilot_submission_DIRAC.CINVESTAV.mx_log_1day.png.
  • Pilot submission failure observed. Site admin notified on 2016/12/15 2:30 (UTC) (BIIDCD-340).
  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2016/11/20.(details)  
  • Due to wrong firewall settings, the DIRAC pilot cannot be executed normally. A request to change the firewall settings has already been sent. (01/Nov/2016)

DIRAC.Tokyo.jp

  •  Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2016/12/21.

DIRAC.UAS.mx

  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2017/01/16.(details)
  • No jobs, since pilots cannot be tracked due to a command error on the site. BIIDCO-80
  • Health checker info. :
  1. "Belle II software could not be installed on " has been found since 22:20:00 UTC on 2016/11/23.
  2. "Not enough disk space on " has been found since 22:20:00 UTC on 2016/11/23.
  • Health checker info. : "Short pilot jobs" has been found at 21:20:00 UTC on 2016/11/23.(details)
  • Health checker info. : "Belle II software could not be installed on " has been found at 00:20:00 UTC on 2016/11/01

DIRAC.UVic.ca

  • Some file transfer failures from charon01.westgrid.ca, but efficiency is >94% from 00:00 to 05:00 UTC on 2017/02/22. Experts not notified.

     
  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2016/11/01.

DIRAC.Yamagata.jp

  • The IP address of the gateway will change on Dec 16. Job draining started (2016/12/12).

DIRAC.Yonsei.kr

  • Health checker info. : "Aborted pilot jobs" has been found at 08:20:00 UTC on 2016/12/28
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/01/10.(details)

LCG.CESNET.cz

    • Health checker info. : "Failed pilot jobs" has been found at 21:20:00 UTC on 2017/02/04.(details)
    • GGUS ticket : "CESNET CE: Failed jobs with no clue"(125252) has been submitted at 03:13:28 UTC on 2016/12/02.
    • Health checker info. : "Failed pilot jobs" has been found since 03:20:00 UTC on 2016/12/02.(details)
    • Health checker info. : "Failed pilot jobs" has been found since 21:20:00 UTC on 2016/11/30.(details)
    • GGUS Ticket #125252 "CESNET CE: Failed jobs with no clue" https://ggus.eu/?mode=ticket_info&ticket_id=125252  

    • Health checker info. : "Failed pilot jobs" has been found since 16:20:00 UTC on 2016/11/28.(details)
    • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2016/11/28.(details)
    • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2016/11/26.(details)
    • Health checker info. : "Failed pilot jobs" has been found at 16:20:00 UTC on 2016/11/26.(details)
    • Health checker info. : "Failed pilot jobs" has been found since 20:20:00 UTC on 2016/11/25.(details)
    • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2016/11/21.(details)  
    • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2016/11/14.(details)
    • Health checker info. : "Failed pilot jobs" has been found since 12:20:00 UTC on 2016/11/15. Jobs seem stalled; however, it is not possible to retrieve the sandbox. The pilot output stops logging at "[Pilot] Command LaunchAgent ".
      • BIIDCO-60
    • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/23.(details)

LCG.CNAF.it

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2017/02/27.
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2017/02/26
  • Downtime info.: ce06-lcg.cr.cnaf.infn.it is now in downtime. (GOCDB 22424)
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2017/02/26
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2017/02/24.
  • Downtime info.: ce06-lcg.cr.cnaf.infn.it is now in downtime. (GOCDB 22424)
  • Job submission check : Pilot submission failure has been found since 16:24:00 UTC on 2017/02/05. (details)
  • Health checker info. : "Failed pilot jobs" has been found at 23:20:00 UTC on 2017/02/04.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/03.(details)
  • Job submission check : Pilot submission failure has been found since 16:23:00 UTC on 2017/01/21. (details)
  • Downtime info.: ce05-lcg.cr.cnaf.infn.it is now in downtime. (GOCDB 22106)

  • Job submission check : Pilot submission failure has been found since 17:23:00 UTC on 2016/12/07. (details)

  • From OperationStatus:

  • Downtime info.: ce05-lcg.cr.cnaf.infn.it is now in downtime. (GOCDB 22050)
  • Job submission check : Pilot submission failure has been found since 00:27:00 UTC on 2016/12/03
  • Job submission check : Pilot submission failure has been found since 17:24:00 UTC on 2016/12/01. (details)
  • Job submission check : Pilot submission failure has been found since 12:23:00 UTC on 2016/11/29. (details)
  • Health checker info. : "Failed pilot jobs" has been found at 21:20:00 UTC on 2016/11/26.(details)
  • Only a job submission with '/belle/Role=production' is accepted. Any tests without '/belle/Role=production' are meaningless.
  • Job submission check : Pilot submission failure has been found at 14:22:00 UTC on 2016/11/10. (details)
  • Job submission check : Pilot submission failure has been found since 11:26:00 UTC on 2016/11/15. (details)
  • The ce07-lcg.cr.cnaf.infn.it CE does not respond on the CREAM interface (command: glite-ce-service-info -L 5 ce07-lcg.cr.cnaf.infn.it):
    2016-11-13 14:50:01,018 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection timed out]
  • Downtime info.: ce07-lcg.cr.cnaf.infn.it is now in downtime. (GOCDB 22247)
  • Job submission check : Pilot submission failure has been found since 10:24:00 UTC on 2017/01/10. (details)


LCG.Cosenza.it

LCG.CYFRONET.pl

  • Health checker info. : "Failed pilot jobs" has been found at 08:20:00 UTC on 2017/02/26.
  • Health checker info. : "Failed pilot jobs" has been found at 04:20:00 UTC on 2017/02/26
  • Health checker info. : "BLAH ERROR" has been found at 12:20:00 UTC on 2017/02/22
  • Health checker info. : "CRL has expired" has been found since 21:20:00 UTC on 2016/12/17.
  • Health checker info. : "BLAH ERROR" has been found since 04:20:00 UTC on 2017/02/20
  • Health checker info. : "BLAH ERROR" has been found at 02:20:00 UTC on 2017/02/18
  • Job submission check : Pilot submission failure has been found since 13:23:00 UTC on 2017/01/30. (details) - experts notified
  • Job submission check : Pilot submission failure has been found since 09:27:00 UTC on 2017/01/28. (details) - experts notified
  • Health checker info. : "Short pilot jobs" has been found at 16:20:00 UTC on 2017/01/27.(details)
  • Job submission check : Pilot submission failure has been found at 08:22:00 UTC on 2017/01/27. (details)
    All failures concentrated on cream02.grid.cyf-kr.edu.pl
  • Job submission check : Pilot submission failure has been found since 13:22:00 UTC on 2017/01/26. (details) - Reminder sent to experts at 10:00 JST on 2017/01/27
  • Job submission check : Pilot submission failure has been found at 06:22:00 UTC on 2017/01/26. (details)
  • Job submission check : Pilot submission failure has been found at 14:24:00 UTC on 2017/01/25. (details)
  • GGUS ticket : "CYFRONET: configure their services to recognise the additional VOMS servers for VO=belle"(125549) has been submitted at 13:14:31 UTC on 2016/12/15.
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2016/12/21. 
  • Health checker info. : "BLAH ERROR" has been found since 17:20:00 UTC on 2016/10/26.(details) (Added 2016-11-03 22:30:00 UTC)
  • Pilot submission failure is due to a full disk. The site admin has been notified (GGUS).
    • Fixed now
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/23.(details)

LCG.DESY.de

  • Some fraction of jobs stalled. Noted on 2017-02-25 at ~00:40 JST. (direct link to plot, link to logbook entry)
  • Health checker info. : "Belle II software could not be installed on grid-wn0102.desy.de" has been found at 07:20:00 UTC on 2017/02/09.
  • Health checker info. : "BLAH ERROR" has been found since 17:20:00 UTC on 2016/11/29.(details)
  • BIIDCO-70
  • Health checker info. : "Failed pilot jobs" has been found since 14:20:00 UTC on 2016/11/26.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 18:20:00 UTC on 2016/11/24.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 02:20:00 UTC on 2016/11/21.(details)

LCG.Frascati.it

  •  Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2016/12/03.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2016/12/02.(details)
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2016/11/23.(details)
  •  Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2016/11/28.(details)
  •  Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2016/11/29.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2016/12/01.(details)

LCG.HEPHY.at

  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/05.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/03.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 21:20:00 UTC on 2017/01/27.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.
  • Health checker info. : "Failed pilot jobs" has been found at 08:20:00 UTC on 2017/02/07
     
  • Job submission check : Pilot submission failure has been found at 00:22:00 UTC on 2016/11/01.  

LCG.KEK.jp

  • Job submission check : Pilot submission failure has been found at 20:24:00 UTC on 2017/01/27. (details)
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2016/12/20.
  • The number of jobs is being increased (BIIDCO-40).

LCG.KEK2.jp

  • Job submission check : Pilot submission failure has been found since 18:25:00 UTC on 2017/02/17. (details)- notified comp-dc-operations@belle2.org
  • Job submission check : Pilot submission failure has been found at 14:23:00 UTC on 2017/02/08. 
  • Job submission check : Pilot submission failure has been found at 16:27:00 UTC on 2017/01/13.
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 20:20:00 UTC on 2016/12/20.
  • Health checker info. : "Short pilot jobs" has been found at 20:20:00 UTC on 2016/12/20.
  • Health checker info. : "Short pilot jobs" has been found at 17:20:00 UTC on 2016/12/19.
  • The number of jobs is being increased after the restriction that was set because of job failures.
    • BIIDCO-39: LCG.KEK2.jp - Application Finished With Errors
  • Health checker info. : "Failed pilot jobs" has been found since 11:20:00 UTC on 2017/01/09.(details)

LCG.KISTI.kr

  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/21.
  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2016/12/20.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2016/12/02.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 12:20:00 UTC on 2016/11/30.(details)
  • We expect to solve this issue: Health checker info. : "Failed to install DIRAC on N/A" has been found since 13:20:00 UTC on 2016/11/01.
  • The KISTI Belle system will be unavailable on Nov. 23 from 9 am to 1 pm KST due to KISTI-GSDC Belle II system management (kernel update and host_cert renewal). All belle queues will be closed at 5 pm KST on Nov. 22.

LCG.KIT.de

LCG.KMI.jp

  • Health checker info. : "Belle II software could not be installed on pwn24.local,own07.local,pwn10.local" has been found since 05:20:00 UTC on 2017/02/09.
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2016/12/21.
  • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2016/12/05.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 21:20:00 UTC on 2016/11/23.(details)
    • BIIDCO-74

LCG.Legnaro.it


LCG.McGill.ca

 

  • Growing fraction of "Application finished with errors" observed on 2017-02-21  (link to plot, logbook posts 1, 2) – notified comp-dc-operations@belle2.org at ~2:00 JST
  • Health checker info. : "Failed pilot jobs" has been found at 18:20:00 UTC on 2016/12/20 

LCG.Melbourne.au

  • Health checker info. : "Short pilot jobs" has been found at 22:20:00 UTC on 2016/12/17.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 15:20:00 UTC on 2016/12/01.(details)
  •  Job submission check : Pilot submission failure has been found since 13:22:00 UTC on 2016/11/10. (details)

LCG.Napoli.it

  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/26
  • Pilot job submission to Napoli short queues is now restricted with MaxTotalJobs = 0

    • grisuce.scope.unina.it: cream-pbs-grisu_short
    • ce.scope.unina.it: cream-pbs-egee_short

    Apparently there are jobs running longer than 1 day (1440 min).

  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/26.
  • Health checker info. : "Failed pilot jobs" has been found at 07:20:00 UTC on 2017/02/26.
  • Health checker info. : "Failed pilot jobs" has been found at 04:20:00 UTC on 2017/02/26
  • Health checker info. : "Failed pilot jobs" has been found since 23:20:00 UTC on 2017/02/25
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/25.(details)- notified comp-dc-operations@belle2.org 2017-02-26 at ~0800 JST (25th 2300 UTC)
  • Health checker info. : "Failed pilot jobs" has been found at 20:20:00 UTC on 2017/02/25.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 17:20:00 UTC on 2017/02/25.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 05:20:00 UTC on 2017/02/24.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2017/02/23 (details)
  • Health checker info. : "Failed pilot jobs" has been found at 12:20:00 UTC on 2017/02/22
  • Health checker info. : "Failed pilot jobs" has been found since 17:20:00 UTC on 2017/02/21.(details) - notified comp-dc-operations@belle2.org 2017-02-21 at ~4:30JST
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/02/21.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/21. Experts not notified, since the number is small.
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2017/02/19.
  • Health checker info. : "Failed pilot jobs" has been found at 05:20:00 UTC on 2017/02/06.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2017/02/05.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 05:20:00 UTC on 2017/02/05.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 23:20:00 UTC on 2017/02/04.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2017/02/04.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2017/02/04.(details)
  • Job submission check : Pilot submission failure has been found since 12:22:00 UTC on 2017/01/27. (details)
  • Health checker info. : "BLAH ERROR" has been found at 18:20:00 UTC on 2017/01/26.(details)
  • Job submission check : Pilot submission failure has been found at 18:22:00 UTC on 2017/01/26. (details)
  • Downtime info.: ce.scope.unina.it is now in downtime. (GOCDB 22369)
    • NGI_IT - UNINA-EGEE

    • Start of downtime [UTC]: 25/01/2017 14:30 UTC

    • End downtime      [UTC]: 27/01/2017 19:00 UTC

  • Job submission check : Pilot submission failure has been found since 07:24:00 UTC on 2017/01/25. (details)
  • Health checker info. : "Not enough disk space on wn173.scope.unina.it,wn179.scope.unina.it" has been found at 11:20:00 UTC on 2017/01/23.
  • Health checker info. : "Failed pilot jobs" has been found since 00:20:00 UTC on 2017/01/22.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 22:20:00 UTC on 2017/01/17.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 19:20:00 UTC on 2017/01/14.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/01/10.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 05:20:00 UTC on 2017/01/11.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 21:20:00 UTC on 2017/02/23.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 21:20:00 UTC on 2017/02/22.
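
The entries above mix UTC timestamps (health checker, job submission check) with JST notification times, for example "2017-02-26 at ~0800 JST (25th 2300 UTC)". A minimal sketch of the conversion, assuming the timestamp layout used in these entries; the helper name utc_to_jst is hypothetical and not part of any monitoring tool:

```python
from datetime import datetime, timedelta, timezone

# JST is a fixed UTC+9 offset with no daylight saving time.
JST = timezone(timedelta(hours=9), name="JST")


def utc_to_jst(stamp: str) -> datetime:
    """Convert a log timestamp like '23:00:00 UTC on 2017/02/25' to JST.

    The format string is an assumption based on the entries in this log.
    """
    parsed = datetime.strptime(stamp, "%H:%M:%S UTC on %Y/%m/%d")
    return parsed.replace(tzinfo=timezone.utc).astimezone(JST)


if __name__ == "__main__":
    jst = utc_to_jst("23:00:00 UTC on 2017/02/25")
    print(jst.strftime("%Y-%m-%d %H:%M %Z"))  # 2017-02-26 08:00 JST
```

Because JST is always nine hours ahead of UTC, any UTC time from 15:00 onward falls on the next calendar day in JST.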

LCG.NTU.tw

  • GGUS ticket : "Job aborted with BLAH error in belle2grid2.cc.ntu.edu.tw at TW-NTU-HEP"(126812) has been submitted at 19:22 UTC on 2017/02/24.
  • Health checker info. : "BLAH ERROR" has been found since 14:20:00 UTC on 2017/02/24
  • Job submission check : Pilot submission failure has been found since 19:26:00 UTC on 2017/02/23. (details) - experts notified (9.40 UTC, 2017/02/24)
  • GGUS ticket : "Job aborted with BLAH error in belle2grid2.cc.ntu.edu.tw at TW-NTU-HEP"(126691) has been submitted at 02:28:10 UTC on 2017/02/20. – notified experts ~2017-02-20 0000JST
  • GGUS ticket : "Job aborted with BLAH error in belle2grid2.cc.ntu.edu.tw at TW-NTU-HEP"(126691) has been submitted at 07:34:38 UTC on 2017/02/18.
  • GGUS ticket: "Job aborted with BLAH error in belle2grid2.cc.ntu.edu.tw at TW-NTU-HEP" (126691) has been submitted at 07:20 UTC on 2017/02/18.

  • Health checker info. : "BLAH ERROR" has been found since 15:20:00 UTC on 2017/02/17.(details)- notified experts
  • Health checker info. : "BLAH ERROR" has been found since 16:20:00 UTC on 2017/02/04.(details) - notified experts
  • Health checker info. : "BLAH ERROR" has been found at 22:20:00 UTC on 2017/01/31.(details)
  • Health checker info. : "BLAH ERROR" has been found at 21:20:00 UTC on 2017/01/27.(details)
  • Health checker info. : "Aborted pilot jobs" has been found at 03:20:00 UTC on 2017/01/25.(details) - experts notified
  • Health checker info. : "BLAH ERROR" has been found at 00:20:00 UTC on 2017/01/25.(details) - experts notified
  • Health checker info. : "BLAH ERROR" has been found at 22:20:00 UTC on 2017/01/24.(details) - experts notified
  • Job submission check : Pilot submission failure has been found since 05:23:00 UTC on 2017/01/22. (details)
  • Health checker info. : "BLAH ERROR" has been found since 13:20:00 UTC on 2017/01/11.(details)
  • GGUS ticket : "[TW-NTU-HEP] Job aborted with BLAH error"(125175) has been submitted at 02:57:16 UTC on 2016/11/25.
  • Health checker info. : "Failed pilot jobs" has been found since 21:20:00 UTC on 2016/12/17.(details)
  • BLAH error observed. GGUS ticket submitted at 11/25 8:40. https://www.ggus.org/?mode=ticket_info&ticket_id=125175
  • Job submission check : Pilot submission failure has been found since 12:01:00 UTC on 2016/09/16. (details)
  • Health checker info. : "CRL has expired" has been found since 21:20:00 UTC on 2016/12/17.
  • Job submission check : Pilot submission failure has been found since 19:26:00 UTC on 2017/02/23. (details) - notified comp-dc-operations@belle2.org.
  • Health checker info. : "BLAH ERROR" has been found since 15:20:00 UTC on 2017/02/17.

LCG.Pisa.it

 

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2017/02/26.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/25.(details)- notified comp-dc-operations@belle2.org 2017-02-26 at ~0800 JST (25th 2300 UTC)
  • Health checker info. : "Failed pilot jobs" has been found at 23:20:00 UTC on 2017/02/24.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 06:20:00 UTC on 2017/02/23.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 15:20:00 UTC on 2017/02/21.(details) - notified comp-dc-operations@belle2.org 2017-02-21 at ~4:30JST
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/02/21.(details)
  • Health checker info. : "Failed pilot jobs" has been found since 13:20:00 UTC on 2017/02/20.(details) – notified experts ~2017-02-20 0000JST
  • Job submission check : Pilot submission failure has been found at 14:28:00 UTC on 2017/02/20. (details) – notified experts ~2017-02-20 0000JST
  • GGUS ticket : "Jobs submitted to gridce0.pi.infn.it are finished immediately"(122842) has been submitted at 08:12:14 UTC on 2016/07/13.
  • Health checker info. : "Short pilot jobs" has been found since 18:20:00 UTC on 2017/02/15.(details)
  • Health checker info. : "Short pilot jobs" has been found since 14:20:00 UTC on 2017/02/06.(details)
  • Health checker info. : "Short pilot jobs" has been found at 10:20:00 UTC on 2017/02/04.(details)

 


  • Downtime info.: all CEs are in downtime 2017/01/25 00:00 UTC-2017/01/27 04:00. (GOCDB 22372)
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2017/02/03.(details)
  • Health checker info. : "Short pilot jobs" has been found at 16:20:00 UTC on 2017/01/25.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 23:20:00 UTC on 2017/01/17.(details)
  • Health checker info. : "Short pilot jobs" has been found since 08:20:00 UTC on 2016/12/28.(details)
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2016/12/20.(details)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2016/12/06.(details)
  • Health checker info. : "Short pilot jobs" has been found at 06:20:00 UTC on 2016/12/04.(details)
    • GGUS ticket : "Jobs submitted to gridce0.pi.infn.it are finished immediately"(122842) has been submitted at 08:12:14 UTC on 2016/07/13.
    • reported to comp-dc-operations@belle2.org 8:08 UTC on 2016/12/02
    • Health checker info. : "Short pilot jobs" has been found since 09:20:00 UTC on 2016/11/04 (Reported 2016-11-04 15:45 UTC)
    • Health checker info. : "Short pilot jobs" has been found at 13:20:00 UTC on 2016/11/15.(details)
  • Job submission check : Pilot submission failure has been found at 07:24:00 UTC on 2016/12/02. (details)
  • Health checker info. : "Failed pilot jobs" has been found since 02:20:00 UTC on 2016/12/02.(details)
    • Health checker info. : "Failed pilot jobs" has been found since 12:20:00 UTC on 2016/11/21.(details)
    • BIIDCO-75

LCG.Roma3.it

  • Health checker info. : "Failed pilot jobs" has been found at 10:20:00 UTC on 2017/02/22
  • Some jobs finished with errors between 00:00 and 07:00 on 2017/02/21.
  • BIIDCO-111

LCG.Torino.it

  • GGUS ticket : "Job submission failed on t2-ce-01.to.infn.it"(124927) has been submitted at 16:39:39 UTC on 2016/12/16.
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/19.
  • Job submission check : Pilot submission failure has been found since 01:25:00 UTC on 2017/02/06. (details) - reported to comp-dc-operations@belle2.org
  • Health checker info. : "Failed pilot jobs" has been found at 22:20:00 UTC on 2017/02/05.(details)
  • Job submission check : Pilot submission failure has been found at 04:27:00 UTC on 2017/02/05. (details)
  • Job submission check : Pilot submission failure has been found at 11:24:00 UTC on 2017/02/03. (details)
  • Job submission check : Pilot submission failure has been found since 21:26:00 UTC on 2017/01/31. (details)
  • Job submission check : Pilot submission failure has been found since 09:22:00 UTC on 2017/01/27. (details)
  • Job submission check : Pilot submission failure has been found since 13:24:00 UTC on 2017/01/25. (details)
  • Job submission check : Pilot submission failure has been found at 06:25:00 UTC on 2017/01/22. (details)
  • Job submission check : Pilot submission failure has been found since 16:23:00 UTC on 2017/01/21. (details)
  • Job submission check : Pilot submission failure has been found since 04:24:00 UTC on 2017/01/14. (details)
  • Health checker info. : "Short pilot jobs" has been found at 04:20:00 UTC on 2016/12/21.
  • Job submission check : Pilot submission failure has been found at 12:22:00 UTC on 2016/11/11. (details)

LCG.ULAKBIM.tr

OSG.UMiss.us

  • "Application Finished with Errors" observed at 09:00:00 UTC on 2017/02/24.

SSH.KMI.jp



Links

 


 

 
