Production Plans

  • MC7
    • Started on November 1, 2016
      • Phase III signal MC samples with 12th background MC campaign samples
      • Phase III Y(6S) production with 12th background MC campaign samples
      • Phase II Y(6S) production with new, phase II background samples
      • Phase II Y(4S) production with new, phase II background samples
      • Phase III Y(3S) production with 12th background MC campaign samples
      • Additional generic samples at phase III with and without backgrounds
    • MC7 phase 2 started January 9, 2017
      • Phase III Y(4S) generic samples
      • Phase III Y(4S) ccbar sample with new parameters
    • MC7 ended February 8, 2017
  • MC8
    • MC8 test production started February 1, 2017
    • In parallel with MC8 production, ~10,000 analysis test jobs were submitted on Feb. 2. These jobs are expected to finish within the next couple of hours (as of 21:00 JST).
    • MC8 started on February 17, 2017
      • Phase III generic samples
      • Phase III signal samples
      • Phase II signal samples
      • Other?
    • MC8 restarted May 8
      • Phase III signal MC samples (mainly for validation use)
      • Phase III Y(4S) ccbar generic samples with new Pythia parameters
  • MC9
    • MC9 started July 5, 2017
      • Phase III generic samples with BGx0
      • Signal samples with BGx0
      • Phase II generic samples with BGx0

Production Status

MC9

Official production started at ~21:00 JST on July 5, 2017. Starting with BGx0 generic samples (0.2 ab-1)

Submitted second batch of BGx0 generic jobs (July 7, ~04:00 JST)

Third and fourth batches of BGx0 generic jobs (July 10)

Submitted a few BGx0 signal samples (July 12 ~04:30 JST)

Submitted the phase 2 generic samples with BGx0 (July 14 ~04:00 JST)

Submitted the rest of the BGx0 signal samples (July 16 ~00:00 JST)

New requests for BGx0 signal samples submitted (July 19 ~01:30 JST)

MC9 restarted with BGx1 phase 2 samples - 50 fb-1 generic and signal samples (July 30 ~10:00 JST)

Submitted first batch of phase 3 samples with background - mixed and charged BBbar - about 140k jobs (August 12 ~08:00 JST)

Added uubar: ~180k jobs (August 13 ~10:30 JST)


MC8

Official production started at 01:30 JST on Feb 17 with a 1 ab-1 sample of phase 3 - Y(4S) generic MC

Submitted 72 signal samples for a total of about 162 million events (Feb 25, ~02:00 JST)

Submitted additional signal samples equivalent to 820 million events (Mar 1, ~00:00 JST)

Submitted first part of 0.5 ab-1 Y(5S) generic sample: bsbs only = 29.24 million events (Mar 1, ~02:00 JST)

Submitted the rest of 0.5 ab-1 Y(5S) generic sample (Mar 1, ~22:30 JST)

Submitted the rest of the phase 3 Y(4S) signal samples (Mar 8)

Submitted the phase 2 signal samples (Mar 8)

Submitted a 200 fb-1 Y(4S) generic sample with a total of just over 1 billion events (Mar 11, ~01:30 JST)

Submitted phase 3 bottomonium samples (Mar 14) - resubmitted some problematic productions, Miyake-san is fixing others (Mar 15)

Submitted phase 2 bottomonium samples (Mar 15)

Submitted 300 fb-1 phase 3 Y(3S) generic sample and several small signal samples (Mar 18, ~02:00 JST)

Submitted a large generator-level skim sample (96 x 10^9 generated events). These should finish very quickly, but will make a big spike in the production progress plot (Mar 26, ~02:00 JST)

One last signal sample (7400 jobs) was still in the pipeline (requested before the last DP update). Submitted on Mar 28 at ~22:00 JST.

Submitted 16 signal samples (relatively small) and 1 ab-1 ccbar generic sample (May 8, ~23:00 JST)


MC7

Official production started as scheduled at 00:00 JST on Nov. 1.

A full list of samples is given on the data production page

A total of 100,000 jobs of 10k events each have been submitted as of Nov. 4

The following productions have been stopped and the corresponding jobs killed due to improper ROOT output (Nov. 10):

  • 517 - nonbsbs phase 2 Y(6S)
  • 518 - bsbs phase 2 Y(6S)

A total of 139,972 jobs of 10k events have been submitted as of Nov. 10

Phase 2 background samples have been distributed. 

  • Distribution to the Grid sites: BIIDCO-31
  • Distribution to the non-Grid sites: BIIDCO-32
  • When they are ready, we will finish production of phase 2 samples. This will complete the official requests for MC7. Thereafter, we will submit additional generic samples and perhaps include some additional requests from the physics group.

Generic phase 2 samples at Y(6S) and Y(4S) as well as some signal samples with backgrounds have been submitted (~23:50 JST on Nov. 11 and ~04:50 JST on Nov. 12). In total 17,776 Y(6S) + 29,608 Y(4S) = 47,384 jobs were submitted.
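The totals quoted above can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the job counts are taken from the entry above, while the variable names and the assumption that these jobs also contain 10k events each (as stated for the earlier MC7 submissions) are illustrative.

```python
# Job counts quoted in the Nov. 11-12 submission entry.
y6s_jobs = 17_776
y4s_jobs = 29_608

# Assumption: 10k events per job, as stated for earlier MC7 submissions.
EVENTS_PER_JOB = 10_000

total_jobs = y6s_jobs + y4s_jobs
total_events = total_jobs * EVENTS_PER_JOB

print(f"Total jobs:   {total_jobs:,}")
print(f"Total events: {total_events:,} (if every job has {EVENTS_PER_JOB:,} events)")
```

The job sum reproduces the 47,384 quoted in the entry; the event count holds only if the 10k-events-per-job assumption applies to these productions.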

All requested MC samples for MC7 have been submitted (Nov 14)

Also submitted Phase III Y(3S) requests, which total 23,850 jobs of 10k events (Nov 14 at 04:30 JST). (Running total is now 215,332 jobs.)

Completed database access tests with 2k and 5k concurrent jobs (Nov 16 at ~10:00 JST)

Submitted Phase III Y(4S) generic samples (1 ab-1), which total 575,300 jobs of 10k events (Nov 16 at ~23:40 JST)

Submitted additional jobs including another 1 ab-1 of phase 3 generic Y(4S) samples, which total 636,000 jobs, split roughly half and half between 10k-event and 5k-event jobs. This is roughly equal to the number of jobs previously submitted, bringing the total number to over 1.2 million jobs. (December 8 at ~00:00 JST)

Submitted additional generic samples (1 ab-1 of mixed and charged events), totalling 160,400 additional jobs. (December 12 at ~04:00 JST)

Submitted additional signal samples (12.7k jobs) and a new ccbar sample with modified parameters (159.4k jobs). (December 21 at ~02:00 JST)

The system will keep processing the jobs already submitted until they are exhausted. No additional samples will be submitted until January 9, after the KEK downtime.

 

New submission started January 9 at 0:00 JST with phase 3 - Y(4S) generic samples:

  • mixed (BGx1: 42770 jobs, BGx0: 21380) (~0:00 JST)
  • charged (BGx1: 45230 jobs, BGx0: 22620) (~6:00 JST)
  • uubar, ddbar, ssbar, ccbar, taupair: 538,060 jobs (~01:30 JST)
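As a rough tally of the January 9 submission, the per-sample job counts listed above can be summed as follows. The dictionary layout and names are illustrative only; the numbers are those quoted in the list.

```python
# Job counts from the January 9 submission list (BGx1 and BGx0 where quoted).
jobs = {
    "mixed": {"BGx1": 42_770, "BGx0": 21_380},
    "charged": {"BGx1": 45_230, "BGx0": 22_620},
    # These samples are quoted only as one combined total in the list.
    "uubar+ddbar+ssbar+ccbar+taupair": {"combined": 538_060},
}

# Sum the counts for each sample, then across all samples.
per_sample = {name: sum(counts.values()) for name, counts in jobs.items()}
grand_total = sum(per_sample.values())

for name, n in per_sample.items():
    print(f"{name}: {n:,} jobs")
print(f"Total: {grand_total:,} jobs")
```

This gives 64,150 mixed jobs, 67,850 charged jobs and 670,060 jobs overall for the three list entries.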

Submitted new ccbar samples with new parameters: 159,480 jobs (~22:15 JST on Jan 27)

Test production for MC8, including 10 jobs of 10k events each with BG for generic samples (70 jobs total) started Feb. 1.

Submitted additional MC7 signal samples: 11,200 jobs (~02:00 JST on Feb 5)


Central Services

Dirac


The high CPU load issue occurred again. KEKCC expert informed (2017-08-08 20:00 UTC). BIIDCO-240

DIRAC system will be off from Aug 3 4:00 UTC to Aug 8 3:00 UTC due to KEKCC summer shutdown

The same issue as BIIDCO-176 occurred again. KEKCC expert informed (2017-07-31 10:30 UTC)

  • The issue has been addressed by the KEKCC expert. All DIRAC and DB nodes were restarted and services are now back to normal. (14:05 UTC)

Sudden increase in CPU load on all DIRAC servers (2017-07-09 13:00 UTC)

  • JIRA ticket issued: BIIDCO-176
  • Load on all DIRAC servers, DB servers and web servers is > 400%. The number of jobs running across all sites is too low (447). 2017-07-09 18:00 UTC

CPU load on b2dchsv01.cc.kek.jp is too high (2017-07-07 11:15 UTC)

  • Issued a JIRA ticket: BIIDCO-169
  • KEKCC expert informed. However, the node may be left unattended over the weekend due to lack of support. Let's keep an eye on the node activity. 2017-07-07 12:30 UTC
  • Issue fixed by KEKCC expert. 2017-07-07 17:30 UTC.

CPU load of b2dcsv ... is too high 2017/06/12 9:19 UT

  • Memory consumption is increasing and fluctuating on one server (b2dchsv05.cc.kek.jp) 2017/03/15 20:30 JST.
    → Experts are still investigating 2017/03/17 00:14 JST
    → A reset is being performed every hour on b2dchsv05.cc.kek.jp to avoid reaching the memory limit 2017/03/17 02:27 JST
  • The memory issue above seems to be gone, though the root cause has not yet been identified. 2017-03-24 09:28 UTC

Memory load rising on the DB Production servers b2dcsdb1.cc.kek.jp and b2dcsdb2.cc.kek.jp. 2017-07-14 4:30 JST

  • Issued JIRA ticket BIIDCO-195

DB Production

Memory increase on all "DB Production" servers (b2dchdb1.cc.kek.jp, b2dchdb2.cc.kek.jp, b2dcsdb1.cc.kek.jp, b2dcsdb2.cc.kek.jp) 2017-07-31 23:00 UTC

  • Issued JIRA ticket BIIDCO-231

DDM

Monitor


File Transfers and Replication Status

See also DDM for related issues

FTS

  • BIIDCO-269
  • Some "critical" errors observed in FTS log on July 6-7. ggus:129750  

Replication Status


SEs

SE Common Issues

  • The number of "waiting" jobs has increased at all sites. 2017-07-09 18:40 UTC.
    • Tracking the issue in JIRA ticket BIIDCO-174.

Destination SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)

  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-08-17 17:10:08 UTC.
    • Issued a ticket: BIIDCO-273

  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-08-09 17:10:21 UTC.
    • Issued a ticket: BIIDCO-245


Destination SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • Not enough free space: BIIDCO-137

Destination SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • SE Health check by DDM : remove file, download, upload do not work since 2017-08-16 23:18:49 UTC.
  • The queue of replications seems stuck: BIIDCO-208
  • Not enough free space: BIIDCO-107
  • Transfer efficiency is between 0 and 20%: BIIDCO-170

Destination SE: KEK2-TMP-SE (kek2-se01.cc.kek.jp)

  • Still banned for removal due to the issue in the back-end HSM
    BIIDCO-41

Destination SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

  • File transfer failure from SOURCE: belle-se-head.sdfarm.kr and very low efficiency since 2017-07-18 06:00 UTC
    → GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=129614 was submitted at 2017-07-18 13:10 UTC.
    → File transfers seem to succeed after restarting the relevant services. There is still a communication problem between belle-se-head.sdfarm.kr and other sites 2017-07-19 15:00 JST
    • Now solved and verified
  • SE Health check by DDM : checksum, remove file, remove directory, download, upload, ls do not work since 2017-06-28 03:08:36 UTC.
  • File transfer failures appear with this SE as Destination: BIIDCO-174, BIIDCO-201
    Solved and verified (2017-07-18): GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=129569 was submitted to the site at 2017-07-16 10:25 UTC
    → Issue was solved by the site admin increasing the var directory size. 2017-07-17 07:07 UTC

Destination SE: KIT-TMP-SE (gridka-dcache.fzk.de)

  • KIT SE disabled as Destination SE: BIIDCO-199
  • Not enough free space: BIIDCO-134

Destination SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • The queue of replications seems stuck: BIIDCO-208
  • The number of done is zero while the number of queued is not zero, as of 2017-07-13 06:19 UTC
  • Not enough free space: BIIDCO-136


Destination SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • Not enough free space: BIIDCO-146

Destination SE: PNNL-TMP-SE (se.hep.pnnl.gov) 

  • Remove is banned: BIIDCO-232
  • The number of done is zero while the number of queued is not zero, as of 2017-07-13 06:19 UTC

Destination SE: SIGNET-TMP-SE (dcache.ijs.si)

Other SEs

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

  • File transfer failure occurred from SOURCE: hephyse.oeaw.ac.at to other sites in the last 4 hours
    Solved and verified (2017-07-20 23:43): GGUS ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=129602 was submitted at 2017-07-18 14:00 JST
    → MySQL problem with the disk server was solved at 2017-07-19 18:30 UTC

McGill SE  (storm02.clumeq.mcgill.ca)

  • File transfer failure from SOURCE: storm02.clumeq.mcgill.ca to other sites: BIIDCO-202
    ERROR: SOURCE Error reported from srm_ifce : 13 [SE][Ls][SRM_AUTHORIZATION_FAILURE] No approachable VFS found for user! 

NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

Pisa-TMP-SE (stormfe1.pi.infn.it)

Torino-TMP-SE (se-srm-00.to.infn.it)

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

UVic-TMP-SE(charon01.westgrid.ca)

  • A large number of file transfer failures has been observed.
    A GGUS ticket has been submitted (130179)

Sites

Sites Common Issues

BIIDCO-256

BIIDCO-257

Conditions database appears to be down so jobs may fail until it's back up 2017-08-16 10:14:56 +0200

ARC.DESY.de

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2017/08/17.
  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2017/08/09.
    • ARC.DESY.de : "Short pilot jobs": BIIDCO-207, BIIDCO-242
    • GGUS ticket submitted: https://ggus.eu/?mode=ticket_info&ticket_id=130011
      • The assumed problem is that the failure to get machine parameters is affecting the cycle time and therefore the calculation of the pilot job monitoring. The response is that these are optional parameters which won't be implemented. So we will need to figure out exactly what the problem is and see how it can be fixed.

ARC.KIT.de

ARC.LMU.de

  • This is a test site. There is no need to report any issues.

ARC.LMU2.de

ARC.MPPMU.de

  • Job submission check : Pilot submission failure has been found at 06:31:00 UTC on 2017/05/10. (details)

ARC.SIGNET.si

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/08/20.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/08/16.(details)
  • Solved and verified (2017-07-06): DIRAC cannot get the necessary information from the CE and pilot submission to the site fails (ggus:129333 pikolit.ijs.si returns nothing for GlueVOViewLocalID=belle) 2017-07-04
  • Job submission check : Pilot submission failure has been found since 05:26:00 UTC on 2017/03/10. BIIDCO-126

CLOUD.CC1_Krakow.pl

  • Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Banned. BIIDCO-43
    • All the upload trials are failing against all the SEs configured: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), Fail-over SEs(DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large % of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC) 

DIRAC.BINP.ru

  • JIRA ticket "Connection to belle2.inp.nsk.su from Nagoya DIRAC is failed" submitted (2017-07-09 00:00): BIIDCO-173

DIRAC.CINVESTAV.mx

  • Health checker info. : "Short pilot jobs" has been found at 07:20:00 UTC on 2017/08/12.(details)
  • Job submission failure has been observed since 01:31:00 UTC on 2017/07/30.
    JIRA ticket submitted: BIIDCO-221
  • BIIDCO-155

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

  • "Not enough disk space on " has been found since 08:20:00 UTC on 2017/08/19.
  • "Aborted pilot jobs" has been found since 08:20:00 UTC on 2017/08/19.
  • Aborted Pilot job is observed since 15:20:00 UTC on 2017/07/05.
    • JIRA ticket submitted (BIIDCO-166).
  • BIIDCO-156

DIRAC.LMU.de

  • Not in use in MC production: BIIDCO-26
    • Banned for now.

DIRAC.MIPT.ru

  • Health checker info. : "Short pilot jobs" has been found since 03:20:00 UTC on 2017/08/17.
  • Health checker info. : "Short pilot jobs" has been found at 23:20:00 UTC on 2017/08/16.(details)
  • Health checker info. : "Short pilot jobs" has been found since 11:20:00 UTC on 2017/08/16.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 08:20:00 UTC on 2017/03/21.(details) → reported to comp-dc-operations@belle2.org
    All jobs failed and pilots are aborted from 2017/03/21 10:00 UTC: BIIDCO-141
  • Health checker info. : "Aborted pilot jobs" has been found since 15:20:00 UTC on 2016/11/16.(details)  These aborted pilot jobs disappeared a few hours later.
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2017/08/14.(details) → BIIDCO-258

DIRAC.Nagoya.jp

DIRAC.Nara-WU.jp

  • Decommissioned site: Since this still uses SL5, DIRAC pilot cannot be executed there.

DIRAC.NDU.jp

  • Health checker info. : "Short pilot jobs" has been found since 04:20:00 UTC on 2017/08/17.
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2017/08/16.(details)
  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2017/08/14.(details) → BIIDCO-259

DIRAC.Niigata.jp

DIRAC.Osaka-CU.jp

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2017/08/20.
  • Health checker info. : "Not enough disk space on " has been found since 04:20:00 UTC on 2017/08/19.
  • Job submission failure has been observed since 19:21:00 UTC on 2017/07/29.
     Issued a JIRA ticket: https://agira.desy.de/projects/BIIDCO/issues/BIIDCO-222
  • Job submission check : Pilot submission failure has been found since 20:25:00 UTC on 2017/07/06. (details)
    → Issued a JIRA ticket: BIIDCO-168. A network switch broke and has been repaired.

DIRAC.PNNL.us

  • Job submission check : Pilot submission failure has been found at 04:28:00 UTC on 2017/07/12. Issued a JIRA ticket BIIDCO-237.

DIRAC.PNNL2.us

  • Jobs fail when downloading input files. User jobs to be restricted. BIIDCO-218
  • Production jobs failing due to slow conditions DB access: BIIDCO-255

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2017/08/17.

DIRAC.SSU.kr

  • Health checker info. : "Short pilot jobs" has been found since 02:20:00 UTC on 2017/08/17.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/08/16.(details)
  • Health checker info. : "Short pilot jobs" has been found since 01:20:00 UTC on 2017/08/14.(details) → BIIDCO-260

DIRAC.TIFR.in

  • All production jobs have been failing due to file upload failure since 2017-07-06: BIIDCO-205

DIRAC.TMU.jp


  • Job submission check : Pilot submission failure has been found since 10:28:00 UTC on 2017/08/19.
  • Job submission check : Pilot submission failure has been found since 08:24:00 UTC on 2017/08/18. (details) → since it has lasted for several hours, reported via mail (JIRA is not working now)
  • Health checker info. : "Aborted pilot jobs" has been found at 06:20:00 UTC on 2017/08/18.(details)
  • Health checker info. : "Aborted pilot jobs" has been found since 05:20:00 UTC on 2017/08/11. (BIIDCO-249)
  • Health checker info. : "Aborted pilot jobs" has been found since 08:20:00 UTC on 2017/08/10.

DIRAC.Tokyo.jp

  • No jobs have run since 2017-07-08 19:00 UTC: BIIDCO-194
    → Jobs and pilots seem to have been running since 2017-07-21

DIRAC.UAS.mx

  • Health checker info. : "Short pilot jobs" has been found since 06:20:00 UTC on 2017/08/17.
  • Health checker info. : "Short pilot jobs" has been found at 14:20:00 UTC on 2017/08/16.(details)
  • Health checker info. : "Short pilot jobs" has been found since 23:20:00 UTC on 2017/08/14.(details) → BIIDCO-266

DIRAC.UVic.ca

  •  "Short pilot jobs" has been found since 08:20:00 UTC on 2017/08/19.

DIRAC.Yamagata.jp

DIRAC.Yonsei.kr

  • Health checker info. : "Not enough disk space on " has been found since 22:20:00 UTC on 2017/08/16.
    JIRA ticket has been submitted. BIIDCO-276
  • Job submission check : Pilot submission failure has been found since 03:24:00 UTC on 2017/08/16. (details)
  • No jobs have run again since 2017-07-16 (since 2017-07-15 ~23:30 UTC): BIIDCO-192
    → Same cause, an NFS service failure at the site; recovered by restarting (2017-07-17 16:00 JST)
  • Job submission check : Pilot submission failure has been found since 02:34:00 UTC on 2017/08/15. (details) → BIIDCO-267

LCG.CESNET.cz

LCG.CNAF.it

LCG.Cosenza.it

  • GGUS ticket : "INFN-COSENZA: WN disk full"(130126) has been submitted at 00:01:12 UTC on 2017/08/18.
  • Health checker info. : "Not enough disk space on recas-wn-04" has been found since 17:20:00 UTC on 2017/08/17.
    JIRA ticket has been submitted. BIIDCO-274
  • Health checker info. : "Not enough disk space on recas-wn-04" has been found since 22:20:00 UTC on 2017/08/16.

LCG.CYFRONET.pl

LCG.DESY.de

  •  Job submission check : Pilot submission failure has been found since 13:23:00 UTC on 2017/08/21.
  • BIIDCO-70

LCG.Frascati.it

  • Health checker info. : "Short pilot jobs" has been found since 13:20:00 UTC on 2017/08/17.
  • Health checker info. : "Short pilot jobs" has been found since 05:20:00 UTC on 2017/08/11.(details) (BIIDCO-251)
  • Health checker info. : "Short pilot jobs" has been found since 12:20:00 UTC on 2017/07/16.(details)

LCG.HEPHY.at


LCG.KEK.jp

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/08/20.
  • Health checker info. : "Failed pilot jobs" has been found since 05:20:00 UTC on 2017/08/19.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/08/18.(details)

LCG.KEK2.jp

  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/08/20.
  • Health checker info. : "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/08/18.(details)
  • Health checker info. : "Failed pilot jobs" has been found at 06:20:00 UTC on 2017/08/18.(details)

LCG.KISTI.kr

  • "Short pilot jobs" has been found since 10:20:00 UTC on 2017/07/15
  • Health checker info. : "Short pilot jobs" has been found since 07:20:00 UTC on 2017/07/18.(details)

LCG.KIT.de

  • BIIDCO-131
  • The maximum number of jobs is set to zero (for job drain) 2017/03/07.

LCG.KMI.jp

  • "Failed pilot jobs" has been found at 14:20:00 UTC on 2017/08/19.
  • Health checker info. : "Failed pilot jobs" has been found at 21:20:00 UTC on 2016/11/23.(details)
  • BIIDCO-74

LCG.Legnaro.it

LCG.McGill.ca

  • Health checker info. : "Failed pilot jobs" has been found since 03:20:00 UTC on 2017/07/15. (details)
    Solved and verified (2017-08-17): GGUS ticket : "CA-MCGILL-CLUMEQ-T2: All pilot jobs failed with error"(129600) has been submitted at 18:43:03 UTC on 2017/07/17
  • Health checker info. : "BLAH ERROR" has been found since 10:20:00 UTC on 2017/07/07.(details)

LCG.Melbourne.au

  • Banned for CE replacement: BIIDCO-162

LCG.Napoli.it

  • Health checker info. : "BLAH ERROR" has been found since 11:20:00 UTC on 2017/08/20.
  • Downtime info.: ce.scope.unina.it is now in downtime. (GOCDB 23765)
  • Health checker info. : "BLAH ERROR" has been found since 02:20:00 UTC on 2017/08/12.(details)
  • Health checker info. : "BLAH ERROR" has been found since 05:20:00 UTC on 2017/08/11.
  • Health checker info. : "BLAH ERROR" has been found at 02:20:00 UTC on 2017/08/09
    → Downtime from 26-Jul-17 09:40:00 to 11-Aug-17 18:00:00 UTC (ce.scope.unina.it, grisuce.scope.unina.it)
  • Health checker info. : "BLAH ERROR" has been found at 14:20:00 UTC on 2017/07/14.(details)

LCG.NTU.tw

  • Health checker info. : "Failed to install DIRAC on node17" has been found since 02:20:00 UTC on 2017/08/09.
    → GGUS ticket "DIRAC installation failure at node17" (130004) was submitted at 2017-08-09 16:44 UTC
  • Job submission check : Pilot submission failure has been found since 10:23:00 UTC on 2017/06/15. (details)
  • Solved and verified (2017-07-12) : GGUS ticket "can not submit jobs on belle2grid2.cc.ntu.edu.tw" (129330) was submitted at 2017-07-03 22:41:00 UTC.
  • Health checker info. : "Short pilot jobs" has been found at 10:20:00 UTC on 2017/08/14.(details) → BIIDCO-262

LCG.Pisa.it

  • GGUS ticket : "File Transfer failure to stormfe1.pi.infn.it"(129865) has been submitted at 07:01:38 UTC on 2017/08/01.
  • Health checker info. : "Failed pilot jobs" has been found since 16:20:00 UTC on 2017/08/16.(details)
  • Health checker info. : "BLAH ERROR" has been found since 05:20:00 UTC on 2017/08/08
    → Downtime from 2017-08-07 23:59 to 2017-08-09 23:59 (UTC)
  • Solved and closed (2017-07-04) : GGUS ticket : "Jobs submitted to gridce0.pi.infn.it are finished immediately"(122842) has been submitted at 08:12:14 UTC on 2016/07/13.
  • LCG.Pisa.it - Pilots fail after running long time: BIIDCO-75
  • Job submission check : Pilot submission failure → BIIDCO-263

LCG.Roma3.it

  • Roma3 commissioning: BIIDCO-111

LCG.Torino.it

  • Job submission check : Pilot submission failure has been found at 14:14:00 UTC on 2017/08/17.
  • GGUS ticket : 
    1. "INFN-TORINO: Failing transfers with DESTINATION srm://se-srm-00.to.infn.it"(130083) has been submitted at 03:36:13 UTC on 2017/08/16.
    2. "INFN-TORINO: Failed job submission to t2-ce-01.to.infn.it"(130043) has been submitted at 02:28:44 UTC on 2017/08/12.
  • Health checker info. : "Failed pilot jobs" has been found at 23:20:00 UTC on 2017/08/16.(details)
  • Job submission check : Pilot submission failure has been found at 22:29:00 UTC on 2017/08/16. (details)
  • Health checker info. : "Failed pilot jobs" has been found since 05:20:00 UTC on 2017/08/11. (BIIDCO-252)
  • Job submission check : Pilot submission failure has been found at 06:22:00 UTC on 2017/08/08.
  • Job submission check : Pilot submission failure has been found at 09:28:00 UTC on 2017/08/14. (details) → BIIDCO-264

LCG.ULAKBIM.tr

OSG.UMiss.us

  • Not enough space error: Application finished with errors. BIIDCO-241

SSH.KMI.jp

Links

