Production Plans

 


Production Status

MC8

Official production started at 01:30 JST on Feb 17 with a 1 ab-1 sample of phase 3 - Y(4S) generic MC

Submitted 72 signal samples for a total of about 162 million events (Feb 25, ~02:00 JST)

Submitted additional signal samples equivalent to 820 million events (Mar 1, ~00:00 JST)

Submitted first part of 0.5 ab-1 Y(5S) generic sample: bsbs only = 29.24 million events (Mar 1, ~02:00 JST)

Submitted the rest of 0.5 ab-1 Y(5S) generic sample (Mar 1, ~22:30 JST)

Submitted the rest of the phase 3 Y(4S) signal samples (Mar 8)

Submitted the phase 2 signal samples (Mar 8)

Submitted a 200 fb-1 Y(4S) generic sample with a total of just over 1 billion events (Mar 11, ~01:30 JST)

Submitted phase 3 bottomonium samples (Mar 14). Some problematic productions were resubmitted; Miyake-san is fixing the others (Mar 15)

Submitted phase 2 bottomonium samples (Mar 15)

Submitted 300 fb-1 phase 3 Y(3S) generic sample and several small signal samples (Mar 18, ~02:00 JST)

Submitted a large generator-level skim sample (96 x 10^9 generated events). These jobs should finish very quickly, but will cause a big spike in the production progress plot (Mar 26, ~02:00 JST)

One last signal sample (7400 jobs) was still in the pipeline (requested before the last DP update). Submitted on Mar 28 at ~22:00 JST.

Submitted 16 signal samples (relatively small) and 1 ab-1 ccbar generic sample (May 8, ~23:00 JST)


MC7

Official production started as scheduled at 00:00 JST on Nov. 1.

A full list of samples is given on the data production page

A total of 100,000 jobs of 10k events each have been submitted as of Nov. 4

The following productions have been stopped and the corresponding jobs killed due to improper ROOT output (Nov. 10):

  • 517 - nonbsbs phase 2 Y(6S)
  • 518 - bsbs phase 2 Y(6S)

A total of 139,972 jobs of 10k events have been submitted as of Nov. 10

Phase 2 background samples have been distributed. 

  • Distribution to the Grid sites: BIIDCO-31
  • Distribution to the non-Grid sites: BIIDCO-32
  • When they are ready, we will finish production of phase 2 samples. This will complete the official requests for MC7. Thereafter, we will submit additional generic samples and perhaps include some additional requests from the physics group.

Generic phase 2 samples at Y(6S) and Y(4S) as well as some signal samples with backgrounds have been submitted (~23:50 JST on Nov. 11 and ~04:50 JST on Nov. 12). In total 17,776 Y(6S) + 29,608 Y(4S) = 47,384 jobs were submitted.
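The job counts quoted in this entry can be cross-checked with a short tally. This is only an illustrative sketch: the variable names are made up here, and the figure of 10k events per job is an assumption carried over from the earlier MC7 entries, not something stated for this particular submission.

```python
# Job counts quoted in the Nov. 11-12 entry.
phase2_jobs = {"Y(6S)": 17_776, "Y(4S)": 29_608}

# Assumption: 10k events per job, as in the earlier MC7 entries.
EVENTS_PER_JOB = 10_000

total_jobs = sum(phase2_jobs.values())
total_events = total_jobs * EVENTS_PER_JOB

print(f"total jobs:   {total_jobs:,}")    # 47,384
print(f"total events: {total_events:,}")  # 473,840,000
```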

All requested MC samples for MC7 have been submitted (Nov 14)

Also submitted Phase III Y(3S) requests, which total 23,850 jobs of 10k events (Nov 14 at 04:30 JST). (Running total is now 215,332 jobs.)

Completed database access tests with 2k and 5k concurrent jobs (Nov 16 at ~10:00 JST)

Submitted Phase III Y(4S) generic samples (1 ab-1), which total 575,300 jobs of 10k events (Nov 16 at ~23:40 JST)

Submitted additional jobs including another 1 ab-1 of phase 3 generic Y(4S) samples, totalling 636,000 jobs, split about half and half between 10k-event and 5k-event jobs. This is roughly equal to the number of jobs previously submitted, bringing the total number to over 1.2 million jobs. (December 8 at ~00:00 JST)
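The entry above describes the 636,000 jobs as split about evenly between 10k-event and 5k-event jobs. As a rough sketch, assuming an exact 50/50 split (the entry only says "about", and the names below are illustrative), the number of events in this submission can be estimated as follows.

```python
jobs = 636_000

# Assumption: exactly half of the jobs carry 10k events and half carry 5k.
fraction_10k = 0.5

jobs_10k = round(jobs * fraction_10k)
jobs_5k = jobs - jobs_10k

events = jobs_10k * 10_000 + jobs_5k * 5_000
print(f"estimated events: {events:,}")  # 4,770,000,000
```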

Submitted additional generic samples (1 ab-1 of mixed and charged events), totalling 160,400 additional jobs. (December 12 at ~04:00 JST)

Submitted additional signal samples (12.7k jobs) and a new ccbar sample with modified parameters (159.4k jobs). (December 21 at ~02:00 JST)

The system will keep processing the jobs already submitted until they are exhausted. No additional samples will be submitted until January 9, after the KEK downtime.

 

New submission started January 9 at 0:00 JST with phase 3 - Y(4S) generic samples:

  • mixed (BGx1: 42,770 jobs, BGx0: 21,380 jobs) (~0:00 JST)
  • charged (BGx1: 45,230 jobs, BGx0: 22,620 jobs) (~6:00 JST)
  • uubar, ddbar, ssbar, ccbar, taupair: 538,060 jobs (~01:30 JST)
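For reference, the size of the January 9 submission can be tallied from the counts in the list above. The snippet is a minimal sketch with illustrative names; it only adds up the quoted numbers and shows how the jobs divide between the B-meson samples (mixed and charged) and the remaining samples.

```python
# Job counts quoted in the January 9 list.
jan9_jobs = {
    "mixed BGx1": 42_770,
    "mixed BGx0": 21_380,
    "charged BGx1": 45_230,
    "charged BGx0": 22_620,
    "uubar, ddbar, ssbar, ccbar, taupair": 538_060,
}

total = sum(jan9_jobs.values())
# Mixed and charged samples only.
bb_jobs = sum(n for name, n in jan9_jobs.items()
              if name.startswith(("mixed", "charged")))

print(f"total jobs:          {total:,}")    # 670,060
print(f"mixed + charged:     {bb_jobs:,}")  # 132,000
print(f"fraction in B pairs: {bb_jobs / total:.1%}")
```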

Submitted new ccbar samples with new parameters: 159,480 jobs (~22:15 JST on Jan 27)

Test production for MC8, including 10 jobs of 10k events each with BG for generic samples (70 jobs total), started Feb. 1.

Submitted additional MC7 signal samples: 11,200 jobs (~02:00 JST on Feb 5)


Central Services

Dirac

CPU load of b2dcsv ... is too high 2017/06/12 9:19 UT

  • Memory consumption is increasing and fluctuating on one server (b2dchsv05.cc.kek.jp) 2017/03/15 20:30 JST.
    → Experts are still investigating 2017/03/17 00:14 JST
    → b2dchsv05.cc.kek.jp is now reset every hour to avoid reaching the memory limit 2017/03/17 02:27 JST
  • The memory issue above seems to be gone, although the root cause has not been identified. 2017-03-24 09:28 UTC

DDM

Monitor


File Transfers and Replication Status

See also DDM for related issues

FTS

  • The DNS name kek2-fts points to kek2-fts02. The DIRAC config points to kek2-fts.
  • The upgraded FTS (kek2-fts02) is now in use.
  • Submission of file transfers keeps failing. The failure is reproduced with manual submission.
    • ggus:128022 KEK FTS: error verifying signature on certificate
    • The FTS server is to be upgraded.
    • The stand-by FTS (kek2-fts02) has been upgraded and is being tested.

Replication Status

SEs

SE Common Issues

Destination SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)



Destination SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)

  • Not enough free space (BIIDCO-137)

Destination SE: DESY-TMP-SE (dcache-se-desy.desy.de)

  • Not enough free space (BIIDCO-107)

Destination SE: KEK2-TMP-SE (kek2-se01.cc.kek.jp)

  • Still banned for removal due to the issue in the back-end HSM (BIIDCO-41)

Destination SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)

Destination SE: KIT-TMP-SE (gridka-dcache.fzk.de)

  • Not enough free space (BIIDCO-134)

Destination SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)

  • Not enough free space (BIIDCO-136)


Destination SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)

  • Not enough free space (BIIDCO-146)

Destination SE: PNNL-TMP-SE (se.hep.pnnl.gov)

Destination SE: SIGNET-TMP-SE (dcache.ijs.si)

Other SEs

CYFRONET-TMP-SE (dpm.cyf-kr.edu.pl)

HEPHY-TMP-SE (hephyse.oeaw.ac.at)

McGill SE (storm02.clumeq.mcgill.ca)

NTU-TMP-SE (bgrid3.phys.ntu.edu.tw)

Pisa-TMP-SE (stormfe1.pi.infn.it)


Torino-TMP-SE (se-srm-00.to.infn.it)

ULAKBIM-TMP-SE (torik1.ulakbim.gov.tr)

UVic-TMP-SE (charon01.westgrid.ca)

Sites

Sites Common Issues

ARC.DESY.de

ARC.KIT.de

ARC.LMU.de

  • This is a test site. There is no need to report any issues.

ARC.LMU2.de

  • Health checker info: "Aborted pilot jobs" has been found since 22:20:00 UTC on 2017/03/11.
    → Experts are working on this under JIRA ticket BIIDCO-127 (2017/03/10). Jobs appear to be running, but a large number of pilots are aborted.

ARC.MPPMU.de

  • Job submission check: Pilot submission failure has been found at 06:31:00 UTC on 2017/05/10.

ARC.SIGNET.si

  • Job submission check: Pilot submission failure has been found since 05:26:00 UTC on 2017/03/10.

  • BIIDCO-126


CLOUD.CC1_Krakow.pl

  • Seeing no jobs (no plot) is not a problem

DIRAC.Beihang.cn

  • Banned. BIIDCO-43
    • All upload attempts are failing against all the configured SEs: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), fail-over SEs (DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
  • Large percentage of failed jobs in DIRAC status plot (Added 2016-11-03 22:45:00 UTC)

DIRAC.BINP.ru

DIRAC.CINVESTAV.mx

DIRAC.DESY.de

  • Test site. Not in use in MC production

DIRAC.IITG.in

DIRAC.LMU.de

  • Not in use in MC production (BIIDCO-26)
    • Banned for now.

DIRAC.MIPT.ru

  • Health checker info: "Aborted pilot jobs" has been found since 08:20:00 UTC on 2017/03/21. → reported to comp-dc-operations@belle2.org
    All jobs have failed and pilots have been aborted since 2017/03/21 10:00 UTC (BIIDCO-141)
  • Health checker info: "Aborted pilot jobs" has been found since 15:20:00 UTC on 2016/11/16. These aborted pilot jobs disappeared a few hours later.

DIRAC.Nagoya.jp

DIRAC.Nara-WU.jp

  • Decommissioned site: since it still uses SL5, the DIRAC pilot cannot be executed there.

DIRAC.NDU.jp

DIRAC.Niigata.jp

DIRAC.Osaka-CU.jp

DIRAC.PNNL.us

DIRAC.PNNL2.us

DIRAC.PNNL-CASCADE.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.PNNL-PIC.us

  • Seeing no jobs (no plot) is not a problem

DIRAC.RCNP.jp

DIRAC.SSU.kr

DIRAC.TIFR.in

DIRAC.TMU.jp

DIRAC.Tokyo.jp

DIRAC.UAS.mx

DIRAC.UVic.ca

DIRAC.Yamagata.jp

DIRAC.Yonsei.kr

LCG.CESNET.cz

LCG.CNAF.it

  • ce01-lcg.cr.cnaf.infn.it is being decommissioned (BIIDCO-157)

LCG.Cosenza.it

LCG.CYFRONET.pl

  • BLAH ERROR has been reported to the site admin via a GGUS ticket (2017-02-28 13:00:00)

LCG.DESY.de

LCG.Frascati.it

LCG.HEPHY.at

LCG.KEK.jp


LCG.KEK2.jp

  • Job submission check: Pilot submission failure has been found at 07:27:00 UTC on 2017/03/08
  • The same cause as for LCG.KEK.jp (BIIDCO-125)

LCG.KISTI.kr

LCG.KIT.de

  • BIIDCO-131
  • The maximum number of jobs has been set to zero (for job drain) 2017/03/07.

LCG.KMI.jp

  • Health checker info: "Failed pilot jobs" has been found at 21:20:00 UTC on 2016/11/23.
  • BIIDCO-74

LCG.Legnaro.it

LCG.McGill.ca

LCG.Melbourne.au

LCG.Napoli.it

LCG.NTU.tw

LCG.Pisa.it

  • GGUS ticket: "Jobs submitted to gridce0.pi.infn.it are finished immediately" (122842) was submitted at 08:12:14 UTC on 2016/07/13.
  • Pilots fail after running for a long time (BIIDCO-75)

LCG.Roma3.it

  • Roma3 commissioning (BIIDCO-111)

LCG.Torino.it

LCG.ULAKBIM.tr


OSG.UMiss.us

  • Health checker info: "Short pilot jobs" has been found at 23:20:00 UTC on 2017/03/04.

SSH.KMI.jp

Links

