- MC8 started on February 17, 2017
- Phase III generic samples
- Phase III signal samples
- Phase II signal samples
- MC7 ended February 8, 2017
- MC7 started on November 1, 2016
- Phase III signal MC samples with 12th background MC campaign samples
- Phase III Y(6S) production with 12th background MC campaign samples
- Phase II Y(6S) production with new, phase II background samples
- Phase II Y(4S) production with new, phase II background samples
- Phase III Y(3S) production with 12th background MC campaign samples
- Additional generic samples at phase III with and without backgrounds
- MC7 phase 2 started January 9, 2017
- Phase III Y(4S) generic samples
- Phase III Y(4S) ccbar sample with new parameters
- MC8 test production started February 1, 2017
- In parallel with MC8 production, ~10,000 analysis test jobs were submitted on Feb. 2. These jobs are expected to finish within the next couple of hours (as of 21:00 JST).
- MC8 restarted May 8
- Phase III signal MC samples (mainly for validation use)
- Phase III Y(4S) ccbar generic samples with new Pythia parameters
Official production started at 01:30 JST on Feb 17 with a 1 ab-1 phase 3 Y(4S) generic MC sample
Submitted 72 signal samples for a total of about 162 million events (Feb 25, ~02:00 JST)
Submitted additional signal samples equivalent to 820 million events (Mar 1, ~00:00 JST)
Submitted first part of 0.5 ab-1 Y(5S) generic sample: bsbs only = 29.24 million events (Mar 1, ~02:00 JST)
Submitted the rest of 0.5 ab-1 Y(5S) generic sample (Mar 1, ~22:30 JST)
Submitted the rest of the phase 3 Y(4S) signal samples (Mar 8)
Submitted the phase 2 signal samples (Mar 8)
Submitted a 200 fb-1 Y(4S) generic sample with a total of just over 1 billion events (Mar 11, ~01:30 JST)
Submitted phase 3 bottomonium samples (Mar 14); resubmitted some problematic productions, while Miyake-san is fixing the others (Mar 15)
Submitted phase 2 bottomonium samples (Mar 15)
Submitted 300 fb-1 phase 3 Y(3S) generic sample and several small signal samples (Mar 18, ~02:00 JST)
Submitted a large generator-level skim sample (96 x 10^9 generated events). These should finish very quickly, but will make a big spike in the production progress plot (Mar 26, ~02:00 JST)
One last signal sample (7400 jobs), requested before the last DP update, was still in the pipeline; it was submitted on Mar 28 at ~22:00 JST.
Submitted 16 signal samples (relatively small) and 1 ab-1 ccbar generic sample (May 8, ~23:00 JST)
Official production started as scheduled at 00:00 JST on Nov. 1.
A full list of samples is given on the data production page
A total of 100,000 jobs of 10k events each have been submitted as of Nov. 4
The following productions have been stopped and the corresponding jobs killed due to improper ROOT output (Nov. 10):
- 517 - nonbsbs phase 2 Y(6S)
- 518 - bsbs phase 2 Y(6S)
A total of 139,972 jobs of 10k events have been submitted as of Nov. 10
Phase 2 background samples have been distributed.
- Distribution to the Grid sites: BIIDCO-31
- Distribution to the non-Grid sites: BIIDCO-32
- When the distribution is complete, we will finish production of the phase 2 samples. This will complete the official requests for MC7. Thereafter, we will submit additional generic samples and possibly some additional requests from the physics groups.
Generic phase 2 samples at Y(6S) and Y(4S) as well as some signal samples with backgrounds have been submitted (~23:50 JST on Nov. 11 and ~04:50 JST on Nov. 12). In total 17,776 Y(6S) + 29,608 Y(4S) = 47,384 jobs were submitted.
All requested MC samples for MC7 have been submitted (Nov 14)
Also submitted Phase III Y(3S) requests, which total 23,850 jobs of 10k events (Nov 14 at 04:30 JST). The running total is now 215,332 jobs.
Completed database access tests with 2k and 5k concurrent jobs (Nov 16 at ~10:00 JST)
Submitted Phase III Y(4S) generic samples (1 ab-1), which total 575,300 jobs of 10k events (Nov 16 at ~23:40 JST)
Submitted additional jobs, including another 1 ab-1 of phase 3 generic Y(4S) samples, totalling 636,000 jobs, roughly half of them with 10k events per job and half with 5k events per job. This is roughly equal to the number of jobs previously submitted and brings the total to over 1.2 million jobs. (December 8 at ~00:00 JST)
Submitted additional generic samples (1 ab-1 of mixed and charged events), totalling 160,400 additional jobs. (December 12 at ~04:00 JST)
Submitted additional signal samples (12.7k jobs) and a new ccbar sample with modified parameters (159.4k jobs). (December 21 at ~02:00 JST)
The system will continue processing the jobs already submitted until they are exhausted. No additional samples will be submitted until January 9, after the KEK downtime.
- All jobs had been drained by 2017-01-01.
New submission started January 9 at 0:00 JST with phase 3 - Y(4S) generic samples:
- mixed (BGx1: 42,770 jobs, BGx0: 21,380) (~0:00 JST)
- charged (BGx1: 45,230 jobs, BGx0: 22,620) (~6:00 JST)
- uubar, ddbar, ssbar, ccbar, taupair: 538,060 jobs (~01:30 JST)
Submitted new ccbar samples with new parameters: 159,480 jobs (~22:15 JST on Jan 27)
Test production for MC8 started Feb. 1, consisting of 10 jobs of 10k events each, with BG, for each generic sample (70 jobs in total).
Submitted additional MC7 signal samples: 11,200 jobs (~02:00 JST on Feb 5)
CPU load of b2dcsv ... is too high 2017/06/12 9:19 UT
- Memory consumption increasing and fluctuating on one server (b2dchsv05.cc.kek.jp) 2017/03/15 20:30 JST
→ Experts are still investigating 2017/03/17 00:14 JST
→ The server is being reset every hour to avoid reaching the memory limit 2017/03/17 02:27 JST
- The memory issue above seems to have disappeared, although the root cause has not been identified. 2017-03-24 09:28 UTC
File Transfers and Replication Status
See also DDM for related issues
- The DNS name kek2-fts points to kek2-fts02. The DIRAC configuration points to kek2-fts.
- The upgraded FTS (kek2-fts02) is now in use.
- Submission of file transfers keeps failing. The failure is reproducible with manual submission.
- ggus:128022 KEK FTS: error verifying signature on certificate
- The FTS server is to be upgraded.
- The stand-by FTS (kek2-fts02) has been upgraded and is being tested.
SE Common Issues
Destination SE: CESNET-TMP-SE (dpm1.egee.cesnet.cz)
Destination SE: CNAF-TMP-SE (storm-fe-archive.cr.cnaf.infn.it)
Destination SE: DESY-TMP-SE (dcache-se-desy.desy.de)
Destination SE: KEK2-TMP-SE (kek2-se01.cc.kek.jp)
- Still banned for removal due to the issue in the back-end HSM
- BIIDCO-41
Destination SE: KISTI-TMP-SE (belle-se-head.sdfarm.kr)
Destination SE: KIT-TMP-SE (gridka-dcache.fzk.de)
Destination SE: KMI-TMP-SE (nsrmfe01.hepl.phys.nagoya-u.ac.jp)
Destination SE: Napoli-TMP-SE (belle-dpm-01.na.infn.it)
Destination SE: PNNL-TMP-SE (se.hep.pnnl.gov)
Destination SE: SIGNET-TMP-SE (dcache.ijs.si)
McGill SE (storm02.clumeq.mcgill.ca)
Sites Common Issues
- This is a test site. Do not need to report any issue.
- Health checker info: "Aborted pilot jobs" has been found since 22:20:00 UTC on 2017/03/11.
→ Experts are working on this via JIRA ticket BIIDCO-127 (2017/03/10). Jobs appear to be running, but a large number of pilots are aborted.
- Job submission check: Pilot submission failure has been found at 06:31:00 UTC on 2017/05/10.
- Job submission check: Pilot submission failure has been found since 05:26:00 UTC on 2017/03/10.
- BIIDCO-126
- Seeing no jobs (no plot) is not a problem
- BIIDCO-43
- All upload attempts are failing against all configured SEs: OutputSE (KMI-TMP-SE, PNNL-TMP-SE), fail-over SEs (DESY-TMP-SE, Napoli-TMP-SE, PNNL-TMP-SE, KIT-TMP-SE)
- Large percentage of failed jobs in the DIRAC status plot (added 2016-11-03 22:45:00 UTC)
- Test site. Not in use in MC production
- Health checker info: "Aborted pilot jobs" has been found since 08:20:00 UTC on 2017/03/21 → reported to firstname.lastname@example.org
→ All jobs have failed and pilots have been aborted since 2017/03/21 10:00 UTC (BIIDCO-141)
- Health checker info: "Aborted pilot jobs" has been found since 15:20:00 UTC on 2016/11/16. These aborted pilot jobs disappeared a few hours later.
- Decommissioned site: since it still uses SL5, the DIRAC pilot cannot be executed there.
- Seeing no jobs (no plot) is not a problem
- Seeing no jobs (no plot) is not a problem
- BLAH ERROR has been reported to the site admin via a GGUS ticket (2017-02-28 13:00:00)
- Job submission check: Pilot submission failure has been found at 07:27:00 UTC on 2017/03/08
- Same cause as for LCG.KEK.jp (BIIDCO-125)
- BIIDCO-131
- The maximum number of jobs has been set to zero (to drain jobs) 2017/03/07.
- Health checker info: "Failed pilot jobs" has been found at 21:20:00 UTC on 2016/11/23.
- BIIDCO-74
- GGUS ticket "Jobs submitted to gridce0.pi.infn.it are finished immediately" (122842) was submitted at 08:12:14 UTC on 2016/07/13.
- LCG.Pisa.it: pilots fail after running for a long time (BIIDCO-75)
- Health checker info: "Short pilot jobs" has been found at 23:20:00 UTC on 2017/03/04.