Satish, Paul, Joseph, Adam, David, Paola, Qiulan, Gavin
Computing Issues (all)
Vito pointed out over email that the FTS instance on novasamgpvm04 was scanning the wrong directories. This was eventually traced to the FTS start script that runs at reboot: it was being called with the wrong arguments and was therefore picking up the configuration for novasamgpvm03.
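One way to guard against this kind of mix-up is to have the start script check that the configuration it was handed matches the host it is running on. The following is a minimal sketch only, not the actual FTS start script: the function names, the `/hypothetical/fts/configs` directory, and the one-config-file-per-host layout are all assumptions made for illustration.

```python
import socket

# Hypothetical layout: one config file per host, named after the short hostname.
CONFIG_ROOT = "/hypothetical/fts/configs"


def config_for_host(hostname, config_root=CONFIG_ROOT):
    """Return the config path expected for a host (short name, domain stripped)."""
    short_name = hostname.split(".")[0]
    return f"{config_root}/{short_name}.conf"


def check_start_args(requested_config, hostname=None):
    """Refuse to start if the requested config belongs to a different host."""
    if hostname is None:
        hostname = socket.gethostname()
    expected = config_for_host(hostname)
    if requested_config != expected:
        raise ValueError(
            f"config {requested_config!r} does not match host {hostname!r} "
            f"(expected {expected!r})"
        )
    return expected
```

With a check like this, starting the novasamgpvm04 instance with the novasamgpvm03 configuration would fail immediately at reboot rather than silently scanning the wrong directories.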
Software Tags (David)
The newest tag, S15-07-30, has the latest versions of art and nutools. David is taking his quals in about ten days, so his availability will be low as he prepares.
David has put together the scripts to do hot-fixes according to Satish’s proposal of 21 July.
- ND CRY: This was done as of last Friday.
- ND Add'l First Analysis MC — 5k/20k are done. Paul is waiting for FTS to catch up before submitting the next batch of jobs. He will bump up the number of jobs to increase throughput. Jobs are running offsite, so there shouldn’t be any limitation on the number of slots available. In principle the FTS load could be a bottleneck, but under the current circumstances, that is unlikely to be an issue.
The ND "top-up" data is mostly done. Joseph has created definitions and advertised them to the group. A few jobs are stuck. He will wait for them to finish, and then submit draining jobs.
The datasets are messy because they end on subrun boundaries rather than run boundaries.
There is also confusion because of version skew in the raw2root processing. All files up through March 20 were processed by raw2root using S15-03-11 (most were also processed with FA14-10-03). After March 20, only one file was processed with S15-03-11, and all but one file were processed with FA14-10-03 (the one exception is not the file that was processed with S15-03-11).
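The skew described above could be tallied with a short script. This is a rough sketch under stated assumptions: the record format (file name, date, set of release tags) is hypothetical, the year 2015 for the March 20 cutoff is assumed from the tag names, and in practice the records would have to come from a metadata query rather than a hand-built list.

```python
from collections import Counter
from datetime import date

# Assumed cutoff: March 20, with the year 2015 inferred from the tag names.
CUTOFF = date(2015, 3, 20)


def tally_releases(records, cutoff=CUTOFF):
    """Count files per (period, combination of raw2root releases).

    records: iterable of (file_name, processing_date, set_of_release_tags).
    This record layout is hypothetical, chosen only for illustration.
    """
    counts = Counter()
    for _name, day, releases in records:
        period = "through cutoff" if day <= cutoff else "after cutoff"
        counts[(period, tuple(sorted(releases)))] += 1
    return counts
```

Printing the resulting counter gives one line per period and release combination, which makes the outlier files (for example, the single post-cutoff file processed with S15-03-11) easy to spot.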
There has been followup with Paola, but these changes occurred before she started running raw2root. Followup with Jeny is required, along with inspection of old tickets. It was remarked that Satish cannot view the old tickets, as he did not submit them and was never made a watcher. The generally restrictive policy is often a hindrance. Paola will review old requests.
There was a suggestion that OPOS should implement logging of job submission. Paola also asked if completion of keep-up is monitored by NOvA. The answer is yes, by shifters.
Both near detector top-up data and RHC Monte Carlo were started by Chris shortly after the meeting.
raw2root keepup (Paola)
This is running smoothly. The only issue was the large DDSnews files, but constraints have been implemented to avoid processing them, per instructions from Jan.
Calibration keepup (Qiulan)
On Monday and Tuesday, jobs failed with no mask in the DB. That has been corrected, but a new error has popped up in RunHistory. Update: Satish followed up with Jon Paley, and the issue should now be resolved.
Reco keepup (Vito)
Keepup has resumed. Vito is also filling in missing files not processed in the time range April 11 - June 20 with S15-05-07a. After June 20, keepup is proceeding with S15-07-30. The issues with novasamgpvm04 have stalled this effort, since we cannot tell which files actually made it to SAM. The post-June 20 keepup is almost done.
ND CRY calibration (Qiulan)
These jobs were submitted on Monday. 100 jobs failed with ifdh cp. Two files failed with an art exception (an error writing output files), evidently because of a lack of disk space. Qiulan will file tickets to get these issues resolved.
Workshop Planning (Satish)
The workshop will have six or seven sessions:
- A review of the first-analysis campaign focusing on the major problems we faced: database problems,
- Discussion of our code-management strategy (hot fixes and back ports), migrating some of our scripts to separate UPS products
- A discussion on the requests made of production by the collaboration: how much of what was requested was needed. Can we control the number of alternate samples?
- Workflow and tools: can we automate or simplify our workflow? Can we unify or simplify any of our tools? Development of a production validation strategy.
- Grid submission: Finalizing job priorities. Can we make improvements to our submission tools? What about running on the Amazon cloud?
- Improving our throughput on FTS and SAM.
Gavin also raised the point that we should devote some time to making production efforts easier for standard users, and to integrating sam4users into our general-purpose scripts. The original intent was to include this in the workflow and tools section, but it may be worth breaking this off into a separate section.