Chris, Joseph, Satish, Justin, Alex, Tapasi, Paul R, Paul S, Paola
Satish will be on a short vacation this Friday through Monday. He will be in sporadic email contact during this time.
Alex commented on several computing infrastructure issues.
There was a GUMS server upgrade, which allowed analysis jobs to run as the submitting user, rather than as novena. In principle, this did not impact production, but it did cause some downtime.
There have been intermittent issues with ifdh.
There was an enstore downtime today because of problems in the room housing the tape robots.
On Tuesday there will be a dCache downtime. The disk that houses the pnfs database is running out of space and needs to be expanded. In principle, ifdh should be able to handle this transparently, but the older versions of ifdh that are set up in the tags used for production may not be recent enough to do so.
ND Top-up Data
This is now done.
Additional FA-style ND Genie MC
Paul R still has ~100 files left to generate, but has recently been having difficulty getting jobs to run. Once they begin, they are running into CVMFS timeout issues. Paola has offered to help Paul debug the issue.
Enrique is not present to comment on the status of the reco process, but from the production page, it appears he is nearly caught up with inputs.
Chris has not been running LEM for the past two weeks, but will resume when the inputs are complete. Satish will follow up when the last files are through.
Tapasi’s caf/mix file jobs are mostly finished, but the files are not visible in the SAM dataset definitions. She will explore why. Update: This was because of a miscommunication about mix/caf releases. Tapasi processed them in S15-05-22a, but the SAM datasets were expecting them to be in S15-05-22. After checking the release differences, Satish decided that it was OK not to go back and reprocess in S15-05-22.
ND Top-up MC Sample
The FCL files have been generated, but Paul is running into the same issue with the generation jobs that he is encountering elsewhere. He did manage to get a few hundred files before this problem arose, though. Paul will circulate dataset definitions so that Enrique can generate new definitions.
FA Respin for Sterile Groups
The initial plan was to generate new FCL files that dropped the problematic lempresel objects at the input of the reprocessing jobs. There has been some confusion about whether this was technically possible, so Joe had been investigating a separate set of jobs that would just do the dropping. It was reiterated that this should be possible, so Joe will run interactive tests to put the question to bed. We will decide how to proceed from there.
Paul has submitted some test jobs without the lempresel dropping, but is not convinced they are working yet. The problems might be related to the other job submission woes he has been having. Before running at Amazon, he also needs accounts set up. That is in progress.
New ND Data/MC with new Simulation and Calibration
The tag for these requests, S15-09-28, is ready. There was some uncertainty about the scale of the requests, so the go-ahead has not yet been given. However, in the case of ND data, this can be done right away, and Satish will give the go-ahead when he renews the calibration processing ticket tomorrow. Update: Alex and Satish have resolved the open questions about the scale of the requests, so we are good to go.
There have been no new raw files since Sep 29. This is because of a DAQ problem (authentication issues with data-disk-04; the problem is being addressed). There are 18 files listed as not being SAM available, because of FTS errors with sam_metadata_dumper. The error is not known to anyone in the production group, so Paola will follow up with data handling. Update: This is now understood to be a transient issue, and it was resolved after retrying.
We have a number of corrupt files. Satish will work on getting together the code to mark such files as bad.
There have been no issues since Friday. Lately there are no new raw files, so no inputs (see above).
All requested files have been processed. The number of ND files processed was:
- ND NuMI: 5654
- ND cosmics: 7907
- ND others: 28539
Three files (all DDT files) were not processed because they are corrupt.
The number of FD files processed was:
- FD NuMI: no files to process
- FD cosmics: 2
- FD others: 653641
Two FD files failed. One was corrupt. The other failed with a bad subrun number, indicating that it is also probably corrupt.
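For reference, the per-stream counts reported above can be totalled with a short script. This is only a sketch: the dictionary names are illustrative, "no file to process" is taken as zero, and the minutes do not say whether the failed or corrupt files are included in these counts.

```python
# Per-stream counts copied from the lists in these minutes.
nd_counts = {"NuMI": 5654, "cosmics": 7907, "others": 28539}
# Assumption: "no file to process" for FD NuMI is treated as 0.
fd_counts = {"NuMI": 0, "cosmics": 2, "others": 653641}


def total(counts):
    """Return the sum of the per-stream file counts."""
    return sum(counts.values())


nd_total = total(nd_counts)
fd_total = total(fd_counts)

print(f"ND files processed: {nd_total}")  # 42100
print(f"FD files processed: {fd_total}")  # 653643
```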
SW/Tag Reports (Paul S)
No problems have been observed with the nightly builds. Last Monday Paul cut a new tag (S15-09-28). He is expecting hot fixes based on this tag to come in soon (for reco processing). A new tag is scheduled for Friday. Paul should make sure that fife_utils uses the version declared “current” in development. Satish suggested that we do an audit of the external packages to determine whether any other packages should use the “current” version rather than a fixed version.
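One way the suggested audit could start is by listing which external packages are pinned to a fixed version. The sketch below is purely illustrative: the input format (one "name version" pair per line), the function name find_pinned, and the example package names and versions other than fife_utils are all hypothetical, and a missing version is assumed to mean “current”. The real setup scripts would need their own parsing.

```python
def find_pinned(package_specs):
    """Return (name, version) pairs whose version is not 'current'.

    Each spec is assumed to be a line of the form 'name version';
    text after '#' is ignored, and a missing version is treated as
    'current'. This format is hypothetical.
    """
    pinned = []
    for line in package_specs:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        name = parts[0]
        version = parts[1] if len(parts) > 1 else "current"
        if version != "current":
            pinned.append((name, version))
    return pinned


# Hypothetical example input; these versions are made up for illustration.
specs = [
    "fife_utils v1_0   # hypothetical fixed version",
    "some_package current",
    "another_package v2_3",
]

for name, version in find_pinned(specs):
    print(f"{name}: pinned to {version}")
```

Each package reported this way would then be reviewed by hand to decide whether it should follow “current” instead.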
Short reports on on-going projects:
Joe hasn’t had a chance to look at this, as he has been distracted by the lempresel issues.
Code audit on forced crashes
No updates, but Sijith is working with Jon Paley on this.
Offsite Running (Enrique)