Attending: Satish, Joe, Chris, Amit, Susan, Paul, Felipe, Paola, Qiulan, Siva
SW Tags (Paul S)
Paul could not attend but sent an update by email.
The most recent tag, S15-12-07, was produced last week and includes the new version of art. He has not received any reports of problems so far, so it is probably all working at the moment. Nightly builds, for slf6 anyway, have been good this last week. If Paul finds time this week, he will start looking at implementing the latest version of nutools. Satish notes that S15-12-07 is used for reco keepup, so we have been exercising this tag.
Chris asked about the status of a test branch for ART 1.18, which uses root v6. It was agreed that we should make this happen, and Satish will contact Paul about this.
Offsite Status Report (Enrique)
Enrique cannot attend but posted these slides:
Mike Diesburg has set up an FTS instance at SMU. Tests have proceeded well so far, but larger-scale tests are now needed. One oddity is that some jobs appear to reach 100% CPU usage in openssl and take a long time to finish. The issue is not yet understood. Last week they were able to achieve 300 jobs running simultaneously, but the ultimate goal is to reach 625 jobs on a sustained basis.
Processing Status Reports
Raw2root keepup (Felipe)
This is running smoothly. There were a couple of crashed jobs, which Felipe will follow up on.
Horn-off data (Enrique/Chris)
Enrique was not able to attend today, but sent an email update. He will attempt to run the crashing jobs at SMU to see if he can process them there. Satish commented that it’s important to understand the source of the problems and proceed based on what exactly is going wrong. Chris commented that this is likely the code choking on some pathological events, and that using the event display on those events is likely to be very helpful. Satish has forwarded this suggestion to Enrique.
ND New Position MC (Enrique/Chris)
There are 19990 (out of a requested 20k) files in sam, leaving ten more to process. Two cannot be processed, so there are eight to track down. Satish has requested an update from Enrique on this subject.
ND Mini-prod calibration (Paola)
Three files were affected by copy-out errors, and hence have not shown up in the output datasets. Paola has identified the problematic files and resubmitted the jobs. They should be complete very soon.
ND Mini-prod CRY+Calib (Bruno)
Bruno was unable to attend the meeting, but sent an email update: he has managed to get a few more files through, and will investigate the remaining crashes tomorrow.
FD Mini-prod CRY+Calib (Paul)
Paul has successfully processed 490 of the 500 g4 files in gain 150 mode. He has also processed ten files in gain mode, but they were submitted with an incorrect nova.special parameter, so they appear as gain 150 files and need to be retired. Paul was not certain which FCL file is the correct one to use; Satish will inform him.
FD Genie MC (Tapasi)
Tapasi's jobs are still crashing. From the log files, it appears that the nova exe was never run by the job. There was some speculation that this might be because of incorrect disk requirements specified at submission time. This could have arisen from an old version of the script, but Tapasi reports having submitted with the most recent version. Paul has seen this issue before, and will help Tapasi debug it.
Amazon Running/NC Respins (Paul R)
Paul has gotten access to the Amazon VMs where jobs run, and has managed to reproduce the errors observed in grid running. The issue appears to be that the version of ifdhc being set up unsets the UPS python version. This is something we have seen before, and Paul is investigating how to fix it.
Stashcache Testing (Paul R)
Paul has sent test jobs, which failed because of the wrong version of ifdhc. This might be because of a hardcoded version of ifdhc in the stashcache script sourced by art_sam_wrap.sh. This should be improved.
- Siva asked about generation of flat files for the ND physics group, which a couple of analyses need. Satish suggested that the ND physics conveners contact Satish and Alex with a request, instructions, and an indication of the priority of these samples.
- Joe asked about starting up ND mini prod genie simulation. This hasn’t started yet, but we should get it going before the new year.
- Chris asked about making decaf merging a standard production task. He has scripts that do this, so it should be straightforward. A new data tier might be required. Satish observed that a prerequisite for this is reliable production of all contributing inputs, an area where we continue to struggle.
- Susan wanted an update on the spill cut bug. Chris has regenerated a fix, and has been waiting to finalize some details before announcing the updated files.
- Paola has identified a few hundred failures in reco keepup. They appear to be site specific. She will follow up with Enrique, Alex and Amit if that bears out, and will also resubmit to other sites.