Paola, Andres-Felipe, Joseph, Enrique, Biao, Paul, Gavin, Alex, Tapasi
- Paul Sail will be taking over from David Smith as our release manager. Thanks to David for all of his hard work, and thanks to Paul for taking on this important role.
- Alex is starting up Novasoft meetings, the first of which will be next Wednesday Sept 9 at 4pm.
- Gavin has created a nova production redmine site, and Alex and Satish have begun to populate it. It will probably take some effort getting used to, but hopefully it will be a useful tool. As a reminder, activity related to a given task (e.g. sample generation and processing) should all be logged on the appropriate redmine issue.
- Next Monday is a holiday in the US, so we will not have a production meeting on Sep 7. We will instead meet on Sep 10 at 11:30am.
Computing Issues (all)
Satish was attempting to run some test jobs last week and over the weekend. At first he had some trouble getting jobs to run (because of the CVMFS checks that SCD had implemented), but now jobs just seem to disappear. Although it is difficult to see how, this may be related to a new test version of submit_nova_art.py. Keepup jobs have not seen this issue, although Paola has some test jobs with the new script that she will report on.
Last week the CVMFS servers needed to be updated because of missing external packages. Gavin has taken care of this issue.
Paul cannot attend, as there is a PhD defense that conflicts with this meeting. He sent a report by email:
Additional ND genie sample: about 8k files have finished, with 12k more to go.
Rock neutrino sample: Paul submitted the jobs on Friday, and they all died with the same error. This was because he generated the fcls with the geant only option in make_sim_fcl. He generated new fcls yesterday with the correct settings and will submit them once they appear on FTS. Gavin suggests that this might be a different issue. Satish will follow up through email.
Gavin pointed out that we need to set up a new small file aggregation family in enstore because of the large number of very small FCL files we generate. SCD has just started to complain about this, but it’s not clear why this wasn’t an issue before. Alex has created a redmine issue for this. We should address it sooner rather than later, given the large number of FCL files we will soon be producing.
- Enrique will start reco’ing the additional ND MC. As part of this, he will now take over creation of MC datasets. He is reading up on that now.
- Joseph: There was one straggling file for ND top-up data. It had a huge runtime because it is very big. Many subruns in this run are long. The DAQ and watchdog groups should probably be informed. Joseph was surprised that this had not been noticed before. Update: Dominick has commented that these files are very big most likely because the TDUs were in a bad state at that point in time.
Raw2root Keepup (Paola)
Processing has been going smoothly except for a couple of files that are corrupt. One has dq.isgoodrun = false. The other did not have this flag set at all. Paola raised the issue of how we should mark these files so that we do not process them. Could we set the dq.isgoodrun flag or the content status flag? No consensus was reached during the meeting, so we will discuss this topic further on the production list.
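One way to picture the problem: a selection that only skips files explicitly marked bad would still pick up a file that has no flag at all. The following is a minimal sketch, assuming file metadata is available as a plain dictionary keyed by the dq.isgoodrun field mentioned above and that the flag is stored as a boolean. The helper name and the file names are hypothetical, and this is not the keepup script's actual logic.

```python
def is_processable(metadata, treat_missing_as_good=False):
    """Decide whether a file should be processed, based on dq.isgoodrun.

    `metadata` is a hypothetical dict of metadata fields for one file.
    A file with no dq.isgoodrun entry is skipped unless
    `treat_missing_as_good` is set.
    """
    flag = metadata.get("dq.isgoodrun")
    if flag is None:
        return treat_missing_as_good
    return flag is True


# Hypothetical catalog entries illustrating the three cases.
files = {
    "good.raw": {"dq.isgoodrun": True},
    "bad.raw": {"dq.isgoodrun": False},
    "unflagged.raw": {},
}

strict = [name for name, md in files.items() if is_processable(md)]
lenient = [
    name
    for name, md in files.items()
    if is_processable(md, treat_missing_as_good=True)
]
```

With the strict default only the explicitly good file is selected, while the lenient setting also selects the unflagged one, which is the case that would let a corrupt file through.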
Paola will be in Colombia starting next week, and expects to be away for two weeks. She will not be checking email while away. Andres-Felipe will run raw2root keepup during this time, and Qiulan will run calibration keepup.
Calibration Keepup (Paola)
Calibration jobs have been halted since Aug 17 because art_sam_wrap was checking old, no-longer-valid paths for CVMFS issues. A new version of NovaGridUtils was released to fix the issue, but it did not help. Satish has provided a new test script, but the jobs are idle so far.
Offsite Status Report (Enrique)
Enrique has received a report from Amit on the SMU status. They are waiting for FNAL to set up FTS (Rob I) and xrootd at SMU before the site is ready to run nova jobs. That should be all that is required to get the jobs running. SMU will use a dedicated storage element (SE) to speed up data throughput on both the input and output sides. SMU has 625 single-core job slots for Nova. The slots have no disk restrictions, but memory should be limited to 2.5 GB/core. Getting more memory will require requesting more CPUs per job.
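As a rough illustration of the memory constraint above, the number of cores a job would need to request can be estimated by dividing its memory need by the per-core limit and rounding up. This is a minimal sketch: the 2.5 per-core figure comes from the report, while the function name and the example job sizes are hypothetical.

```python
import math

MEM_PER_CORE_GB = 2.5  # per-core memory limit quoted for the SMU slots


def cores_needed(job_mem_gb, mem_per_core_gb=MEM_PER_CORE_GB):
    """Return the smallest whole number of cores whose combined memory covers the job."""
    if job_mem_gb <= 0:
        raise ValueError("job memory must be positive")
    return max(1, math.ceil(job_mem_gb / mem_per_core_gb))


if __name__ == "__main__":
    for mem in (2.0, 4.0, 6.0):  # hypothetical job sizes in GB
        print(f"{mem:.1f} GB -> {cores_needed(mem)} core(s)")
```

Under these assumptions, a job needing 4 GB would have to request two cores, leaving one of them idle for the sake of its memory.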
Enrique will also start characterizing resources available to individual jobs at all offsite locations we can run at.
Retirement Script Modifications (Gavin)
Gavin will try to get these modifications done tonight. These include confirming that the user running the script is novapro, and providing an option to retire the descendants of a file as well as the file itself. Gavin intends to make the latter feature optional, so you can still retire individual files on their own.
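The two modifications could look roughly like the sketch below. It is not Gavin's script: the function names are hypothetical, and the `children` mapping is a stand-in for whatever lineage lookup the real script performs against the file catalog.

```python
import getpass

REQUIRED_USER = "novapro"  # account named in the notes


def check_user(current_user=None, required=REQUIRED_USER):
    """Raise PermissionError unless the script is run by the required account."""
    user = current_user if current_user is not None else getpass.getuser()
    if user != required:
        raise PermissionError(f"must be run as {required}, not {user}")


def files_to_retire(name, children, with_descendants=False):
    """Return the file itself, plus all of its descendants if requested.

    `children` is a hypothetical mapping from a file name to the files
    derived directly from it; a real script would query the catalog instead.
    """
    if not with_descendants:
        return [name]
    ordered, seen, stack = [], set(), [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against a file reachable by more than one path
        seen.add(current)
        ordered.append(current)
        stack.extend(children.get(current, []))
    return ordered
```

Keeping descendant retirement behind a flag that defaults to off means the common case, retiring a single file, cannot accidentally cascade.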