Attending: Satish, Susan, Joe, Gavin, Enrique, Alex, Paul S, Justin, Kanika, Sijith, Prabjot, Tapasi, Paul R, Qiulan, Paola, and Felipe
Alex commented that this Thursday morning there will be a significant downtime involving redmine and the repositories, dCache, SAM and the ECL.
Revised Processing Plan (Alex and Satish)
Details are in Docdb 14386
Paul commented that the rock neutrino sample is not quite ready, but should be done soon, and with FA14-10-03x.a, not FA14-10-03x.c as mentioned in the slides. Kanika raised the question of whether this sample has the correct Birks suppression. Satish will follow up offline to make sure the issue is understood.
SW Tags (Paul S)
Last week Paul cut tag FA14-10-03x.e. This will be used for the horn off MC generation. The nightly builds are still running smoothly.
We will want a new R-release this week for mini-production. A few issues must be resolved before we can cut the next tag. First, some files are still crashing with RemoveBeamSpills; this must be fixed. Second, we need to confirm that all data-quality updates (Bad Channels, Diblock masks) are in place. Support for 2p2h is still delayed (see Docdb 14388) and will likely not make it into the first patch release.
The upgrade to art 1.17.03 is delayed because of some missing dependencies needed to support the ANG packages. It’s not clear why this should be a problem, since we don’t build ANG by default.
Offsite Status Report (Enrique)
Details are in Docdb 14304
Most sites are not working right now. The MWT2 and Wisconsin sites are failing because they have been upgraded to a version 3 series kernel. UPS uses the kernel version to identify the distribution version, and version 3 is normally associated with SLF7, which we do not build against. Hence some ups setup commands are failing. There is a UPS_OVERRIDE environment variable to work around these sorts of issues, but there is a case mistake in how it is set in art_sam_wrap.sh. Enrique will send Satish details on what needs to be fixed.
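As a rough illustration of the failure mode described above (not the actual UPS logic), the following Python sketch shows how guessing the distribution from the kernel major version misidentifies an upgraded node, and how an override variable is silently ignored if its name is spelled in the wrong case. The mapping table, the function names, and the use of a bare distribution name as the override value are all hypothetical simplifications. The minutes also do not say whether the case mistake is in the variable name or in its value; the sketch assumes the name.

```python
import os

# Hypothetical mapping from kernel major version to the assumed distribution;
# the real UPS flavor detection is more involved.
KERNEL_TO_DISTRO = {"2": "SLF6", "3": "SLF7"}
SUPPORTED_DISTROS = {"SLF6"}  # assumption: builds exist only for SLF6


def guess_distro(kernel_release, environ=None):
    """Guess the distribution from a kernel release string such as '3.10.0'.

    An exactly named override variable wins. A wrongly cased name is ignored,
    because environment variable names are case-sensitive on Linux.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("UPS_OVERRIDE")  # simplified: value is the distro name
    if override:
        return override
    major = kernel_release.split(".")[0]
    return KERNEL_TO_DISTRO.get(major, "unknown")


def setup_would_succeed(kernel_release, environ=None):
    """Return True if the guessed distribution is one we build against."""
    return guess_distro(kernel_release, environ) in SUPPORTED_DISTROS
```

With this sketch, a version 3 kernel fails unless the override is present under exactly the expected name; setting `ups_override` instead has no effect.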
The other jobs seem to be failing to get SAM files. Paola notes that this is a global problem that seems to be associated with the SAM backend. She will file a SNOW incident to get this resolved.
Last week's tests at OSC are starting to make some progress. Enrique did manage to get two jobs through, but most tests didn’t even start. He is working with the admins there to get this sorted out.
Enrique is investigating the possibility of giving his test jobs higher priority. SCD initially pointed him at priority groups, but it’s not clear when that feature will become usable. There should be ways for Enrique to set priorities within his own jobs, and this is what he should pursue.
File Size Update (Justin)
Justin has been looking for instances of code that may be broken if we drop the DAQ data product. The list includes some unlikely candidates (such as CAFMaker), so this will need to be revisited. There was also some confusion about which exact objects should be dropped from the event: the daq data product or the flatdaq product.
The point was made that when he is ready, Justin should attempt to run some test jobs, for example of the event display. As we gain confidence that this will work, we should move to dropping the objects for reco keepup.
New ND Pos Std mix/caf (Enrique)
The jobs are launched but stuck. This is likely because of the issue that Paola identified with the SAM backend.
FD data respin (Paul R)
On Thursday Paul got the go-ahead to begin the respin of files. On Friday morning he looked at the files uploaded to S3, but they were the wrong data tier. The correct files have been identified, but he has not yet checked whether they have been uploaded. He is ready to go once the files are there. He will run tests, first on 1 file, then 10, then 100, and then hand the jobs off to Paola.
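The 1, 10, 100 ramp-up can be sketched generically as below. This is only an illustration of the staged-test idea, not the actual respin submission tooling. The function names and the `process` callable are hypothetical, and taking each stage from the start of the file list is an assumption.

```python
def staged_batches(files, stage_sizes=(1, 10, 100)):
    """Yield successive test batches of increasing size from the start of `files`."""
    for size in stage_sizes:
        yield files[:size]


def run_staged_test(files, process, stage_sizes=(1, 10, 100)):
    """Run `process` on each batch and stop at the first stage with a failure.

    `process` is a hypothetical callable that returns True when one file
    is handled successfully. Returns the size of the largest stage that
    passed completely (0 if even the first stage failed).
    """
    passed = 0
    for batch in staged_batches(files, stage_sizes):
        if not all(process(f) for f in batch):
            break
        passed = len(batch)
    return passed
```

Stopping at the first failing stage keeps a bad configuration from consuming 100 job slots before anyone notices.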
Raw2root keepup (Qiulan)
Going smoothly. Two nodes had autofs issues and were acting like black holes. The problem has been resolved.
Calib Keepup (Felipe)
Last Friday this was paused for normal and high gain. On Friday morning the ND processing resumed. Some of the normal gain files failed with the RemoveBeamSpills error. This is with the new tag, so it’s not clear how this is happening. Satish will follow up with the experts.
Reco Keepup (Paola)
Processing of the BNB files is turned back on.
Processing of the ND NuMI files should be paused until the DQ issues are resolved. Similarly, reco of the high gain FD files is also on hold for the moment.
Paola is working on a script to quickly find out details for error files in FTS. It is nearly ready, and she will send out more details this afternoon.