Attending: Satish, Susan, Chris, Enrique, Alex, Art, Paul S, Paul R, Paola, Sijith, Tapasi
News (Alex and Satish)
gpsn01 Shutdown (All)
SCD is preparing to shut down gpsn01, which should have been idle since we stopped using the old jobsub. However, several users are still running jobs there to renew nova production proxies, and several production jobs are running gridftp copies. Paola recognizes some of the jobs: one renews the novapro proxy and copies it to other nodes. Paola will move the cron jobs to novagpvm10 and will consult with Art to make sure this is done. Several other jobs run kproxy for individual users; Satish has notified those users. Chris asked whether kproxy might still be needed, but Art said this should not be the case. There is also a script, reportedly belonging to Ryan, that copies a handful of keepup files over to BlueArc. Satish will notify him.
Network requirements for nova jobs
Petr Vokac at FZU has noted that a number of nova jobs at FZU required large inputs from SAM, which prevented those jobs from running efficiently. No one spoke up to claim the jobs. Petr recommended that we use stashcache to address some of these problems. We have now added stashcache support to our MC generation scripts, but it has not yet been tested. This should be done soon.
New Dataset Naming Schemes (Satish)
Chris does not like the proposal to use timestamps in the dataset definitions, nor the fact that timestamps still appear in our filenames. There was a discussion of possible alternatives, but no clear consensus on what they should look like. We agreed to take the subject offline and to include Ryan P, as the expert on make_sim_fcl, in those discussions.
There was also some discomfort with the use of the descriptor “full” in the datasets presented to users, since that term may not remain descriptive. Several people were in favor of dropping the TIMESPEC and TIMESTAMP fields from the merged datasets.
The possibility of creating named snapshots, which refer to fixed sets of files, was also raised as a way to ensure later reproducibility. This should be done especially for runs that go into an analysis to be published. The new position MC samples are excellent first candidates for this.
SW Tags (Paul S)
Paul has spent several days trying to upgrade the version of art we build against. He had everything committed by Wednesday, but SLF5 support for the novadaq and novaddt packages was missing, so we currently have a build for SLF6 but not for SLF5. We should be able to simply ask for the daq and ddt builds. Paul has asked Serdar for this, but Serdar appears to be away on vacation, so Paul will contact Martin Frank instead. There were also issues with the SLF6 build because package versions were hard-coded in some of the builds and had to be updated manually. We will compile a list of the packages that need updating and contact their authors.
We will cut a new snapshot tag as soon as we have confirmation that we can build on both SLF5 and SLF6, and that the build passes all production tests. There was some discussion of how this cycle could be sped up, since development builds and production tests each run only once a day.
There was also some discussion about ending support for SLF5. Alex will send around an email asking who is still using it. At the moment we know it is used at Caltech and on the Harvard cluster, although Harvard is in the process of upgrading to SLF6.
So far there has been no progress on building against art 1.18, which uses ROOT 6. We should ask for SLF5 builds of novadaq and novaddt against this version of art at the same time as we request builds against art 1.17.
Offsite Status Report (Enrique)
Skipped today, but Enrique has uploaded slides: http://nova-docdb.fnal.gov:8080/cgi-bin/ShowDocument?docid=14304
Raw2root Keepup (Qiulan)
This has been running smoothly, but two corrupt files have been observed.
Horn-off data (Enrique/Chris)
This has not started yet.
Horn-off MC (Enrique/Chris)
It seems that the tiers were mixed up. We are running some tests to confirm that switching the order back fixes the problem. This was also affecting the reco of the ND new position MC.
ND New Position MC (Enrique/Chris)
The situation is similar to the horn-off MC. Once the tests have been shown to work, Enrique will rerun the reco step using nova subversion 2, rather than having Satish retire the files. This will require an update to the mix/caf FCL files; Chris will let Enrique know what needs to be changed.
FD New Position MC (Joe)
Joe has been on vacation and so has made no progress, but this is now his top priority.
ND Mini-prod calibration (Qiulan)
Period 1 is done. In period 2, processing crashed on some files. Qiulan is investigating via the nova mailing lists.
FD Mini-prod calibration (Qiulan)
Epoch 3b is in progress: eighteen jobs failed, and 9,714 files are available in SAM. Qiulan will resubmit the failed jobs soon.
In addition, 288 jobs from before the shutdown failed because of transient issues. Qiulan will resubmit these as well.
ND Mini-prod CRY+Calib (Bruno)
Bruno was not attending; we will follow up.
Preparation for FD dual-gain simulation (Gavin)
This is ready to go, and we should make a new patch release tag to support it.
Amazon Running/NC Respins (Paul R)
Paul has made a lot of progress. He has managed to submit jobs that run on Amazon, but the jobs fail because they cannot see files on S3, which is where the reco files are stored. The issue is that the jobs need the latest version of ifdh installed on CVMFS. Paul is working on getting that done.