Attending: Satish, Alex, Susan, Felipe, Paul S, Qiulan, Gavin, Enrique, Vito, Paul R
SW Tags (Paul)¶
Gavin cut a new miniproduction release, R15-11-17miniprod-mec.c, over the weekend, and Paul did the build and distributed it.
We also want to cut a new snapshot to update the version of nutools, art and other externals. This requires an update of the novaddt and novadaq products, and their associated dependencies. Serdar is doing the builds, and is waiting on novaddt and novaddt-deps. Once this is done, Paul will be able to make the tag.
No issues have been observed with the nightly builds.
Rationalizing Generation (Paul R/Gavin)¶
Paul has created a new script that generates configuration files for MC jobs; these files can be used for submission by submit_nova_art.py. Essentially, this is a heavily gutted version of submit_mc_gen, and it is mostly working now. He has mostly finished the overlay work for jobsub, although it hasn’t been tested yet. He hasn’t yet worked on updating the definition or file naming scheme. The latter will require changes to runNovaSAM.py as well as make_sim_fcl. Satish has some of this written, but it’s turning into a large project, and he will discuss it with Paul and Gavin at Dallas.
Gavin will handle making the Production package a separate UPS product.
Flatdaq Dropping (Justin)¶
Running on FD cosmics, Justin has achieved disk space savings close to what was expected: files are reduced from 2.8 GB to 2 GB. Alex suggested running the CAF maker and using diff_caf to make sure that results do not change after the flatdaq objects are removed. Justin will do that today. After that, the remaining question is where in the process to add the flatdaq dropping. Both Alex and Satish are of the opinion that this should be as early as possible while still not being part of the raw2root chain.
Open question: where in the process do we want to run this? Proposal: make it the first step in reco.
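As a quick check of the numbers quoted above, the fractional saving per file can be worked out as below. This is only an illustrative sketch: the function name is made up for this note and is not part of any NOvA tool.

```python
def fractional_saving(before_gb, after_gb):
    """Return the fraction of disk space saved when a file shrinks."""
    if before_gb <= 0:
        raise ValueError("before_gb must be positive")
    return (before_gb - after_gb) / before_gb


# File sizes quoted in the minutes: 2.8 GB before dropping, 2 GB after.
saving = fractional_saving(2.8, 2.0)
print(f"Space saved per file: {saving:.1%}")
```

With the sizes above this comes to a bit under 30% per file.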
Stash Cache Updates (Joe)¶
Raw2root Keepup (Vito)¶
This is running smoothly. There were some issues last week with SAM not working properly as a result of the Thursday downtime. Two projects were not getting files. Over the last two days, we have been getting more files, as we work through the backlog from the downtime.
Reco Keepup (Qiulan)¶
Things are running better now that Omaha has been taken out of the list of sites we submit to. Because of the large memory requirements, jobs spend a long time idling as they wait for available sites. Alex was pleased that even with the large requirements there are some offsite farms that we can run at. This is an indication that some locations (e.g. MWT2) are offering job slots with larger memory resources.
Last week Qiulan contacted Ken to get information about different OSG sites.
Rock Neutrinos (Felipe)¶
About 1300 jobs have been submitted, with one file each; however, Felipe has seen only a 21% success rate. Successful jobs have run only at Fermigrid. Other jobs have failed with an error indicating that they cannot find the CVMFS release area. Felipe is running some additional tests with a modified version of art_sam_wrap.sh with additional debugging output. We are waiting to see the results of those tests.
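For illustration only, the kind of check that could help localize the missing release area is sketched below. This is not the debugging output Felipe added to art_sam_wrap.sh. The function name and the `/cvmfs` default are assumptions made for this note, and the actual release path would have to be supplied.

```python
import os
import socket


def check_release_area(path):
    """Report whether a release area is visible on this worker node."""
    host = socket.gethostname()
    # The directory must exist and be readable and traversable by the job.
    visible = os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)
    status = "OK" if visible else "MISSING"
    return f"{status}: {path} on {host}"


if __name__ == "__main__":
    # Hypothetical default path; replace with the real release area.
    print(check_release_area("/cvmfs"))
```

Logging the hostname alongside the result would make it easier to see which sites, if any, are failing to mount the area.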
Amazon Status (Paola)¶
Paola has submitted a couple more tests, attempting to run 100 jobs concurrently in less than one hour. We haven’t been able to do it yet, apparently because the infrastructure is not ready (an updated version of the glide-in WMS package is needed). That update should happen today, with tests submitted tomorrow. In the meantime, Paola will submit additional tests to debug other steps in the process.
Enrique commented that some offsite farms, including Michigan, SMU and Wisconsin, will only run jobs that are submitted with expected lifetimes. Alex commented that this touches on a much larger issue and is related to how things will be changing at Fermigrid as well. He will be giving an extended talk on this at the collaboration meeting. Some of these changes can be made transparent to users through changes to submit_nova_art.py.