Satish, Paul S, Enrique, Gavin, Vito, Joseph, Biao, Felipe, Qiulan, Paul R
Paul R and Satish met last week with Gabriel and several others from SCD to discuss a plan of action for getting NOvA jobs running on the grid. For this to happen, they need details from us on our workflow, the inputs we need, and the outputs we produce. They also need an estimate of our production needs over the next year, and especially over the next few months. The grant allowing us to use the Amazon services was written under the assumption of 16 four-day campaigns over the course of a year. This doesn’t really comport with the pattern of our first analysis campaign, but may fit well into a rapid-turnaround model for development of an improved simulation.
SW/Tags (Paul S)
Nothing of note happened with the nightly build.
Paul has been running into some errors with creating new tags. He believes it is a permissions issue, and is talking to Gavin and Jonathan to get the issue sorted out.
Offsite Status Report (Enrique)
Enrique has been testing offsite resources using the production reco/pidpart/lem jobs he needs to be running at the moment. He has gotten jobs through at SU-OG, SMU, Omaha, Michigan, FZU and Caltech. Most of the failures are due to jobs taking a very long time to run and bumping into the 24-hour time limit. Jobs are also frequently idling for extended periods of time. Early submissions used a memory requirement of 2.5 GB, although he has since lowered it to 2 GB, so memory requirements are unlikely to be the issue.
Generation (Paul R)
- Extra ND MC: This has been making good progress. About 4k files remain to be generated. There is some pressure now to finish these jobs all the way through the chain in fairly short order.
- Rock Neutrinos: These are almost finished, and the last five jobs are going now.
- Top-up ND MC: The FCLs have been generated, but without metadata. Gavin believes that this is due to a bug in art_sam_wrap.sh and is working with Paul to fix the issue.
Extra ND MC: With help from Satish, he has fixed problems with definitions. Enrique is caught up with generation (still in progress) for the production of reco and lemsum files. Generation of pidpart files has not begun yet, but he will be starting those today. He has observed that the lempart files are nearly caught up as well.
He had some issues with jobs running at SMU, but his jobs there did process 3k of the 9k total files in his last pass.
Chris is in transit, and so did not attend today’s meeting. On redmine, Chris has reported that he has now finished processing the top-up ND data. Enrique has observed that the LEMing of the extra ND Monte Carlo is following closely behind the reconstruction.
Bruno is finishing up the last ND top-up data file.
At present there are no files for Tapasi to process, but there should be soon. In the meantime, she is running test jobs interactively. The jobs are taking longer than she expects, but based on information from Gavin the runtimes may be reasonable. They will follow up offline.
Raw2root keepup (Felipe)
This has been running without problems since Thursday. About 500 jobs have run without errors.
Raw2root backprocessing (Vito)
He is processing about 150k files. In many older files in the “others” stream, the metadata parameter online.validtriggertypeslow has hex values, which evidently is problematic for the creation of JSON files. These are all old files, from before the summer 2014 shutdown. Satish has recommended that for the others stream, we restrict ourselves to runs since the end of that shutdown. No other problems of this flavor have been observed since then.
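As a rough illustration of the hex-value issue, the sketch below converts hex-formatted strings to integers before the metadata is serialized to JSON. It is only a guess at one possible workaround: the actual failure mode in the JSON creation step was not discussed, the helper names (normalize_hex, metadata_to_json) are hypothetical, and the example values and the online.runnumber key are made up for illustration.

```python
import json


def normalize_hex(value):
    """Return an int for hex strings such as '0x1f'; leave anything else unchanged."""
    if isinstance(value, str) and value.strip().lower().startswith("0x"):
        try:
            return int(value.strip(), 16)
        except ValueError:
            # Not valid hex after all; keep the original string.
            return value
    return value


def metadata_to_json(metadata):
    """Serialize a flat metadata dict to JSON after normalizing hex strings."""
    cleaned = {key: normalize_hex(val) for key, val in metadata.items()}
    return json.dumps(cleaned, sort_keys=True)


# Hypothetical metadata record; the values are invented for this example.
example = {"online.validtriggertypeslow": "0x1f", "online.runnumber": 12345}
print(metadata_to_json(example))
```

Whether converting to an integer is the right treatment depends on how downstream tools interpret this parameter, so this would need to be checked against the real metadata before use.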
Calibration keepup (Qiulan)
For the past three days, Thurs-Sun, only two jobs failed (no diblock mask issue). They have been resubmitted. All jobs are running, and their status needs to be evaluated once they finish.