01-February-2016 » History » Version 1
Satish Desai, 02/01/2016 03:59 PM
Attendees: Satish, Susan, Alex, Enrique, Felipe, Joe, Justin, Qiulan, Vito, Paul
Satish observed several FTS restarts this morning, but no traffic indicating why such restarts might be needed. Vito commented that this might be a result of a ticket he filed (production was added as a watcher). He will forward information to Satish. Since he filed the ticket and the restarts occurred, FTS has become responsive again.
The calibration group has signed off on the preview calibration constants. However, there were permissions issues in creating a new calibration UPS product, observed on Sunday. Based on the commit logs, the issue should have been resolved, but there has been no email traffic confirming this. Making use of the new constants will also require code changes. Jon Paley is working to complete them.
Alex and Satish just returned from a meeting with SCD management to request more on-site resources at Fermigrid. SCD has given grudging approval for a significant increase in computing resources, but this will come at the expense of other experiments. Hence it is incumbent upon us to make the most of whatever offsite resources we can get, and to do our work as efficiently as possible.
h2. SW Tags (Paul S/Alex)
Paul could not attend, but sent an update by email, which Alex reviewed and supplemented with some additional details.
* Nightly build - no problems
* We appear to have successfully upgraded to the new version of nutools, including a newer version of art.
* We released the first tag using these, S16-01-26. There were some issues building it because some makefiles needed to be tweaked to incorporate the latest versions of software (fixed by Gavin). We have also built another tag, S16-01-28. Alex has been running tests with these snapshots, and they appear to be behaving themselves.
* There were some issues with building tags from branches last week. These turned out to be caused by a Jenkins script not recognising the tag name because it had too many characters, contained spaces, etc. This was fixed by Jonathan.
* Some cleaning up of the repository is needed (plan for this today).
* Robert Hatcher thinks he has fixed the problems in genie that caused the anomalous vertex distributions. He is waiting for sign-off from the genie group, and is working with Lynn to get a new build of nutools ready.
* All of the prod2 and preview processing requests this week should be done with the prod2calib series tags (except for ND raw2root backprocessing). Alex has merged over the needed readout sim and photon transport changes. He is holding off on requesting a new release until we have the required calibration and geometry updates. Depending on what becomes available and when, we should have 1-3 new tags based off of prod2calib this week, as well as one new snapshot.
h2. Nightly Tests (Satish)
Nightly production tests are still failing, for reasons that Satish has not yet been able to understand. The issue appears to be a crash in the make_sim_fcl stage, even though the script works fine interactively. He is asking Paul and the production test experts (Matthew and Bruno) for help.
h2. Simulation signoff status (Adam)
* Jim and Xinchun are still in the process of validating the geometry. They should confirm that it is usable within a day or so.
* Robert has identified and fixed the problem with genie that was causing missing vertices in the very upstream portion of the ND. He needs a sign-off from the genie authors before releasing a new version of genie, and is working with Lynn to get a nutools build that uses it.
* We are good to go for ReadoutSim and PhotonTransport. Alex has merged the relevant commits into prod2calib, which is ready for generation of FD CRY.
* Dan has confirmed that the Birks-Chou correction does not need to be updated.
h2. Processing Assignments
Here are the assignments for this week. Details will come by email later.
* ND Raw2root Backprocessing: OPOS
* FD CRY/PCHits (gain 100 and gain 140): Enrique
* ND CRY/PCHits: Paul
* FD Cosmics PCHits: OPOS (in progress already)
* ND Cosmics PCHits: OPOS
* ND NuMI Data Preview: Bruno
* FD Cosmic Data Preview: Joe (needs to wait a bit on inputs from Kanika and Kirk)
Job runners should take care to update the ECL and to be aggressive in seeking help when they encounter difficulties that they don’t know how to solve.
h2. Processing Updates
h3. FD Calibration (Vito)
Vito has submitted jobs for 3 of the 4 requested datasets. Jobs have generally been proceeding smoothly, but some FTS backlogs have developed, which are slowing progress. He has followed up on the FTS issues with the service desk.
Initial submissions were hampered because the prod2calib release was not appearing on CVMFS offsite. This problem has not been resolved. Vito should coordinate with Enrique and SCD to get it fixed.
h3. Reco Keepup (Qiulan)
Some ND jobs are crashing in RunHistory. These are associated with runs 11392 and 11395. Jon Paley has taken a look, and it appears that RunHistory is unable to retrieve any information about the run, even though it exists in the database. He is working on understanding the problem.
FD jobs are generally running smoothly, but some are failing with memory issues. We need to change this to use the most recent release.
h3. Raw2root keepup (Vito)
This is running smoothly. A handful of jobs have disconnected, and will be restarted. One job over the weekend failed with exit code 2. This appears to have been a transient problem with the job not getting files because there were no files left to process. There have also been FTS-associated errors. We should by now have moved over to writing separate JSON files with metadata, so that cannot explain the FTS slowdowns.
h3. Rock Generation (Felipe)
This week Felipe submitted 4484 jobs, of which just 59 failed. These were all at Caltech. Felipe opened a GOC ticket and the worker nodes were restarted, after which the problem appears to have been resolved. The sample has been completely processed, so we can close the ticket.
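For reference, the failure fraction implied by the job counts above can be worked out with a few lines of Python. This is only an illustrative sketch; the function name @failure_rate@ is hypothetical and not part of any production tool.

```python
def failure_rate(submitted, failed):
    """Return the fraction of submitted jobs that failed."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive job count")
    if not 0 <= failed <= submitted:
        raise ValueError("failed must be between 0 and submitted")
    return failed / submitted


# Counts quoted in the minutes: 4484 jobs submitted, 59 failed.
rate = failure_rate(4484, 59)
print(f"{rate:.2%} of jobs failed")
```

With these counts, roughly 1.3% of the jobs failed.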
The SU-OG site has been brought back up, so we should be able to submit there.
There are FTS errors associated with duplicate .log and .log.bz2 files. We need to ping Chris to understand why these duplicate files are being produced.
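While waiting to hear back, one way to list the affected files is to look for base names that occur both as @.log@ and as @.log.bz2@. The sketch below assumes we already have a plain list of file names (for example from a directory listing). The function name @find_duplicate_logs@ and the example file names are hypothetical, and it does not query FTS or SAM in any way.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_logs(filenames):
    """Return sorted stems that appear both as <stem>.log and <stem>.log.bz2.

    Only the final path component is compared, so the two copies may sit
    in different directories.
    """
    kinds_by_stem = defaultdict(set)
    for name in filenames:
        base = PurePosixPath(name).name
        # Check the longer suffix first: ".log.bz2" does not end with ".log",
        # but testing in this order keeps the intent explicit.
        if base.endswith(".log.bz2"):
            kinds_by_stem[base[: -len(".log.bz2")]].add("bz2")
        elif base.endswith(".log"):
            kinds_by_stem[base[: -len(".log")]].add("plain")
    return sorted(
        stem for stem, kinds in kinds_by_stem.items() if kinds == {"plain", "bz2"}
    )


# Hypothetical file names, for illustration only.
listing = [
    "job_001.log",
    "job_001.log.bz2",
    "job_002.log",
    "job_003.log.bz2",
]
print(find_duplicate_logs(listing))
```

Matching on the base name alone is a deliberate simplification; if duplicates only matter within the same directory, the full path should be used as the key instead.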