
Satish Desai, 02/01/2016 03:59 PM

h1. 01-February-2016

Satish, Susan, Alex, Enrique, Felipe, Joe, Justin, Qiulan, Vito, Paul

h2. News

Satish observed several FTS restarts this morning, but no traffic indicating why they were needed. Vito commented that this might be a result of a ticket he filed (production was added as a watcher). He will forward the information to Satish. Since the ticket was filed and the restarts occurred, FTS has become responsive again.

The calibration group has signed off on the preview calibration constants. However, there were permissions issues in creating a new calibration UPS product on Sunday. Based on the commit logs, the issue should have been resolved, but there has been no email traffic confirming this. Making use of the new constants will also require code changes. Jon Paley is working to complete them.

Alex and Satish just returned from a meeting with SCD management to request more on-site resources at Fermigrid. SCD has given grudging approval for a significant increase in computing resources, but this will come at the expense of other experiments. Hence it is incumbent upon us to make the most of whatever offsite resources we can get, and to do our work as efficiently as possible.

h2. SW Tags (Paul S/Alex)

Paul could not attend but sent an update by email, which Alex reviewed and supplemented with some additional details:

* Nightly build - no problems
* We appear to have successfully upgraded to the new version of nutools, including a newer version of art.
* We released the first tag using these, S16-01-26. There were some issues building it because some makefiles needed to be tweaked to incorporate the latest versions of software (fixed by Gavin). We have also built another tag, S16-01-28. Alex has been running tests with these snapshots, and they appear to be behaving well.
* There were some issues with building tags from branches last week. These turned out to be caused by a Jenkins script not recognising the tag name because it contained too many characters, spaces, etc. This was fixed by Jonathan.
* Some cleaning up of the repository is needed (plan for this today).
* Robert Hatcher thinks he has fixed the problems in genie that caused the anomalous vertex distributions.   He is waiting for sign-off from the genie group, and is working with Lynn to get a new build of nutools ready.
* All of the prod2 and preview processing requests this week should be done with the prod2calib series tags (except for ND raw2root backprocessing). Alex has merged over the needed readout sim and photon transport changes. He is holding off on requesting a new release until we have the required calibration and geometry updates. Depending on what is available when, we should have 1-3 new tags based off prod2calib this week, as well as one new snapshot.

h2. Nightly Tests (Satish)

Nightly production tests are still failing, for reasons that Satish has not yet been able to understand. The issue appears to be a crash in the make_sim_fcl stage, even though the script works fine interactively. He is asking Paul and the production test experts (Matthew and Bruno) for help.

h2. Simulation signoff status (Adam)

* Jim and Xinchun are still in the process of validating the geometry. They should confirm that it is usable within a day or so.
* Robert has identified and fixed the problem with genie that was causing missing vertices in the very upstream portion of the ND.  He needs a sign-off from the genie authors before releasing a new version of genie, and is working with Lynn to get a nutools build that uses it.
* We are good to go for ReadoutSim and PhotonTransport.  Alex has merged the relevant commits into prod2calib, which is ready for generation of FD CRY.
* Dan has confirmed that the Birks-Chou correction does not need to be updated.

h2. Processing Assignments

Here are the assignments for this week. Details will come by email later.

* ND Raw2root Backprocessing: OPOS
* FD CRY/PCHits (gain 100 and gain 140): Enrique
* ND CRY/PCHits: Paul
* FD Cosmics PCHits: OPOS (in progress already) 
* ND Cosmics PCHits: OPOS
* ND NuMI Data Preview:  Bruno
* FD Cosmic Data Preview: Joe (needs to wait a bit on inputs from Kanika and Kirk)

Job runners should take care to update the ECL and to be aggressive in seeking help when they encounter difficulties that they don’t know how to solve.

h2. Processing Updates

h3. FD Calibration (Vito)

Vito has submitted jobs for 3 of the 4 requested datasets. Jobs have generally been proceeding smoothly, but some FTS backlogs have developed, which are slowing progress. He has followed up on the FTS issues with the service desk.

Initial submissions were hampered because the prod2calib release was not appearing on CVMFS offsite. This problem has not been resolved. Vito should coordinate with Enrique and SCD to ensure it is resolved.

h3. Reco Keepup (Qiulan)

Some ND jobs are crashing in RunHistory. These are associated with runs 11392 and 11395. Jon Paley has taken a look, and it appears that RunHistory is unable to retrieve any information about these runs, even though it exists in the database. He is working on understanding the problem.

FD jobs are generally running smoothly, but some are failing with memory issues. We need to change this processing to use the most recent release.

h3. Raw2root keepup (Vito)

This is running smoothly. A handful of jobs have disconnected and will be restarted. One job over the weekend failed with exit code 2. This appears to have been a transient problem with the job not getting files because there were no files left to process. There have also been FTS-associated errors. We should have moved over to writing separate json files with metadata, so that cannot explain the FTS slowdowns.

h3. Rock Generation (Felipe)

This week Felipe submitted 4484 jobs, of which just 59 failed. These were all at Caltech. Felipe opened a GOC ticket and the worker nodes were restarted, after which the problem appears to have been resolved. The sample has been completely processed, so we can close the ticket.

h2. AOB

The SU-OG site has been brought back up, so we should be able to submit there.

There are FTS errors associated with duplicate .log and .log.bz2 files. We need to ping Chris to understand why these duplicate files are being produced.