19-January-2016

Attending: Satish, Enrique, Paola, Alex, Paul R, Chris, Vito, Paul S, Tapasi

News (Satish/Alex)

Docdb 14575

One release branch should be used for all MC generation. We may want a new tag that allows us to run reco with the mini-production calibrations once they are available. The full prod2 reco tag will require the prod2 calibration effort to be tagged as well.

We need to start using JSON files to speed up FTS throughput. This has not been done for the CAFs, which is where we have often run into FTS bottlenecks in the past. We should also start log-file archiving. Neither is turned on by default with the current production switch; Satish will cut a new version of NOvaGridUtils that enables them.
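As a rough sketch of what "using JSON files" could look like in practice (a minimal illustration in Python, assuming an FTS setup that accepts a companion <file>.json with declared metadata; the field names, tier labels, and file names below are hypothetical placeholders, not the actual NOvaGridUtils/FTS convention):

    import json

    def write_fts_metadata(data_file, metadata):
        """Write a companion <data_file>.json so the FTS can read declared
        metadata directly instead of extracting it from the artifact.
        All field names below are illustrative only."""
        json_path = data_file + ".json"
        with open(json_path, "w") as handle:
            json.dump(metadata, handle, indent=2)
        return json_path

    # Example with made-up values for a hypothetical CAF output file:
    write_fts_metadata(
        "example_output.caf.root",
        {
            "file_name": "example_output.caf.root",  # assumed key
            "data_tier": "caf",                      # assumed tier label
            "file_format": "root",                   # assumed key/value
        },
    )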

Production Readiness

Calibration (Brian/Matthew)

Calibration is in good shape to get started for Prod2. The mini-production tests look OK, both for attenuation and for absolute calibration, and the results have changed in the expected ways. Diana has run into some issues running jobs, but there are no show-stoppers. We do want to run reco in the mini-production; this should be possible, but we need to wait for the Far Detector processing to be done before commencing. That should be ready tomorrow.

Some of the code that actually determines the constants is not yet in the repository, but it should be OK if that goes in later. Satish asked whether the recently observed readout-sim bug had any impact on the calibration. Brian has seen no indication that it is causing a problem. Chris commented that the main impact is most likely a reduction in available statistics, but probably nothing else.

It was also noted that we will need to redo the readout-sim step for the mini-production, and we need a strategy for doing so.

DQ Status (virtual)

Online report from Louise:

DQ is good to go.

Bad channels have all been validated up through FD run 21230, and no runs have been found that need to be explicitly removed.
Good runs and diblock masks are ready. Ryan is waiting on a response from us about where to put the good-runs file.

Simulation (Adam/Jim)

Not present. Via email, Jim reports that the simulation group intends to validate the geometry before the collaboration meeting. So far there is no news on the readout-sim issue.

SW Tags (Paul)

Paul recently cut an R15-11-17-miniprodmec branch and two tags off this branch (.a and .b). The .a tag is not usable because of issues with the setup-script organization; the first usable release is R15-11-17-miniprodmec.b. This release is intended for generating MEC events in the mini-production.

He is currently updating nutools to v1.20.1, which he should be able to sort out in the next couple of days; after that he will cut the next snapshot on Thursday or Friday. This snapshot has the latest v1.17-series version of art, which we will stick with for a long time. The next version of art for us to use is v1.18, which involves migrating to ROOT v6, sometime after the second analysis.

Rationalizing Generation (Paul R/Gavin)

Docdb 14580

Stash Cache Updates (Joe)

Joe has taken a look at this, but the project got put on the back burner. He has some questions, which he should post to the list; he should also ask Robert for help.

Analysis Skimming (Alex)

No updates. Alex is still working on the code and now understands it better.

Memory Usage (Alex)

Waiting to see the outcome of the full chain.

Processing Status

Raw2root Keepup (Vito)

This is proceeding smoothly. There were a few issues with some jobs that disconnected; these appear to be network issues. Some jobs also failed to access CVMFS, which is particularly strange as these jobs do not use CVMFS (they run at Fermilab). This occurred twice on Saturday and once today (for another type of job). Vito has opened a ticket, but it is possibly a transient issue.

Vito commented that the number of input files has been abnormally low over the past few days. Satish said that this is because the Far Detector has recently been seeing large deadtime originating from a failed disk.

Reco Keepup (Qiulan)

This has not been running smoothly: most FD jobs failed with exit code 255, which appears to be the result of excessive memory consumption. Qiulan has emailed Ken to confirm this interpretation and to try to understand the limits on job memory; we may need to submit with larger memory requests. Satish suggested that we move to a more recent tag that has Chris’ improvements to the job memory overhead. This requires a go-ahead from Bruno, which Satish will follow up on.
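For reference, a minimal sketch of the kind of change being discussed, assuming submission goes through jobsub_submit and that its --memory option controls the request; the group name, script name, and default value below are placeholders rather than the actual reco-keepup configuration:

    import subprocess

    def submit_reco_keepup(jobscript, memory_mb=3900):
        """Build a jobsub_submit command with an explicit memory request.
        The option names are assumed from general jobsub usage; check the
        real keepup submission scripts before relying on this."""
        cmd = [
            "jobsub_submit",
            "-G", "nova",                        # placeholder experiment group
            "--memory={0}MB".format(memory_mb),  # raise this if jobs exit with 255
            "file://" + jobscript,
        ]
        return subprocess.run(cmd, check=False)

    # Example: bump the request to ~4 GB for the FD reco jobs.
    # submit_reco_keepup("reco_keepup.sh", memory_mb=4000)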

Rock Neutrinos (Felipe)

Felipe sent 251 jobs to be processed onsite. Nearly all of these finished without problems; the remaining jobs failed with a segmentation fault. Alex commented that this should be logged in the production ECL. Felipe also tried to run 12 jobs offsite, but because of the large amount of memory requested they did not start on many sites (UCSD, Nebraska, and Omaha are OK). After further investigation we don’t actually need that much memory, so we can reduce the request and increase the number of sites we can run on. The issue has been communicated to Gavin via SNOW. Alex commented that SNOW is not the most efficient way to do this; OPOS should use the email list and the production ECL.

Once Gavin makes the change, we should try to keep offsite sites busy.

Amazon Status (Paul R/Paola)

Docdb 14581