Attending: Satish, Alex, Kanika, Pengfei, Susan, Paola, Joe, Enrique, Paul S, Chris, Tapasi

Computing issues (All)


We seem to have identified a mode in which we can run well: 1k ND full chain jobs and 600 similar FD jobs. Chris has started up a second server, which should double our capacity, but it has been auto-blocked, so we haven’t been able to test it extensively. Alex has filed a ticket to remove the auto-block but has received no response so far. He will ping SCD again shortly.

Status of SMU (Enrique)

Enrique has had a discussion with Amit, and they now believe that things are functioning again. Enrique has some last tests running now. Provided they are successful, Enrique is prepared to declare SMU usable again.

Tags (Paul)

  • Paul cut a new release for cross section measurements last week. Xuebing will verify that it works before we declare it closed.
  • There is a snapshot due this week.
  • Paul is working to install upgraded versions of nutools and caffe. The nutools version is to pick up the new geant needed for improved modeling of neutron capture. There was some discussion of the release and testing strategy. We seemed to converge on a release strategy: a new release branch based off of prod2genie. However, consultation with the sim group is still needed, especially as testing the new geant is really in their purview.
  • We need a new prod2reco tag to pick up changes needed for MRE. There was some discussion (later in the meeting) that we may need to wait for updates to calibration FCLs as well. However, investigations by Chris after the meeting revealed that this is not the case.

Dataset Definitions (Joe)

Joe sent around FD genie definitions last week. Satish has posted the new definitions to the web, but because of some typos he made, the pages have not been updating. He believes the problem is resolved, and we should know soon. Update: The problem is not resolved. Satish is investigating.

Joe is now working on ND numi definitions and should have them ready later today. He will get the FD data definitions ready but will not actually make them until we start processing files, in case the release changes.

Joe sent Tapasi a pointer to the script so she can make the FD Birks definitions. She should make the modifications, double-check with Joe that things look right, and run tests. Provided the tests make sense, she should make the datasets and send them around.

Satish is working to get the webpage in shape. Joe should be available to help with that soon.


ND data reco (Bruno)

This is nearly done, although there are some missing files. We just need to make sure that the missing files are the ones we expect from known crashes that we are choosing not to address for second-analysis.

FD Birks Reco (Tapasi)

The nonswap files should be finished. The fluxswap BirksB files are nearly finished, but the fluxswap BirksC files have not yet been started. Tapasi should work on getting those produced today.

ND genie RW reco (Paul/Alex)

This is proceeding well. Over the weekend, Alex submitted additional jobs to this project to test out the new maxConcurrent feature. We expect to have this sample done mid-day tomorrow, although if we can get the second LEMServer running well, we should be able to get it done tonight.

There are some issues with large memory consumption: about 3% of files fail because of it. It was suggested that we run any remaining jobs with a larger memory request to deal with this. Draining out the last few files may take some time.

AWS (Paul)

Jobs were resubmitted and seem to be behaving well so far. Files are not getting declared, so it’s difficult to estimate overall progress. Paul is working on the in-job declaration but is having difficulty understanding the issue. The best lead at the moment is an apparent difference in behavior between the metadata extractors for art files and CAF files.

Raw2root keepup (Paola)

Everything is running smoothly except for a temporary FTS misconfiguration, now resolved.

Reco keepup (Qiulan)

This is running smoothly. About ten files are crashing with what looks like a DB issue. Qiulan will send an email with details.

Schedule overview (Alex)

The goal is to finish the ND genie RW samples today. For systematic samples, we anticipate being able to process one sample (ND+FD) per day. The bottleneck is LEMServer, so MRE electron insertion can proceed in parallel, using as many cores as Kanika can get. Tuesday through Thursday, we will run through the systematics samples, Birks first. Friday through Sunday, Kanika will run MRE reco. The following Monday through Wednesday, we will push through the remaining systematics.