February 2014 Production

Notes to help keep track of progress on the latest production phase.
First step is MC generation in tag S14-02-05

Backported fixes to the flag nova.label: "beta", and fixed the metadata seg fault at the end of jobs.

MC Generation

FCL Datasets/generation

Gavin produced and declared to SAM fcl files for the CRY and GENIE generation.

These are the datasets containing the fcl files:

MC Generation Tests

Gavin submitted 20 jobs for each configuration using Eric's submission scripts. They were initially submitted on-site (FermiGRID) and then off-site, once the OASIS server update was synced. All jobs have sat in the queue for > 12 hours thus far.

2014-02-20 Results:
There are 2632 files in total in /nova/prod/FTS_DropBoxes/General_dropbox (soon to be FTS-activated and declared to SAM):

Number  Detector  Generator  Swap     Horn Current
860     FarDet    Cry        N/A      N/A
355     FarDet    Genie      NonSwap  FHC
0       FarDet    Genie      NonSwap  RHC
354     FarDet    Genie      Swap     FHC
370     FarDet    Genie      Swap     RHC
693     NearDet   Genie      N/A      FHC
0       NearDet   Genie      N/A      RHC
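
As a quick cross-check of the table above, the per-configuration counts can be tallied against the quoted total of 2632. This is only a bookkeeping sketch; the counts are copied by hand from the table, and the dictionary layout is an illustrative choice, not a production tool.

```python
# File counts copied by hand from the 2014-02-20 table:
# (detector, generator, swap, horn current) -> number of files
counts = {
    ("FarDet", "Cry", "N/A", "N/A"): 860,
    ("FarDet", "Genie", "NonSwap", "FHC"): 355,
    ("FarDet", "Genie", "NonSwap", "RHC"): 0,
    ("FarDet", "Genie", "Swap", "FHC"): 354,
    ("FarDet", "Genie", "Swap", "RHC"): 370,
    ("NearDet", "Genie", "N/A", "FHC"): 693,
    ("NearDet", "Genie", "N/A", "RHC"): 0,
}

total = sum(counts.values())
# Configurations that have not produced any files yet
missing = [cfg for cfg, n in counts.items() if n == 0]

print("total files:", total)
for cfg in missing:
    print("no files yet:", " ".join(cfg))
```

The sum reproduces the 2632 files quoted above, and the two RHC configurations (FarDet NonSwap and NearDet) are the ones still at zero.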

FD Cosmics Generation

Full steam ahead. The Simulation group has given the all clear to produce FD Cosmics after validation from Junting in doc-db 10817.
All 10240 jobs have been submitted. There were multiple failures the first time around, but a draining dataset is being used to keep resubmitting; 4181 files have been declared to SAM (3/4: 12:31 AM).
Ready for PCHitList generation...

Ongoing Tests

Any news on PCHits and/or BadChannel tests should go here.
Gavin is having submission issues and is verifying that he is running the same thing as Dominick.

Preliminary tests of reconstruction show that it needs the "fastCloning: false" option set.
Added slicemef and cosmicveto to the reconstruction path, and removed cana.
Here are the first timing results from running on an FD cosmic file:
CPU/event (seconds)
TimeReport 0.000573 exposure
TimeReport 0.088539 calhit
TimeReport 0.147297 slicer
TimeReport 0.016660 cosmictrack
TimeReport 0.001721 veto
TimeReport 2.626907 kalmantrack
TimeReport 0.013069 kalmantrackmerge
TimeReport 0.823334 multihough
TimeReport 2.207830 elasticarmshs
TimeReport 0.281312 fuzzykvertex
TimeReport 0.929970 michelecosmictrack
TimeReport 0.298158 michelekalmantrack
TimeReport 113.375707 slicemef

Alarmingly, slicemef is extremely slow on FD cosmics, at about 113 s/event. On ND it runs at a speed comparable to the michelefilters (i.e. quickly).
The decision may be to pull it from FD processing.
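
To put the TimeReport numbers above in proportion, here is a minimal sketch that parses the lines and prints each module's share of the summed per-event CPU time. The helper name parse_time_report is made up for illustration, and the three-column line format is assumed from the listing above rather than from any specification of the framework output.

```python
# TimeReport lines copied from the FD cosmic test above (CPU seconds per event)
report = """\
TimeReport 0.000573 exposure
TimeReport 0.088539 calhit
TimeReport 0.147297 slicer
TimeReport 0.016660 cosmictrack
TimeReport 0.001721 veto
TimeReport 2.626907 kalmantrack
TimeReport 0.013069 kalmantrackmerge
TimeReport 0.823334 multihough
TimeReport 2.207830 elasticarmshs
TimeReport 0.281312 fuzzykvertex
TimeReport 0.929970 michelecosmictrack
TimeReport 0.298158 michelekalmantrack
TimeReport 113.375707 slicemef
"""


def parse_time_report(text):
    """Return {module: seconds} for lines of the form 'TimeReport <sec> <module>'."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "TimeReport":
            times[parts[2]] = float(parts[1])
    return times


times = parse_time_report(report)
total = sum(times.values())
without_slicemef = total - times["slicemef"]

# Print modules from slowest to fastest with their share of the total
for module, sec in sorted(times.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{module:20s} {sec:12.6f} s  {100 * sec / total:6.2f} %")
print(f"total {total:.3f} s/event, {without_slicemef:.3f} s/event without slicemef")
```

With these numbers slicemef accounts for roughly 94% of the summed time, and the remaining modules together come to about 7.4 s/event.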

We still haven't completely validated that BadChanList is working as it should. It appears to be running, but it would be good to see some example channel masks in an event display to check that they are sane. The current holdup is that Event Display can't find the RunHistory service, detailed in issue #5600.


Latest iteration

Reprocessed pchits in S14-03-24 due to new track quality cuts etc in the attenuation calculations.

SAM Datasets are available for the quota of pclist and pcliststop files:
(added nobad since there are two corrupt files in original dataset)

As of March 30th they contain 1749 and 1719 files respectively.

ND CRY Datasets also available:

First iteration

Datasets produced that contain the FDCry Cosmic artdaq files:
S14-02-05CryFD_artdaq (submitted 2000 files)
S14-02-05CryFD_artdaq_drain (a file drops out of this dataset once it has produced a child)

SAM Datasets are available for the quota of pclist and pcliststop files:
(added nobad since there are two corrupt files in original dataset)

Each contains 10145 files (200 events/spill) from an original dataset of 10168.
The files are also available on bluearc local disk:

% /nova/ana/calibration/mc/fd/S14-02-05/*pclist.root (*pcliststop.root)

(Note: the release in this path is the original simulated release. This is due to a mistake in the config setup that is too complicated to change now.)

Updated configuration of the dropbox to be able to declare the pclist files to SAM.

Scan directory: /nova/prod/pchits/mc_dropbox
Scan interval: 5min
Scan patterns: *.root
Scan exclusion patterns: hist*.root
Transfer to: novadata:/nova/ana/calibration/mc/${NOvA.detectorID}/${Simulated.base_release}; enstore:/pnfs/nova/mc/pchits/${NOvA.detectorID}/${Simulated.base_release}/${file_id/100[9]}
Erase files after: 24h
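
To illustrate how the scan and exclusion patterns above interact, here is a small sketch. It assumes the patterns behave like shell globs applied to the bare file name, which is an assumption about FTS rather than something stated on this page. The function name and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the dropbox configuration above
SCAN_PATTERNS = ["*.root"]
EXCLUDE_PATTERNS = ["hist*.root"]


def would_be_picked_up(filename):
    """True if the name matches a scan pattern and no exclusion pattern.

    Assumes glob-style matching on the file name only (not confirmed for FTS).
    """
    included = any(fnmatchcase(filename, p) for p in SCAN_PATTERNS)
    excluded = any(fnmatchcase(filename, p) for p in EXCLUDE_PATTERNS)
    return included and not excluded


# Hypothetical file names, for illustration only
for name in ["example_pclist.root", "example_pcliststop.root",
             "hist_example.root", "example.log"]:
    print(f"{name:28s} {'picked up' if would_be_picked_up(name) else 'ignored'}")
```

Under that assumption, the pclist and pcliststop outputs are picked up, while histogram files whose names start with "hist" are left alone.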


Tag S14-03-06 is for reconstruction.


I reconstructed using Utilities/batch/recoproductionvalidationjob.fcl (or some minor variant) in tag S14-03-06. I used the small sample of genie files that do not have the new physics list mod or gsimple flux input, but the reconstruction should perform in the same fashion.

I used 10 files per configuration as input and joined them (hadd) into one file per config here:


FD cosmics: fardet_cosmics_S14-03-06.sim.daq.reco.hist.root
FD genie FHC nonswap: fardet_genie_fhc_nonswap_S14-03-06.sim.daq.reco.hist.root
FD genie FHC swap: fardet_genie_fhc_swap_S14-03-06.sim.daq.reco.hist.root
FD genie RHC swap: fardet_genie_rhc_swap_S14-03-06.sim.daq.reco.hist.root
ND genie FHC nonswap: neardet_genie_fhc_nonswap_S14-03-06.sim.daq.reco.hist.root

These are the samples we have currently. Should be enough for reconstruction validation. The original *.reco.root and *.hist.root files are in the directory above.
Ignore the fuzzykana tree: the fuzzykvalidate module was not reconfigured to read Prongs instead of Tracks. Evan will run his own validation. The ktfuzzyana module was also removed for the same reason.

Action Item: Expect feedback before or at the Reconstruction meeting on Monday 17th March (St. Patrick's Day).

Data processing

Ultimately the progress of getting files into SAM can be observed via Dominick's Watchdog plots page:

Note that nova.label is set to "alpha" for data processed in S14-03-06 or before, because it picks up the parent metadata parameter due to a bug in fcl precedence (Issue 5626).


Processing is actually being run in S14-01-20, and the geometry gdml file is now included in the data file.


Dropbox configuration

Updated FTS configuration to be able to declare pclist and pcstop files to SAM and ultimately give them a bluearc location too.

Scan directory: /nova/prod/pchits/data_dropbox
Scan interval: 5min
Scan patterns: *.root
Scan exclusion patterns: hist*.root
Transfer to: novadata:/nova/ana/calibration/data/${NOvA.detectorID}/${Calibration.base_release}; enstore:/pnfs/nova/data/pchits/${NOvA.detectorID}/${Calibration.base_release}/${file_id/100[9]}
Erase files after: 24h

Job submission

Jobs for 3980 files from runs 13150-13350 were submitted on March 7. 3805 of those jobs ran successfully and produced output in the dropbox mentioned above. The failures in that set may have been caused by intermittent database connection problems. The failures seem to be somewhat back-loaded in terms of job run time.

Another 3990 jobs for all files processed from run 13350 onward were submitted on March 9. The vast majority of those jobs failed due to widespread database connection problems. Only 54 pairs of pchits/pchitsstop files arrived in the dropbox.
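
For reference, the success rates of the two submissions described above work out as follows. This is a bookkeeping sketch only; for the March 9 batch it assumes that each pchits/pchitsstop pair in the dropbox corresponds to one successful job, which is not stated explicitly.

```python
# (label, jobs submitted, jobs with output) taken from the text above.
# Assumption: for March 9, one pchits/pchitsstop pair == one successful job.
submissions = [
    ("March 7, runs 13150-13350", 3980, 3805),
    ("March 9, run 13350 onward", 3990, 54),
]

results = []
for label, submitted, succeeded in submissions:
    failed = submitted - succeeded
    rate = 100.0 * succeeded / submitted
    results.append((label, failed, rate))
    print(f"{label}: {succeeded}/{submitted} succeeded "
          f"({rate:.1f}%), {failed} without output")
```

That is 175 jobs without output in the first batch (about 96% success) against 3936 in the second (about 1% success).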

The cause of the database problems is unknown, but they may have been related to a backport to the S13-03-06 tag (late in the day on March 7) that changed the database address from prod to dev. On March 10, the tag was again patched to point to a web-cache port (8081) rather than the default port (8084).

The jobs for runs 13350 and onward (3990 total) were resubmitted on the morning of March 11. With a concurrency maximum of ~1100, the jobs have been proceeding without any database troubles. Since output from some of these files has already been recognized by FTS, the new output could not be put directly in the dropbox mentioned above. Instead, it is being placed in a temporary dropbox, namely: /nova/prod/pchits/data_not_dropbox/. When the jobs have run to completion, the output files will be moved to the true dropbox only if they do not already exist there.
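
The "move only if not already there" step could be sketched as below. This is not the script that was actually used; the function name move_new_files is hypothetical, and it only checks for a file of the same name in the target directory. The demo runs in throwaway temporary directories with made-up file names rather than on the real dropbox paths.

```python
import os
import shutil
import tempfile


def move_new_files(staging_dir, dropbox_dir):
    """Move files from staging_dir to dropbox_dir unless the name already exists there.

    Returns (moved, skipped) lists of file names. Hypothetical helper.
    """
    moved, skipped = [], []
    for name in sorted(os.listdir(staging_dir)):
        src = os.path.join(staging_dir, name)
        dst = os.path.join(dropbox_dir, name)
        if not os.path.isfile(src):
            continue  # ignore subdirectories and other non-files
        if os.path.exists(dst):
            skipped.append(name)  # leave the copy in the staging area untouched
        else:
            shutil.move(src, dst)
            moved.append(name)
    return moved, skipped


# Demo with temporary directories standing in for
# /nova/prod/pchits/data_not_dropbox/ and /nova/prod/pchits/data_dropbox
with tempfile.TemporaryDirectory() as staging, tempfile.TemporaryDirectory() as dropbox:
    for name in ("a_pclist.root", "b_pclist.root"):
        open(os.path.join(staging, name), "w").close()
    open(os.path.join(dropbox, "a_pclist.root"), "w").close()

    moved, skipped = move_new_files(staging, dropbox)
    print("moved:  ", moved)
    print("skipped:", skipped)
```

Skipped files stay in the staging area, so they can be inspected or removed by hand afterwards.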

As of March 17, FTS has caught up and all of the files processed are in SAM. A dataset has been created for cosmic trigger data, called prodcalib_S13-03-05_FD_data_cosmic_pclist. A snapshot was taken on March 17 with snapshot ID 14870; a frozen dataset corresponding to that snapshot is called prodcalib_S13-03-05_FD_data_cosmic_pclist_snap14-03-17.


Reconstruction was run in the S14-04-14 tag for MC and S14-04-14 for data. There is no good reason for the discrepancy, except that S14-04-14 has Calibrator.tag: v3 instead of v2, which provided averaged constants for data in uncalibrated channels.

To sidestep database issues, the calibration constants for data were taken from CSV files for this processing. Eventually it was discovered that the initial version of those CSVs did not include the averaged calibration constants for uncalibrated channels. This means that no cells beyond diblock four are calibrated. There are only two modules in the reconstruction chain that use calibrated hits, FuzzyKVertex and MichelEFilter, so those are the only data products that are affected. When this error was discovered, the incomplete CSVs were replaced with ones that include averaged calibration constants. These were primarily used to continue processing of NuMI trigger data. NuMI trigger files from run 14900 and later have averaged calibrations for uncalibrated channels.

This reconstruction pass is also plagued by a bug affecting the associations of rb::Track objects from KalmanTrack to rb::Cluster objects from Slicer4D. In files for which filtering is applied, the associations are irreversibly scrambled and tracks may be missing. NuMI trigger files are not filtered, so they are not affected by this bug; cosmic trigger files, however, are affected.