Production Infrastructure and Scripts

This page outlines how the NOvA data processing and Monte Carlo production chains are structured.

Data Processing

After data is taken by the detectors, it is transferred via the FTS system to Enstore and to an area on BlueArc, and cataloged with SAM. Both of these areas (tape and BlueArc) are registered with SAM so that data can be delivered from either.

The first level of production processing run on the raw data is a set of jobs which:

  • Unpack the online/DAQ formatted data files
  • Create ART data products from the unpacked raw information
  • Write out ROOT formatted data files containing the new ART data products

The jobs also calculate channel occupancies to identify good/bad channel maps for the run. These maps are recorded in the central NOvA databases.
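The occupancy-based flagging can be sketched as follows. This is a minimal illustration, not the actual production code; the thresholds and data structures are assumptions.

```python
# Hypothetical sketch: flag channels whose per-event hit occupancy falls
# outside an assumed acceptable range (dead or noisy channels).
from collections import Counter

def channel_status(hits, n_events, low=0.001, high=0.5):
    """Return {channel: "good"/"bad"} from an iterable of hit channel ids."""
    counts = Counter(hits)
    status = {}
    for chan, n in counts.items():
        occ = n / n_events          # approximate fraction of events with a hit
        status[chan] = "good" if low <= occ <= high else "bad"
    return status
```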

Currently this is done with two scripts: the first script generates the second, specialized to the input file.
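The generate-a-script pattern can be sketched as a template fill. The file naming, fcl name, and commands below are hypothetical stand-ins, not the real scripts' contents.

```python
# Hypothetical sketch: the first script fills a template with values specific
# to one input file, producing the second (job) script.
TEMPLATE = """#!/bin/bash
setup_nova {release}
nova -c {fcl} {input_file} -o {output_file}
"""

def make_job_script(input_file, release="development", fcl="raw2root.fcl"):
    output_file = input_file.replace(".raw", "_r2r.root")
    return TEMPLATE.format(release=release, fcl=fcl,
                           input_file=input_file, output_file=output_file)
```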

The flow chart of production processing scripts:
Raw to Root Conversion Flow chart

To validate the processing, the "eventdump.fcl" job is run; it performs minimal processing to step through the output data file. The only check made is that the job succeeds and exits with a zero return value.
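That pass/fail check amounts to inspecting the exit status of the job, roughly as in this sketch (the executable name is parameterized here purely so the snippet is self-contained; it is not part of the real validation code):

```python
# Hypothetical sketch: run the eventdump validation job and accept the output
# file only if the job exits with a zero return value.
import subprocess

def validate(output_file, nova_cmd="nova"):
    result = subprocess.run([nova_cmd, "-c", "eventdump.fcl", output_file])
    return result.returncode == 0
```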

This set of scripts lives in the Utilities package under the "batch" directory.

Monte Carlo Generation

The Monte Carlo generation scripts were written and are maintained by Mat Meuther. The flow chart of what the scripts do is:
Monte Carlo Flow Chart

This set of scripts attempts to handle all the different combinations of Monte Carlo requests, and generates a script that can then run on the worker node to handle the request. It works by using a set of external scripts that perform a number of parsing passes to build the final job submission.
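The multi-pass construction can be pictured as successive passes, each refining the job description, as in this sketch (the pass functions and fields are invented for illustration):

```python
# Hypothetical sketch: build a job submission by applying a chain of parsing
# passes, each of which refines the job description.
def build_submission(request, passes):
    job = dict(request)
    for p in passes:
        job = p(job)
    return job
```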

This set of scripts lives in the MCGenerator package.

Reconstruction & Analysis

For the reconstruction and analysis level of production the following flow chart describes the job flow:
Reco/Analysis Flow Chart

These scripts are designed to run on either data or Monte Carlo input files. They work by parsing the options passed to them, determining the input type (data, Monte Carlo, trigger type, or generation type), and from there building a job script tailored to the input.
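The input-type determination can be sketched as below. The naming conventions used to infer the type here are assumptions, not the actual ones the scripts use.

```python
# Hypothetical sketch: infer input type from the file name so the job script
# can be tailored to it (substring conventions are assumed).
def classify_input(filename):
    kind = "mc" if ("sim" in filename or "genie" in filename) else "data"
    trigger = None
    for t in ("numi", "cosmic", "ddt"):
        if t in filename:
            trigger = t
    return {"kind": kind, "trigger": trigger}
```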

The scripts also set up a bookkeeping system through a series of directories and symbolic links, which can be parsed to determine the state of a given input file in the processing.
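A directory-plus-symlink bookkeeping scheme of that kind might look like this sketch, with one directory per state and a link per file (the state names are invented for illustration):

```python
# Hypothetical sketch: track a file's processing state with a symlink in a
# per-state directory; moving the link advances the state.
import os

STATES = ("pending", "running", "done")

def set_state(workdir, input_file, state):
    name = os.path.basename(input_file)
    for s in STATES:                                  # drop any old state link
        link = os.path.join(workdir, s, name)
        if os.path.islink(link):
            os.remove(link)
    os.makedirs(os.path.join(workdir, state), exist_ok=True)
    os.symlink(input_file, os.path.join(workdir, state, name))

def get_state(workdir, input_file):
    name = os.path.basename(input_file)
    for s in STATES:
        if os.path.islink(os.path.join(workdir, s, name)):
            return s
    return None
```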

The actual job scripts that are generated have an essentially linear flow, whereby they:

  1. Perform bookkeeping duties
  2. Set up the software environment for the NOvA release
  3. Stage in the input data
  4. Perform limited integrity checking on the input
  5. Stage in auxiliary files for the specific analysis
  6. Run the nova job executable
  7. Perform limited integrity checking on the output
  8. Perform end-of-job bookkeeping
  9. Stage out the output files
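The linear flow above can be sketched as an ordered list of steps where any failure aborts the job; this is an abstraction of the generated scripts, not their actual contents.

```python
# Hypothetical sketch: execute the job's steps in order, stopping at the
# first step that fails.
def run_job(steps):
    for name, step in steps:      # steps: list of (name, callable) pairs
        if not step():
            return "failed at: " + name
    return "success"
```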

This set of scripts lives in the Utilities package under the "batch" directory.

CAF Tree Generation

The CAF trees were generated in a different manner; the scripts used for this are being deprecated in favor of something that looks more like the reco/analysis flow.