Official Datasets

This page documents the most up-to-date datasets produced by the production group. Information on older legacy files is kept on the LegacyDatasets page.


All modern NOvA data and MC reside within a data handling system called Sequential Access via Metadata (SAM). The files are stored on a tape-backed disk array called dCache. File locations are tracked in a database with a web interface, accessible through the samweb command line client utility. The SAM_web_cookbook provides an introduction to using the system. All of the datasets listed below are SAM "definitions": logical queries on file metadata which together define a set of files within the SAM system.
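As an illustration of working with a definition name, the sketch below builds (but does not run) samweb command lines for one of the datasets listed on this page. The helper name `samweb_command` is hypothetical; the three subcommands are ones provided by the samweb client, and actually running them assumes a configured samweb environment.

```python
import shlex

# samweb subcommands that take a single definition name as their argument.
DEFINITION_SUBCOMMANDS = {
    "describe-definition",
    "count-definition-files",
    "list-definition-files",
}


def samweb_command(subcommand, definition):
    """Return the argv list for a samweb query on one dataset definition.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if subcommand not in DEFINITION_SUBCOMMANDS:
        raise ValueError(f"unsupported samweb subcommand: {subcommand}")
    return ["samweb", subcommand, definition]


cmd = samweb_command("count-definition-files",
                     "prod_reco_FA14-10-28_fd_numi_fa_goodruns")
print(shlex.join(cmd))
# To execute on a node with samweb set up:
#   subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Counting the files in a definition before listing or processing them is a cheap way to check how large a dataset is.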

As a throwback to the old days, some of the data is also stored on bluearc. CAFs are a notable example: the most recent ones are almost always stored on bluearc. There are also small, selected samples of the most recent keep-up (see below) files available on bluearc in:


Data and MC files are currently provided in two forms: datasets for the first analysis (FA) and keep-up datasets. First analysis datasets are those designed to be used in the first analysis and, as such, are processed with stable, tested versions of the software, including the latest calibration and state-of-the-art reconstruction and PID algorithms. Keep-up datasets, on the other hand, contain data processed as it comes off the detectors, in as close to real time as possible. These datasets are designed to offer users an early look at data and, as such, may not always include the most up-to-date calibration or reconstruction. All keep-up datasets include "keepup" in their dataset names.
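The naming rule above can be checked programmatically. The following is a minimal sketch, assuming the `prod_<tier>_<release>_<tags...>` pattern that the names on this page appear to follow (the pattern is inferred from the listed names, not an official specification); `parse_definition` is a hypothetical helper.

```python
import re

# Release tags seen on this page look like FA14-10-28 or S14-09-29.
RELEASE_RE = re.compile(r"^(FA|S)\d{2}-\d{2}-\d{2}$")


def parse_definition(name):
    """Split a definition name into tier, release, tags and a keep-up flag.

    Assumes the inferred pattern prod_<tier>[_<release>]_<tags...>.
    """
    parts = name.split("_")
    if len(parts) < 3 or parts[0] != "prod":
        raise ValueError(f"unexpected definition name: {name}")
    has_release = bool(RELEASE_RE.match(parts[2]))
    return {
        "tier": parts[1],
        "release": parts[2] if has_release else None,
        "tags": parts[3:] if has_release else parts[2:],
        # Stated on this page: keep-up datasets include "keepup" in the name.
        "keepup": "keepup" in name,
    }


info = parse_definition("prod_reco_S14-09-29_fardet_numi_keepup")
print(info["tier"], info["release"], info["keepup"])
```

Some names, such as prod_artdaq_fd_cosmic_fa_goodruns, carry no release tag, which is why the release field is optional in this sketch.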

First analysis production proceeds in a stepwise manner, starting with simulation and progressing through reconstruction to PID. At each stage the files from the previous stage are processed to produce a final dataset of simulation, reconstruction or PID files. At the same time, a set of files for the next stage is also produced for validation use. At the current time production has mostly completed the simulation and reconstruction stages and is producing PID validation files.



First-pass data processing for the first analysis is currently complete through the reconstruction stage, with PID validation files being produced now. The input to these samples is the official first-analysis run lists generated by Ryan Patterson and stored in this directory:



| Data tier | Cosmic trigger | NuMI trigger |
| artdaq | prod_artdaq_fd_cosmic_fa_goodruns | prod_artdaq_fd_numi_fa_goodruns |
| pclist | prod_pclist_S14-09-29_fd_cosmic_keepup | n/a |
| pcliststop | prod_pcliststop_S14-09-29_fd_cosmic_keepup | n/a |
| timecal | prod_timecal_S14-09-29_fd_cosmic_keepup | n/a |
| reco - for analysis | prod_reco_FA14-10-28_fd_cosmic_fa_goodruns | prod_reco_FA14-10-28_fd_numi_fa_goodruns |
| reco - keep up | | prod_reco_S14-09-29_fardet_numi_keepup |
| CAF - keep up | | prod_caf_S14-09-29_fardet_numi_keepup |


| Data tier | Cosmic trigger | DD activity trigger | DD tricell trigger | NuMI trigger |
| artdaq | | | | prod_artdaq_FA14-10-03_nd_numi_fullgain_preshutdown_goodruns |
| pclist | prod_pclist_S14-09-29_nd_cosmic_keepup | prod_pclist_S14-09-29_nd_DDActivity1_keepup | prod_pclist_S14-09-29_nd_DDCalMu_keepup | n/a |
| pcliststop | prod_pcliststop_S14-09-29_nd_cosmic_keepup | prod_pcliststop_S14-09-29_nd_DDActivity1_keepup | prod_pcliststop_S14-09-29_nd_DDCalMu_keepup | n/a |
| timecal | prod_timecal_S14-09-29_nd_cosmic_keepup | prod_timecal_S14-09-29_nd_DDActivity1_keepup | prod_timecal_S14-09-29_nd_DDCalMu_keepup | n/a |
| reco - for analysis | | | | prod_reco_FA14-11-11_nd_numi_fullgain_preshutdown_goodruns |
| reco - keep up | | | | prod_reco_S14-09-29_neardet_numi_keepup |
| pid - validation | | | | prod_pid_FA14-11-11_nd_numi_fullgain_preshutdown_goodruns |
| caf - validation | | | | prod_caf_FA14-11-11_nd_numi_fullgain_preshutdown_goodruns |
| caf - keep up | | | | prod_caf_S14-09-29_neardet_numi_keepup |

Monte Carlo

As with data, first-pass MC processing for the first analysis is currently complete through the reconstruction stage, with PID validation files produced. The only exception is the RHC files, which are still being simulated.

In order to facilitate both first analysis studies and future sensitivity studies, two types of MC have been produced. The first type uses real-detector-like configurations: these files are simulated with the run numbers (and hence the di-block and active channel configurations) of the matching data runs, and with a PoT weighting which replicates that of the first analysis data datasets. The second type uses "ideal" 14-DB configurations and is intended for future sensitivity studies.

FD & ND MC - Real detector-like conditions

| FarDet | FHC swap | FHC nonswap | FHC tau | Cosmics |
| artdaq | prod_daq_FA14-10-03_fd_genie_fhc_fluxswap | prod_daq_FA14-10-03_fd_genie_fhc_nonswap | prod_daq_FA14-10-03_fd_genie_fhc_tau | prod_daq_FA14-10-03_fd_cry_all |
| reco | prod_reco_FA14-10-28_fd_genie_fhc_fluxswap | prod_reco_FA14-10-28_fd_genie_fhc_nonswap | prod_reco_FA14-10-28_fd_genie_fhc_tau | prod_reco_FA14-10-28_fd_cry_all |
| pid - validation | prod_pid_FA14-10-28_fd_genie_fhc_fluxswap | prod_pid_FA14-10-28_fd_genie_fhc_nonswap | prod_pid_FA14-10-28_fd_genie_fhc_tau | prod_pid_FA14-10-28_fd_cry_all |
| caf - validation | prod_caf_FA14-10-28_fd_genie_fhc_fluxswap | prod_caf_FA14-10-28_fd_genie_fhc_nonswap | prod_caf_FA14-10-28_fd_genie_fhc_tau | prod_caf_FA14-10-28_fd_cry_all |

| NearDet | FHC nonswap | Cosmics |
| artdaq | prod_daq_FA14-10-03_nd_genie_fhc_nonswap | prod_daq_FA14-10-03_nd_cry_all |
| pclist | n/a | prod_pclist_S14-09-29_nd_cry |
| pcliststop | n/a | prod_pcliststop_S14-09-29_nd_cry |
| timecal | n/a | prod_timecal_S14-09-29_nd_cry |
| reco - validation | prod_reco_FA14-11-11_nd_genie_nonswap_smallsample | |
| pid - validation | prod_pid_FA14-11-11_nd_genie_nonswap_smallsample | |
| caf - validation | prod_caf_FA14-11-11_nd_genie_nonswap_smallsample | |

FD MC - Ideal conditions (14db)

| FarDet | FHC swap | FHC nonswap | FHC tau |
| artdaq | prod_daq_FA14-10-03_fd_genie_fhc_fluxswap_14db | prod_daq_FA14-10-03_fd_genie_fhc_nonswap_14db | prod_daq_FA14-10-03_fd_genie_fhc_tau_14db |
| reco | prod_reco_FA14-10-28_fd_genie_fhc_fluxswap_14db | prod_reco_FA14-10-28_fd_genie_fhc_nonswap_14db | prod_reco_FA14-10-28_fd_genie_fhc_tau_14db |
| pid - validation | prod_pid_FA14-10-28_fd_genie_fhc_fluxswap_14db | prod_pid_FA14-10-28_fd_genie_fhc_nonswap_14db | prod_pid_FA14-10-28_fd_genie_fhc_tau_14db |
| caf - validation | prod_caf_FA14-10-28_fd_genie_fhc_fluxswap_14db | prod_caf_FA14-10-28_fd_genie_fhc_nonswap_14db | prod_caf_FA14-10-28_fd_genie_fhc_tau_14db |

| FarDet continued | RHC swap | RHC nonswap | RHC tau | Cosmics |
| artdaq | prod_daq_FA14-10-03_fd_genie_rhc_fluxswap_14db | prod_daq_FA14-10-03_fd_genie_rhc_nonswap_14db | prod_daq_FA14-10-03_fd_genie_rhc_tau_14db | |

Supporting sample MC

In addition to the core samples discussed above, some dedicated samples have been produced to study particular effects.

FD MC - Real detector-like conditions, geojittered.

| FarDet | FHC swap | FHC nonswap | FHC tau |
| artdaq | prod_daq_FA14-10-03_fd_genie_fhc_fluxswap_geojittered | prod_daq_FA14-10-03_fd_genie_fhc_nonswap_geojittered | prod_daq_FA14-10-03_fd_genie_fhc_tau_geojittered |
| reco | prod_reco_FA14-10-28_fd_genie_fhc_fluxswap_geojittered | prod_reco_FA14-10-28_fd_genie_fhc_nonswap_geojittered | prod_reco_FA14-10-28_fd_genie_fhc_tau_geojittered |
| pid - validation | prod_pid_FA14-10-28_fd_genie_fhc_fluxswap_geojittered | prod_pid_FA14-10-28_fd_genie_fhc_nonswap_geojittered | prod_pid_FA14-10-28_fd_genie_fhc_tau_geojittered |
| caf - validation | prod_caf_FA14-10-28_fd_genie_fhc_fluxswap_geojittered | prod_caf_FA14-10-28_fd_genie_fhc_nonswap_geojittered | prod_caf_FA14-10-28_fd_genie_fhc_tau_geojittered |

Processing notes

This section attempts to briefly summarise issues that users should be aware of when using the above datasets. Full details on the releases used can be found on the History_of_Tagged_Releases page.

FA14-10-03 Data raw2root

This version of the software includes the ND geometry version used in the FA MC simulation.

No known issues.

FA14-10-03 MC Simulation

This is the official simulation for the first analysis datasets.

No known issues.

FA14-10-28 FD Reconstruction

This is the official reconstruction for the first analysis datasets. It was processed with v04 FD calibrations.

No known issues.

FA14-10-28 FD PID and CAF

These are the PID and CAF validation samples. They are designed so that the physics groups can tune PIDs before the official first analysis production.

No known issues.

The CAFs can be found here:


The real-detector configuration CAFs live in sub-directories numbered 000129-000170, and the "ideal" MC lives in sub-directory 010000. Note that the real-detector configuration genie folders contain both the baseline MC and the geojittered MC. Anyone getting files from this location should therefore require that the CAF file name contain the string "genie_fhc" to study the baseline sample, or "geojittered" to study the geojittered sample.
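A minimal sketch of that selection is shown below. The helper names and the example file names are hypothetical. Because the geojittered dataset names on this page also contain "genie_fhc", this sketch checks for "geojittered" first; that ordering is an added precaution, not something the production group prescribes.

```python
def caf_sample(filename):
    """Classify a CAF file name as 'geojittered', 'baseline' or None."""
    if "geojittered" in filename:
        return "geojittered"
    if "genie_fhc" in filename:
        return "baseline"
    return None


def config_for_subdir(subdir):
    """Map a sub-directory name to 'real', 'ideal' or None.

    Uses the ranges quoted above: 000129-000170 and 010000.
    """
    number = int(subdir)
    if 129 <= number <= 170:
        return "real"
    if number == 10000:
        return "ideal"
    return None


# Hypothetical file names, for illustration only.
names = ["example_genie_fhc_nonswap.caf.root",
         "example_genie_fhc_nonswap_geojittered.caf.root"]
baseline = [n for n in names if caf_sample(n) == "baseline"]
print(baseline, config_for_subdir("000129"), config_for_subdir("010000"))
```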

FA14-11-11 ND reconstruction, PID and CAF

This release includes the most modern (v05) ND calibrations and represents the official ND reconstruction for the first analysis. The PID and CAF samples are validation samples designed so that the physics groups can tune PIDs before the official first analysis production. So far only a subsample of the MC events (~7M of 34M) has been processed through reconstruction, PID and CAF. This sample will not be topped up because of the issues discussed below, but it will be superseded in the near future.

Known issues:

  • There is a bug in the calibrated energy for cells in the muon catcher with unphysical W-values whereby these cells receive infinite energies.
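Until the sample is superseded, one defensive option is to drop cells with non-finite energies before using them. The sketch below assumes cells are available as plain dictionaries with an "energy" key; that layout is purely illustrative and is not the CAF or reconstruction data format.

```python
import math


def finite_energy_cells(cells):
    """Return only the cells whose calibrated energy is a finite number."""
    return [cell for cell in cells if math.isfinite(cell["energy"])]


# Illustrative input: the second cell mimics the infinite-energy bug.
cells = [
    {"plane": 200, "energy": 0.012},
    {"plane": 210, "energy": float("inf")},
    {"plane": 211, "energy": 0.034},
]
good = finite_energy_cells(cells)
print(len(good), "of", len(cells), "cells kept")
```

`math.isfinite` rejects NaN as well as positive and negative infinity, so it also guards against related failure modes.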

The CAFs can be found here:


S14-09-29 Data keep-up & MC calibration

FD & ND calibration files are currently being produced for the FD cosmic stream as well as the ND DD activity, DD cal mu (tri-cell) and cosmic streams. These samples are constantly topped up using cron jobs. These files were used to produce the v05 ND calibration used in the latest ND reconstruction files.

Known issues:

  • The ND data files have been reconstructed with an old ND geometry.

S14-09-29 Keep-up reconstruction

Most of the same caveats apply as for the S14-09-09 reco sample detailed on the LegacyDatasets page, but they are repeated below for completeness. The big change in S14-09-29 is the addition of new information to facilitate basic neutrino searches. There is a new sel.containment branch in the CAFs, and the numu CosRej has been included for the FD stream.

Some additional notes:

  • The initial target for this processing is all of the data before the shutdown and after the end of the neutrino hunt. Unless problems are found, back-processing will extend the sample to before the neutrino hunt.
  • Future reco keep-up will soon proceed in a modern release and will include post-shutdown (October 2014) data.
  • The FD reco keep-up is blinded.
  • FD calibration constants are currently only available through diblock 7; averaged constants are used beyond that point.
  • These datasets are very large. Using SAM projects which include the entire dataset is discouraged. For assistance in breaking up the sample, reference the Sam Web Cookbook or email .
  • Near detector reconstruction is currently lacking calibration, which will affect the output of any module that depends on it. The most notable example is FuzzyKVertex, but the MichelE filters could also be affected. Slicer, CosmicTrack and KalmanTrack produce output independent of calibration.
  • Near detector reconstruction is also currently lacking channel masks and data quality information.
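One way to break up a large definition is to split it into interleaved slices. The sketch below only builds SAM dimension strings; it assumes the `with stride N offset M` clause is supported by the SAM instance in use, which should be confirmed against the Sam Web Cookbook. The helper name `stride_dimensions` is hypothetical.

```python
def stride_dimensions(definition, n_chunks):
    """Build one dimension string per chunk, each selecting every
    n_chunks-th file of the definition, starting at a different offset."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    return [
        f"defname: {definition} with stride {n_chunks} offset {i}"
        for i in range(n_chunks)
    ]


dims = stride_dimensions("prod_reco_S14-09-29_fardet_numi_keepup", 4)
for dim in dims:
    print(dim)
```

Each string could then be passed to a samweb query or used to create a smaller definition, so that no single SAM project has to cover the whole dataset.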

The CAFs can be found here:


where XXX are the first three digits of the run number and YY are the last two.
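The XXX and YY components can be derived from a run number as sketched below. This assumes five-digit run numbers, so that the first three and last two digits together cover the whole number; the function name is hypothetical and the run number used is only an example.

```python
def run_path_parts(run):
    """Return (XXX, YY): the first three and last two digits of a run number.

    Assumes a five-digit run number.
    """
    digits = str(run)
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"expected a five-digit run number, got {run!r}")
    return digits[:3], digits[-2:]


xxx, yy = run_path_parts(12942)
print(xxx, yy)
```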