Official Datasets

This page is deprecated: it became very difficult to maintain given the number and diversity of datasets we produce. The new version can be found here.

Introduction

All modern NOvA data and MC reside within a data handling system called Sequential Access via Metadata (SAM). The files are stored on a tape-backed disk array called dCache. File locations are tracked in a database with a web interface, accessible through the samweb command-line client. The SAM_web_cookbook provides an introduction to using the system. All of the datasets listed below are SAM "definitions": collections of logical queries on file metadata which together define a set of files within the SAM system.
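
As a rough illustration of how a definition name is used, the sketch below builds samweb command lines for one of the definitions listed on this page. It assumes the samweb client is installed and configured for NOvA. The helper names (samweb_command, run_if_available) are illustrative only, and nothing is executed unless samweb is found on the PATH.

```python
import shutil
import subprocess


def samweb_command(subcommand, defname):
    """Return the argument list for a samweb subcommand acting on a definition."""
    return ["samweb", subcommand, defname]


def run_if_available(argv):
    """Run the command if its executable is on the PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


# A definition name taken from the FD data table on this page.
DEFNAME = "prod_reco_FA14-10-28_fd_numi_fa_goodruns"

# Show the query behind the definition, count its files, then list them.
COMMANDS = [
    samweb_command("describe-definition", DEFNAME),
    samweb_command("count-definition-files", DEFNAME),
    samweb_command("list-definition-files", DEFNAME),
]

if __name__ == "__main__":
    for argv in COMMANDS:
        # Print the command line; pass argv to run_if_available() to execute it.
        print(" ".join(argv))
```

Counting the files before listing them is a cheap way to check how large a definition is before starting any work on it.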

As a throwback to the old days, some of the data is also stored on bluearc. CAFs are a notable example: the most recent ones are almost always stored on bluearc. Small, selected samples of the most recent keep-up files (see below) are also available on bluearc in:

/nova/prod/data/keepup/samples/ 

Data and MC files are currently provided in two forms: datasets for the first analysis (FA) and keep-up datasets. First analysis datasets are designed to be used in the first analysis and, as such, are processed in stable, tested versions of the software, including the latest calibration and state-of-the-art reconstruction and PID algorithms. Keep-up datasets, on the other hand, contain data processed as it comes off the detectors, in as close to real time as possible. These datasets are designed to offer users an early look at data and, as such, may not always include the most up-to-date calibration or reconstruction. All keep-up datasets include "keepup" in their dataset names.

First analysis production proceeds in a stepwise manner, starting with simulation and progressing through reconstruction to PID. At each stage the files from the previous stage are processed to produce a final dataset of simulation, reconstruction or PID files. At the same time, a set of files for the next stage is also produced for validation use. At the current time production has mostly completed the simulation and reconstruction stages and is producing PID validation files.

As of January 2015 production will produce two datasets for each data dataset. One will have the good runs list requirement applied (dq.isgoodrun = true) and will be suffixed with "goodruns"; the other will not include this requirement.
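
The naming conventions described above ("keepup" in keep-up dataset names, a "goodruns" suffix for good-runs datasets) can be checked mechanically. The following is a minimal sketch; the function names are illustrative, and it assumes that the data tier is always the second underscore-separated field, as it is in the definitions listed on this page.

```python
def describe_dataset(defname):
    """Summarise what a definition name says about its contents."""
    parts = defname.split("_")
    return {
        # Assumption: names look like prod_<tier>_..., as in the tables on this page.
        "tier": parts[1] if len(parts) > 1 else None,
        # Keep-up datasets carry "keepup" as one of the name fields.
        "keepup": "keepup" in parts,
        # Good-runs datasets are suffixed with "goodruns".
        "goodruns": parts[-1] == "goodruns",
    }


def goodruns_variant(defname):
    """Return the name of the good-runs version of a data definition."""
    if defname.endswith("_goodruns"):
        return defname
    return defname + "_goodruns"


if __name__ == "__main__":
    for name in ("prod_reco_S15-01-12_fd_numi_keepup",
                 "prod_artdaq_fd_cosmic_fa_runlist"):
        print(name, describe_dataset(name), goodruns_variant(name))
```

Whether a good-runs variant actually exists for a given definition still has to be checked in SAM; the helper only constructs the expected name.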

Data

First pass data processing for the first analysis is currently complete through the reconstruction stage, with PID validation files being produced now. The input to these samples is the set of official first-analysis run lists generated by Ryan Patterson and stored in this directory:

  /nova/app/users/rbpatter/runlists/

FD

Data tier | Cosmic trigger | Cosmic trigger + GRL | NuMI trigger
artdaq | prod_artdaq_fd_cosmic_fa_runlist | prod_artdaq_fd_cosmic_fa_runlist_goodruns | prod_artdaq_fd_numi_fa_goodruns
pclist | prod_pclist_S14-09-29_fd_cosmic_keepup | | n/a
pcliststop | prod_pcliststop_S14-09-29_fd_cosmic_keepup | | n/a
timecal | prod_timecal_S14-09-29_fd_cosmic_keepup | | n/a
reco | prod_reco_FA14-10-28_fd_cosmic_fa_runlist | prod_reco_FA14-10-28_fd_cosmic_fa_runlist_goodruns | prod_reco_FA14-10-28_fd_numi_fa_goodruns
reco - keep up - post shutdown | | | prod_reco_S15-01-12_fd_numi_keepup
PID - for validation | prod_pid_FA14-10-28_fd_cosmic_fa_runlist | prod_pid_FA14-10-28_fd_cosmic_fa_runlist_goodruns | prod_pid_FA14-10-28_fd_numi_fa_goodruns
CAF - for validation | prod_caf_FA14-12-16_fd_cosmic_fa_runlist | prod_caf_FA14-12-16_fd_cosmic_fa_runlist_goodruns | prod_caf_FA15-01-16_fd_numi_fa_runlist_goodruns
CAF - keep up - post shutdown | | | prod_caf_S15-01-12_fd_numi_keepup

ND

Data tier | Cosmic trigger | DD activity trigger | DD tricell trigger | NuMI trigger
artdaq | | | | prod_artdaq_FA14-10-03_nd_numi_fullgain_goodruns
pclist | prod_pclist_S14-09-29_nd_cosmic_raw2root_FA14-10-03_keepup | prod_pclist_S14-09-29_nd_DDActivity1_raw2root_FA14-10-03_keepup | prod_pclist_S14-09-29_nd_DDCalMu_raw2root_FA14-10-03_keepup | n/a
pcliststop | prod_pcliststop_S14-09-29_nd_cosmic_raw2root_FA14-10-03_keepup | prod_pcliststop_S14-09-29_nd_DDActivity1_raw2root_FA14-10-03_keepup | prod_pcliststop_S14-09-29_nd_DDCalMu_raw2root_FA14-10-03_keepup | n/a
timecal | prod_timecal_S14-09-29_nd_cosmic_raw2root_FA14-10-03_keepup | prod_timecal_S14-09-29_nd_DDActivity1_raw2root_FA14-10-03_keepup | prod_timecal_S14-09-29_nd_DDCalMu_raw2root_FA14-10-03_keepup | n/a
reco | | | | prod_reco_FA14-12-29_nd_numi_fullgain_goodruns
reco - keep up - preshutdown | | | | prod_reco_S14-09-29_neardet_numi_keepup
reco - keep up - postshutdown | | | | prod_reco_S15-01-12_nd_numi_keepup
pid - validation | | | | prod_pid_FA14-12-29_nd_numi_fullgain_goodruns
caf - validation | | | | prod_caf_FA14-12-29_nd_numi_fullgain_goodruns
caf - keep up - preshutdown | | | | prod_caf_S14-09-29_neardet_numi_keepup
caf - keep up - postshutdown | | | | prod_caf_S15-01-12_nd_numi_keepup

In addition, datasets are provided for the NuMI trigger with and without the GRL, and for the periods before and after the shutdown. These follow the pattern:

prod_artdaq_FA14-10-03_nd_numi_fullgain
prod_artdaq_FA14-10-03_nd_numi_fullgain_goodruns
prod_artdaq_FA14-10-03_nd_numi_fullgain_preshutdown
prod_artdaq_FA14-10-03_nd_numi_fullgain_preshutdown_goodruns
prod_artdaq_FA14-10-03_nd_numi_fullgain_postshutdown
prod_artdaq_FA14-10-03_nd_numi_fullgain_postshutdown_goodruns

and are available for the artdaq, reco, pid - validation and caf - validation samples.
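
The naming pattern listed above can be generated from a base definition name. This is a minimal sketch with an illustrative function name; it only builds the names and does not check that the corresponding definitions exist in SAM.

```python
def numi_variants(base):
    """Return the six names that follow the pattern above for a base definition."""
    names = []
    # All data, then the periods before and after the shutdown.
    for period in ("", "_preshutdown", "_postshutdown"):
        # Without and with the good runs list (GRL) applied.
        for grl in ("", "_goodruns"):
            names.append(base + period + grl)
    return names


if __name__ == "__main__":
    for name in numi_variants("prod_artdaq_FA14-10-03_nd_numi_fullgain"):
        print(name)
```

The same call with a base name such as prod_reco_FA14-12-29_nd_numi_fullgain gives the corresponding names for the reco tier.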

Monte Carlo

As with data, first pass MC processing for the first analysis is currently complete through the reconstruction stage, with PID validation files produced. The only exception is the RHC files, which are still being simulated.

To facilitate both first analysis studies and future sensitivity studies, two types of MC have been produced. The first type has real-detector-like configurations: these files are simulated with run numbers (and hence the di-block and active channel configurations) of the matching data runs, and with a PoT weighting which replicates that of the first analysis data datasets. The second type has "ideal" 14-DB configurations, to be used in future sensitivity studies.

FD & ND MC - Real detector-like conditions

FarDet | FHC swap | FHC nonswap | FHC tau | Cosmics
artdaq | prod_daq_FA14-10-03_fd_genie_fhc_fluxswap | prod_daq_FA14-10-03_fd_genie_fhc_nonswap | prod_daq_FA14-10-03_fd_genie_fhc_tau | prod_daq_FA14-10-03_fd_cry_all
reco | prod_reco_FA14-10-28_fd_genie_fhc_fluxswap | prod_reco_FA14-10-28_fd_genie_fhc_nonswap | prod_reco_FA14-10-28_fd_genie_fhc_tau | prod_reco_FA14-10-28_fd_cry_all
pid | prod_pid_FA15-01-12_fd_genie_fhc_fluxswap | prod_pid_FA15-01-12_fd_genie_fhc_nonswap | prod_pid_FA15-01-12_fd_genie_fhc_tau | prod_pid_FA15-01-12_fd_cry_all
caf | prod_caf_FA15-01-12_fd_genie_fhc_fluxswap | prod_caf_FA15-01-12_fd_genie_fhc_nonswap | prod_caf_FA15-01-12_fd_genie_fhc_tau | prod_caf_FA15-01-12_fd_cry_all

NearDet preshutdown | FHC nonswap
artdaq | prod_artdaq_FA14-10-03_nd_genie_nonswap_preshutdown_downsampled
reco | prod_reco_FA14-12-29_nd_genie_nonswap_preshutdown_downsampled
pid - validation | prod_pid_FA14-12-29_nd_genie_nonswap_preshutdown_downsampled
caf - validation | prod_caf_FA14-12-29_nd_genie_nonswap_preshutdown_downsampled

NearDet postshutdown | FHC nonswap
artdaq | prod_artdaq_FA14-10-03_nd_genie_nonswap_postshutdown
reco | prod_reco_FA14-12-29_nd_genie_nonswap_postshutdown
pid - validation | prod_pid_FA14-12-29_nd_genie_nonswap_postshutdown
caf - validation | prod_caf_FA14-12-29_nd_genie_nonswap_postshutdown

FD MC - Ideal conditions (14db)

FarDet | FHC swap | FHC nonswap | FHC tau
artdaq | prod_daq_FA14-10-03_fd_genie_fhc_fluxswap_14db | prod_daq_FA14-10-03_fd_genie_fhc_nonswap_14db | prod_daq_FA14-10-03_fd_genie_fhc_tau_14db
reco | prod_reco_FA14-10-28_fd_genie_fhc_fluxswap_14db | prod_reco_FA14-10-28_fd_genie_fhc_nonswap_14db | prod_reco_FA14-10-28_fd_genie_fhc_tau_14db
pid - validation | prod_pid_FA14-10-28_fd_genie_fhc_fluxswap_14db | prod_pid_FA14-10-28_fd_genie_fhc_nonswap_14db | prod_pid_FA14-10-28_fd_genie_fhc_tau_14db
caf - validation | prod_caf_FA14-10-28_fd_genie_fhc_fluxswap_14db | prod_caf_FA14-10-28_fd_genie_fhc_nonswap_14db | prod_caf_FA14-10-28_fd_genie_fhc_tau_14db

FarDet continued | RHC swap | RHC nonswap | RHC tau
artdaq | prod_daq_FA14-10-03_fd_genie_rhc_fluxswap_14db | prod_daq_FA14-10-03_fd_genie_rhc_nonswap_14db | prod_daq_FA14-10-03_fd_genie_rhc_tau_14db
reco | prod_reco_FA14-10-28_fd_genie_rhc_fluxswap_14db | prod_reco_FA14-10-28_fd_genie_rhc_nonswap_14db | prod_reco_FA14-10-28_fd_genie_rhc_tau_14db
pid | prod_pid_FA14-10-28_fd_genie_rhc_fluxswap_14db | prod_pid_FA14-10-28_fd_genie_rhc_nonswap_14db | prod_pid_FA14-10-28_fd_genie_rhc_tau_14db
caf | prod_caf_FA14-10-28_fd_genie_rhc_fluxswap_14db | prod_caf_FA14-10-28_fd_genie_rhc_nonswap_14db | prod_caf_FA14-10-28_fd_genie_rhc_tau_14db

FarDet continued | Cosmics
artdaq | prod_artdaq_FA14-10-03_fd_cry_14db
reco | prod_reco_FA14-10-28_fd_cry_14db
pid | prod_pid_FA14-10-28_fd_cry_14db
caf | prod_caf_FA14-10-28_fd_cry_14db

ND ideal conditions

NearDet | Cosmics
artdaq | prod_daq_FA14-10-03_nd_cry_all
pclist | prod_pclist_S14-09-29_nd_cry
pcliststop | prod_pcliststop_S14-09-29_nd_cry
timecal | prod_timecal_S14-09-29_nd_cry

Supporting sample MC

In addition to the core samples discussed above, some dedicated samples have been produced to study particular effects.

FD MC - Real detector-like conditions, geojittered.

FarDet | FHC swap | FHC nonswap | FHC tau
artdaq | prod_daq_FA14-10-03_fd_genie_fhc_fluxswap_geojittered | prod_daq_FA14-10-03_fd_genie_fhc_nonswap_geojittered | prod_daq_FA14-10-03_fd_genie_fhc_tau_geojittered
reco | prod_reco_FA14-10-28_fd_genie_fhc_fluxswap_geojittered | prod_reco_FA14-10-28_fd_genie_fhc_nonswap_geojittered | prod_reco_FA14-10-28_fd_genie_fhc_tau_geojittered
pid | prod_pid_FA15-01-12_fd_genie_fhc_fluxswap_geojittered | prod_pid_FA15-01-12_fd_genie_fhc_nonswap_geojittered | prod_pid_FA15-01-12_fd_genie_fhc_tau_geojittered
caf | prod_caf_FA15-01-12_fd_genie_fhc_fluxswap_geojittered | prod_caf_FA15-01-12_fd_genie_fhc_nonswap_geojittered | prod_caf_FA15-01-12_fd_genie_fhc_tau_geojittered

Processing notes

This section attempts to briefly summarise issues that users should be aware of when using the above datasets. Full details on the releases used can be found on the History_of_Tagged_Releases page.

FA14-10-03 Data raw2root

This version of the software includes the ND geometry version used in the FA MC simulation.

No known issues.

FA14-10-03 MC Simulation

This is the official simulation for the first analysis datasets.

No known issues.

FA14-10-28 FD Reconstruction

This is the official reconstruction for the first analysis datasets. It was processed with v04 FD calibrations.

No known issues.

FA14-10-28 FD PID and CAF

These are the PID and CAF validation samples. They are designed so that the physics groups can tune PIDs before the official first analysis production.

No known issues.

The CAFs can be found here:

/nova/prod/mc/FA14-10-28/genie/fd/caf/
/nova/prod/mc/FA14-10-28/cry/fd/caf/

The real-detector-configuration CAFs live in sub-directories numbered 000129-000170, and the "ideal" MC in sub-directory 010000. Note that the real-detector-configuration genie folders contain both the baseline MC and the geojittered MC. Anyone taking files from this location should require that the CAF file name contain the string "genie_fhc" to study the baseline sample, or the string "geojittered" to study that sample.
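
The file-name selection described above can be sketched as follows. The file names in the example are hypothetical placeholders, not real CAF names. The explicit exclusion of "geojittered" from the baseline selection is a defensive assumption on our part, made in case geojittered file names also contain "genie_fhc", as the geojittered dataset names on this page do.

```python
def select_cafs(file_names, sample):
    """Pick CAF file names belonging to the baseline or the geojittered sample."""
    if sample == "baseline":
        # Require "genie_fhc"; also drop geojittered files (defensive assumption).
        return [name for name in file_names
                if "genie_fhc" in name and "geojittered" not in name]
    if sample == "geojittered":
        return [name for name in file_names if "geojittered" in name]
    raise ValueError("sample must be 'baseline' or 'geojittered'")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example_files = [
        "example_fd_genie_fhc_nonswap_r00012900.caf.root",
        "example_fd_genie_fhc_nonswap_geojittered_r00012900.caf.root",
    ]
    print(select_cafs(example_files, "baseline"))
    print(select_cafs(example_files, "geojittered"))
```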

FA14-11-25 ND reconstruction, PID and CAF

This is the release used for version 4 of the ND rapid turn around. These files pick up a few notable changes with respect to those files processed in v3 (FA14-11-11, see LegacyDatasets for details). First, we now have a new set of subrun-by-subrun channel masks. Second, an updated absolute calibration has been applied in response to discrepancies observed by the analysis groups. An error involving infinite calibrated energies in the muon catcher has also been resolved. This round of processing is meant to include post-shutdown data. That effort, however, is currently waiting on the good runs list being updated by the DQ group. Once that effort is complete, it will be an easy task to add the post-shutdown data. Users will be notified when the processing is completed.

At the moment, only a subsample of the MC events (~7M of 34M) has been processed through reconstruction, PID and CAF.

Known issues:

  • It has been reported that the cosmic rejection variables are not filled correctly in files processed in S14-11-25. As reconstruction wasn't frozen until the 27th, this bug likely affects these files.

The CAFs can be found here:

MC -- /nova/prod/mc/FA14-11-25/genie/nd/caf/
Data -- /nova/prod/data/FA14-11-25/numi/nd/caf/

FA14-12-16 FD data CAF

No known issues.

FA14-12-29 ND rapid-turn around v5

No known issues.

S14-09-29 Data keep-up & MC calibration

FD & ND calibration files are currently being produced for the FD cosmic stream as well as the ND DD activity, DD cal mu (tri-cell) and cosmic streams. These samples are constantly topped up using cron jobs. These files were used to produce the v05 ND calibration used in the latest ND reconstruction files.

Known issues:

  • The ND data files have been reconstructed with an old ND geometry.

S14-09-29 Keep-up reconstruction

Most of the caveats for the S14-09-09 reco sample detailed on the LegacyDatasets page still apply and are repeated below for completeness. The big change in S14-09-29 is the addition of new information to facilitate basic neutrino searches: there is a new sel.containment branch in the CAFs, and the numu CosRej has been included for the FD stream.

Some additional notes:

  • The initial target for this processing is all of the data before the shutdown and after the end of the neutrino hunt. Unless problems are found, back-processing will extend the sample to before the neutrino hunt.
  • Future reco keep-up will proceed shortly in a modern release and will include post-shutdown (October 2014) data.
  • The FD reco keep-up is blinded.
  • FD calibration constants are currently only available through diblock 7; averaged constants are used beyond that point.
  • These datasets are very large. Using SAM projects which include the entire dataset is discouraged. For assistance in breaking up the sample, reference the Sam Web Cookbook or email .
  • Near detector reconstruction is currently lacking calibration, which will affect the output of any module that depends on it. The most notable example is FuzzyKVertex, but the MichelE filters could also be affected. Slicer, CosmicTrack and KalmanTrack produce output independent of reconstruction.
  • Near detector reconstruction is also currently lacking channel masks and data quality information.
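
One possible way to break a large dataset into smaller pieces is to restrict a definition to successive run ranges. The sketch below only builds the query strings. The run_number dimension and the comparison syntax are assumptions that should be checked against the Sam Web Cookbook before use, and the run numbers in the example are hypothetical.

```python
def run_range_queries(defname, first_run, last_run, runs_per_chunk):
    """Split a definition into queries that each cover a limited range of runs."""
    if runs_per_chunk < 1:
        raise ValueError("runs_per_chunk must be at least 1")
    queries = []
    start = first_run
    while start <= last_run:
        stop = min(start + runs_per_chunk - 1, last_run)
        # Assumed SAM dimension syntax: restrict an existing definition by run.
        queries.append(
            f"defname: {defname} and run_number >= {start} and run_number <= {stop}"
        )
        start = stop + 1
    return queries


if __name__ == "__main__":
    # Hypothetical run range, for illustration only.
    for query in run_range_queries("prod_reco_S14-09-29_neardet_numi_keepup",
                                   10000, 10250, 100):
        print(query)
```

Each query string could then be used to create a smaller definition, so that a SAM project only ever covers one chunk at a time.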

The CAFs can be found here:

/nova/prod/data/keepup/S14-09-29/numi/fd/000XXX/XXXYY/ 
/nova/prod/data/keepup/S14-09-09/numi/nd/000XXX/YYYYY/ 

where XXX are the first three digits of the run number and YY are the last two.

S15-01-12 Keep-up reconstruction

This is the first pass at post-shutdown reconstruction. As with all keep-up reco, the FD reco keep-up is blinded and does not include PID. Furthermore, no good-runs list has been applied.

No known issues.

The CAFs can be found here:

/nova/prod/data/keepup/S15-01-12/numi/fd/000XXX/XXXYY/ 
/nova/prod/data/keepup/S15-01-12/numi/nd/000XXX/YYYYY/ 

where XXX are the first three digits of the run number and YY are the last two.
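
The FD directory layout described above can be built from a run number as in the sketch below. It assumes five-digit run numbers, so that XXX is the first three digits and YY the last two. The function name and the run number in the example are illustrative, and the ND layout, which is written differently above, is not covered.

```python
def fd_keepup_caf_dir(release, run_number):
    """Build the FD keep-up CAF directory for a five-digit run number."""
    run = str(run_number)
    if len(run) != 5 or not run.isdigit():
        raise ValueError("expected a five-digit run number")
    # 000XXX uses the first three digits; XXXYY is the full run number.
    return f"/nova/prod/data/keepup/{release}/numi/fd/000{run[:3]}/{run}/"


if __name__ == "__main__":
    # Hypothetical run number, for illustration only.
    print(fd_keepup_caf_dir("S15-01-12", 18620))
```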