SAM Data Catalog

The NOvA Experiment uses the SAM Data Catalog system to work with its datasets.

More information on SAM can be found here:

A general "cookbook" of common queries can be found here:

SAM Web Cookbook

Data

This page describes only the files produced with S11.07.27 or later. Descriptions for older releases can be found at old data.
At present we only have FarDet data files processed in release S13-09-04 or later.

Raw data

Raw data is stored in the Enstore Tape Library system.

To access raw data you will need to use the data catalog to locate the files you want.

Accessing the Data Catalog from a commandline

To access the data catalog, first set up the client tool, which is the "samweb" program:

setup sam_web_client

Next, locate the files you want by retrieving a list of the files that match your selection. The current setup script does NOT set the experiment (i.e., nova), so it must be given on the command line with "-e nova". For example, to list the raw files from run 12006 on the far detector:
samweb -e nova list-files "data_tier = raw AND Online.RunNumber = 12006 AND Online.Detector = FarDet"

This returns a list that looks like:
fardet_r00012006_s00_t00.raw
fardet_r00012006_s03_t00.raw
fardet_r00012006_s04_t02.raw
fardet_r00012006_s05.raw
fardet_r00012006_s05_t05.raw
fardet_r00012006_s08_t05.raw
fardet_r00012006_s10_t00.raw
fardet_r00012006_s11.raw
fardet_r00012006_s12_t02.raw
fardet_r00012006_s15.raw
fardet_r00012006_s16_t02.raw
fardet_r00012006_s17_t02.raw
fardet_r00012006_s17_t05.raw
fardet_r00012006_s18_t00.raw
fardet_r00012006_s18_t05.raw
fardet_r00012006_s21_t02.raw
.... (256 files total)
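If you script these queries, it can help to build the dimension string in code. The following is a minimal Python sketch, assuming `samweb` is on your PATH after `setup sam_web_client`; the helper names `build_dims` and `list_files` are hypothetical, and the only samweb call used is the `samweb -e nova list-files "<dimensions>"` form shown above.

```python
import subprocess


def build_dims(**criteria):
    """Join field/value pairs into a SAM dimension string, ANDing them together.

    Python keyword names cannot contain '.', so write '__' instead
    (e.g. Online__RunNumber becomes Online.RunNumber).
    """
    clauses = [f"{key.replace('__', '.')} = {value}" for key, value in criteria.items()]
    return " AND ".join(clauses)


def list_files(dims, experiment="nova"):
    """Run `samweb -e <experiment> list-files "<dims>"` and return the file names."""
    result = subprocess.run(
        ["samweb", "-e", experiment, "list-files", dims],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    dims = build_dims(data_tier="raw", Online__RunNumber=12006,
                      Online__Detector="FarDet", data_stream=2)
    print(dims)
    # Uncomment on a machine with a configured samweb client:
    # for name in list_files(dims):
    #     print(name)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with the spaces inside the dimension string.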

You can further refine your search by filtering on the trigger stream or on any of the other metadata available for a file. For example, restricting the query to stream 2 returns 64 files. (Stream 2 is the "calibration pulser" or "cosmic" stream; stream 0 holds the NuMI beam triggers.)

samweb -e nova list-files "data_tier = raw AND Online.RunNumber = 12006 AND data_stream = 2 AND Online.Detector = FarDet"

To see the complete list of metadata that is available for a file you can do:

[anorman@novagpvm03 ~]$ samweb -e nova get-metadata fardet_r00012006_s62_t02.raw
                    File Name: fardet_r00012006_s62_t02.raw
                      File Id: 4355169
                    File Type: importedDetector
                  File Format: raw
                    File Size: 518914708
                          Crc: 199724340 (adler 32 crc type)
               Content Status: good
                        Group: nova
                    Data Tier: raw
                  Application: online datalogger 33
                  Event Count: 3634
                  First Event: 234914
                   Last Event: 238687
                   Start Time: 2013-12-15T13:51:20
                     End Time: 2013-12-15T13:52:54
                  Data Stream: 2
             Online.ConfigIDX: 0
          Online.DataLoggerID: 1
     Online.DataLoggerVersion: 33
              Online.Detector: fardet
            Online.DetectorID: 2
             Online.Partition: 1
          Online.RunControlID: 0
     Online.RunControlVersion: 0
            Online.RunEndTime: 1387115574
             Online.RunNumber: 12006
               Online.RunSize: 129728677
          Online.RunStartTime: 1387109761
               Online.RunType: 0
                Online.Stream: 2
         Online.SubRunEndTime: 1387115574
       Online.SubRunStartTime: 1387115480
                Online.Subrun: 62
           Online.TotalEvents: 3634
         Online.TriggerCtrlID: 0
        Online.TriggerListIDX: 0
Online.TriggerPrescaleListIDX: 0
        Online.TriggerVersion: 0
 Online.ValidTriggerTypesHigh: 0
Online.ValidTriggerTypesHigh2: 0
  Online.ValidTriggerTypesLow: 0
                         Runs: 12006.0062 (online)
               File Partition: 62

All of these fields can be selected against.
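To use these fields in a script, the text output above can be turned into a dictionary. A minimal Python sketch, assuming the "Key: value" layout shown (the parser name `parse_metadata` is hypothetical, and all values are left as strings):

```python
def parse_metadata(text):
    """Parse `samweb get-metadata` text output into a {field: value} dict.

    Each line is split at its first ':' so values such as
    '2013-12-15T13:51:20' keep their own colons.
    """
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a 'Key: value' pair
            metadata[key.strip()] = value.strip()
    return metadata


SAMPLE = """\
                    File Name: fardet_r00012006_s62_t02.raw
                  Event Count: 3634
                   Start Time: 2013-12-15T13:51:20
             Online.RunNumber: 12006
"""

meta = parse_metadata(SAMPLE)
print(meta["File Name"], int(meta["Event Count"]))
```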

To find where a given file is stored, ask the replica catalog to "locate" it:

[anorman@novagpvm03 ~]$ samweb -e nova locate-file fardet_r00012006_s62_t02.raw
novadata:/nova/data/rawdata/FarDet/000120/12006/02
enstore:/pnfs/nova/rawdata/FarDet/000120/12006(495@vpe149)
In this case there are two separate copies of the file:
  1. One is on the BlueArc central file system in the directory /nova/data/rawdata.... (novadata:/nova/data/rawdata/FarDet/000120/12006/02)
  2. The second is on the dCache/Enstore system (enstore:/pnfs/nova/rawdata/FarDet/000120/12006/) with a tape volume label of "495@vpe149" (you can ignore the tape label; it is only used internally to optimize retrieval).
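If you need the locations in a script, each output line has the form `system:directory`, optionally followed by a tape label in parentheses. A minimal Python sketch that splits them apart, assuming that format holds for every line (the function name `parse_locations` is hypothetical):

```python
import re

# system:directory, optionally followed directly by (tape_label)
LOCATION_RE = re.compile(r"^(?P<system>[^:]+):(?P<path>[^(]+)(?:\((?P<label>[^)]*)\))?$")


def parse_locations(text):
    """Return (system, directory, tape_label_or_None) tuples from locate-file output."""
    locations = []
    for line in text.splitlines():
        match = LOCATION_RE.match(line.strip())
        if match:
            locations.append((match["system"], match["path"].rstrip("/"), match["label"]))
    return locations


output = """\
novadata:/nova/data/rawdata/FarDet/000120/12006/02
enstore:/pnfs/nova/rawdata/FarDet/000120/12006(495@vpe149)
"""
for system, directory, label in parse_locations(output):
    print(system, directory, label)
```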

To get a copy of the file, "fetch" it to your current directory with the "ifdh_fetch" command. (IFDH is the Intensity Frontier Data Handling system, designed to retrieve and deliver files all over the world, including directly to you!) You need to "setup ifdhc" first. The fetch routine then chooses the closest/fastest copy of the file for you.

Note that "ifdh" only works if you have a grid computing account set up! If you don't have one yet, see https://cdcvs.fnal.gov/redmine/projects/novaart/wiki/Using_Condor_and_Running_on_Grid_from_IF_Cluster_Machines#Running-on-the-Grid, then come back and try to fetch your data. You might also need to "setup fife_utils" to get ifdh_fetch into your path (this is hopefully included in your standard nova setup script; if not, set it up by hand).

Example (here the file is found on Enstore and copied via dCache):

[anorman@novagpvm03 anorman]$ ifdh_fetch fardet_r00012006_s62_t02.raw
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/rawdata/FarDet/000120/12006/fardet_r00012006_s62_t02.raw" "./$f" 
LOCK - Fri Feb 14 18:14:22 UTC 2014 LOCKS/LIMIT/QUEUE 5/5/281 sleeping 281
LOCK - Fri Feb 14 18:14:22 UTC 2014 queue 20140214.18:14:22.novagpvm03.15187.anorman.x509up_cp11537
LOCK - Fri Feb 14 18:14:31 UTC 2014 lock  /grid/data/nova/LOCK/LOCKS/20140214.18:14:31.9.novagpvm03.15187.anorman.x509up_cp11537
LOCK - Fri Feb 14 18:14:42 UTC 2014 freed /grid/data/nova/LOCK/LOCKS/20140214.18:14:31.9.novagpvm03.15187.anorman.x509up_cp11537
[anorman@novagpvm03 anorman]$ ls -l *.raw
-rw-r--r-- 1 anorman nova 518914708 Feb 14 12:14 fardet_r00012006_s62_t02.raw

Note that IFDH does the proper locking on copies from bluearc!
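The log above shows that ifdh_fetch ends up running `ifdh cp <source> <destination>`. If you already know a file's directory (for example from locate-file), you can issue that copy yourself. Below is a minimal Python sketch, assuming ifdhc is set up and your grid credentials are in place. The helper names are hypothetical, and unlike ifdh_fetch, this sketch does not pick the best replica for you.

```python
import os
import subprocess


def ifdh_cp_command(directory, filename, dest_dir="."):
    """Build the `ifdh cp <source> <dest>` argument list for one file."""
    return ["ifdh", "cp", os.path.join(directory, filename), os.path.join(dest_dir, filename)]


def ifdh_cp(directory, filename, dest_dir=".", dry_run=True):
    """Print the copy command (dry run, the default) or execute it."""
    cmd = ifdh_cp_command(directory, filename, dest_dir)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


ifdh_cp("/pnfs/nova/rawdata/FarDet/000120/12006", "fardet_r00012006_s62_t02.raw")
```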

You can even fetch large numbers of files (here we grab the 64 files from stream 2 of run 12006):

[anorman@novagpvm03 anorman]$ ifdh_fetch --dims "data_tier = raw AND Online.RunNumber = 12006 AND data_stream 2 AND Online.Detector FarDet" 
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/rawdata/FarDet/000120/12006/fardet_r00012006_s04_t02.raw" "./$f" 
LOCK - Fri Feb 14 18:18:57 UTC 2014 LOCKS/LIMIT/QUEUE 5/5/253 sleeping 253
LOCK - Fri Feb 14 18:18:57 UTC 2014 queue 20140214.18:18:57.novagpvm03.15661.anorman.x509up_cp11537
....

See the IFDH package documentation for the other things it can do.

All raw data collected so far is available in:

/nova/data/rawdata/FarDet
/nova/data/rawdata/NDOS

All files are read-only and owned by novaraw. Eventually these will only exist on tape.

Accessing the Data Catalog from a web interface

Files can also be accessed from the web by using one of the data catalog web pages. These use the same technology behind the scenes but make it a bit easier to input things like dates and times.

Nova Data Catalog Front Page

After performing a search, you can click on a file or list of files (or on the options displayed with a file) to get more information (e.g. metadata, locations, children, parents, etc.).

Raw to Root files

All raw files are converted from the DAQ *.raw format into ART-ified *.root files for Offline use.

When a file is processed through this step, a new file with the data tier "artdaq" (i.e. art/root format) is created. The new file has additional metadata fields, such as "DAQ2RawDigit.base_release", which is set to the release the file was converted with.

To find these files you can do the same queries to the data catalog with the inclusion of the right data tier and release information.

For Example:

[anorman@novagpvm03 anorman]$ samweb -e nova list-files "data_tier = artdaq AND Online.RunNumber = 12006 and data_stream 2 AND Online.Detector = FarDet AND DAQ2RawDigit.base_release = S14-01-20 " 
fardet_r12006_s43_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s21_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s54_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s12_t02_cosmic_S14-01-20_v1_data.daq.root
....

Files can be located and fetched in the same manner as raw files:

[anorman@novagpvm03 anorman]$ samweb -e nova locate-file fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
novadata:/nova/data/novaroot/fardet/S14-01-20/02
novadata:/nova/data/novaroot/fardet/S14-01-20/000120/12006/02
enstore:/pnfs/nova/production/raw2root/fardet/S14-01-20/000120/12006/02(9147@vpe183)
[anorman@novagpvm03 anorman]$ ifdh_fetch fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/production/raw2root/fardet/S14-01-20/000120/12006/02/fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root" "./$f" 
Fri Feb 14 19:04:14 UTC 2014
LOCK - returned excess lock 6/5
LOCK - Fri Feb 14 19:04:14 UTC 2014 LOCKS/LIMIT/QUEUE 6/5/17 sleeping 17
LOCK - Fri Feb 14 19:04:14 UTC 2014 queue 20140214.19:04:14.novagpvm03.26888.anorman.x509up_cp11537
LOCK - Fri Feb 14 19:04:15 UTC 2014 lock  /grid/data/nova/LOCK/LOCKS/20140214.19:04:15.1.novagpvm03.26888.anorman.x509up_cp11537
LOCK - Fri Feb 14 19:04:24 UTC 2014 freed /grid/data/nova/LOCK/LOCKS/20140214.19:04:15.1.novagpvm03.26888.anorman.x509up_cp11537
[anorman@novagpvm03 anorman]$ ls -l *.daq.root
-rw-r--r-- 1 anorman nova 349482306 Feb 14 13:04 fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root

Raw files that have been processed through DAQ2RawDigit

They are located here:

/nova/data/novaroot/FarDet/<RELEASE>/
/nova/data/novaroot/NDOS/<RELEASE>/

<RELEASE> denotes the software release whose version of DAQ2RawDigit was used.
Within these directories, files are split up by leading run number, then by full run number, followed by a directory for the trigger type.
Example: for FarDet run 11250 the directory paths are:

/nova/data/novaroot/FarDet/S13-09-04/000112/11250/numi (NuMI-trigger data)
/nova/data/novaroot/FarDet/S13-09-04/000112/11250/cosmic (Cosmic-trigger data)
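The leading-run directory appears to be the run number divided by 100, zero-padded to six digits: run 11250 sits under 000112, and run 12006 sits under 000120 in the catalog examples on this page. This rule is inferred from those examples and is not documented. Below is a Python sketch that builds the directory under that assumption (the function names are hypothetical):

```python
def run_prefix(run):
    """Leading-run directory name: run // 100, zero-padded to 6 digits (inferred rule)."""
    return f"{run // 100:06d}"


def novaroot_dir(detector, release, run, trigger, base="/nova/data/novaroot"):
    """Return <base>/<detector>/<release>/<prefix>/<run>/<trigger>."""
    return "/".join([base, detector, release, run_prefix(run), str(run), trigger])


print(novaroot_dir("FarDet", "S13-09-04", 11250, "numi"))
```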

Reconstruction files and beyond

WARNING: DAR BE MONSTERS HERE! (this section is under construction and may not be up to date, like old maps of the flat world)

The reconstruction step is still maturing and is unofficial. These files were processed in tag S13-10-11.

The files live here:

/nova/prod/data/FarDet/S13-10-11/numi/reco/fardet_r000XXXXX_sXX_t00_numi_S13-10-11_v1.data.reco.root
/nova/prod/data/FarDet/S13-10-11/numi/caf/fardet_r000XXXXX_sXX_t00_numi_S13-10-11_v3.data.caf.root

Note that it is "v3" for CAF files. The previous two iterations were missing information.
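If a directory or file list mixes versions, you may want to keep only the newest `_vN` file for each subrun. Below is a minimal Python sketch, assuming CAF file names follow the pattern shown above. The regular expression and the function name `latest_versions` are illustrative and not part of any NOvA tool.

```python
import re

CAF_RE = re.compile(
    r"_r(?P<run>\d+)_s(?P<subrun>\d+)(?:_t(?P<trigger>\d+))?_.*_v(?P<version>\d+)\.data\.caf\.root$"
)


def latest_versions(filenames):
    """Keep only the highest _vN file for each (run, subrun, trigger); return them sorted."""
    best = {}
    for name in filenames:
        m = CAF_RE.search(name)
        if not m:
            continue  # not a CAF file in the expected format
        key = (int(m["run"]), int(m["subrun"]), m["trigger"])
        version = int(m["version"])
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())
```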

Full-gain cooled data have been identified, and runs/subruns were selected from the period between run 11328 (start of two-diblock cooling) and run 11406 (a water leak occurred after this run).
The full-gain cooled data files, identified by hand by Nick, are listed in the following file:

/nova/data/novaroot/FarDet/S13-09-04/FullGainColdData.txt

A single hadd'ed CAF file exists for CAF analysis convenience for this period:
/nova/prod/data/FarDet/S13-10-11/numi/hadd/fardet_r11328-11406_t00_numi_S13-10-11_v3.data.caf.root
Note that these data have mangled channel mapping due to the FEB firmware.

The latest "best" list of runs covers the period when all the DCMs on diblocks 1 and 2 were running cold and stably (except for a handful of warm APDs), with new firmware that fixed the channel mapping issue. There are no known time-sync or other issues with these runs.
The runs were identified by hand by Kanika and are listed in the following text file: TheGoodList.txt

A single hadd'ed CAF file exists that combines all subruns from these runs in this cold, stable, unmangled period.
It includes a total of 2069 NuMI-trigger files from runs 11496 through 11600:
/nova/prod/data/FarDet/S13-10-11/numi/hadd/fardet_r11496-11600_S13-10-11_v3_numi.data.caf.root
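To work with individual subrun CAF files from the good runs instead of the combined file, you can filter a file list against TheGoodList.txt. The format of that file is not described here, so the sketch below assumes one run number at the start of each line, with '#' comments allowed. Adjust the parser if the real file differs. The function names are hypothetical.

```python
import re

RUN_RE = re.compile(r"_r(\d+)_")


def read_run_list(lines):
    """Collect run numbers, one per line (assumed format); blanks and '#' comments skipped."""
    runs = set()
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if text:
            runs.add(int(text.split()[0]))
    return runs


def select_good(filenames, good_runs):
    """Keep files whose _r<run>_ field is in good_runs."""
    selected = []
    for name in filenames:
        m = RUN_RE.search(name)
        if m and int(m.group(1)) in good_runs:
            selected.append(name)
    return selected


# Typical use:
# with open("TheGoodList.txt") as f:
#     good = read_run_list(f)
```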

Event Scanning files

Filtered files that have passed a data selection criterion are also available. See the README in the directory where the files reside:

/nova/prod/data/FarDet/S13-10-11/numi/dailyFilt/

Processing configurations

Modified versions of recoproductionjob.fcl and cafproductionjob.fcl were run with the following modules:

*exposure
*numispillinfo
*calhit
*slicer
*cosmictrack
*cana (currently being optimised)
*kalmantrack
*kalmantrackmerge
*multihough
*elasticarmshs
*fuzzykvertex
*remid
*cosrej
*nuesand
*numusand
*cafmaker

Special parameters were set for the "slicer" module:

*physics.producers.slicer.UseFCL: true
*physics.producers.slicer.Tres: 150
*physics.producers.slicer.TresByPE: false
*physics.producers.slicer.Epsilon: 2

The calibration was also forced to use the MC calibration constants while we await data calibration constants:

services.user.Calibrator.UseMCcalib: true
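One way to reproduce these parameter settings in your own job is with a small wrapper .fcl. The wrapper #includes the standard job and then redefines the parameters, because a later FHiCL definition of the same key takes precedence. The Python sketch below only writes the text of such a file. The output name my_recoproductionjob.fcl is hypothetical, and the sketch does not reproduce the module-list changes described above.

```python
def fcl_overrides(base_fcl, overrides):
    """Return FHiCL text that #includes base_fcl, then redefines dotted parameters."""
    lines = [f'#include "{base_fcl}"', ""]
    for key, value in overrides.items():
        # FHiCL booleans are lowercase; bool is tested first since True/False are ints
        text = ("true" if value else "false") if isinstance(value, bool) else str(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines) + "\n"


SETTINGS = {
    "physics.producers.slicer.UseFCL": True,
    "physics.producers.slicer.Tres": 150,
    "physics.producers.slicer.TresByPE": False,
    "physics.producers.slicer.Epsilon": 2,
    "services.user.Calibrator.UseMCcalib": True,
}

text = fcl_overrides("recoproductionjob.fcl", SETTINGS)
print(text, end="")
# To save it: open("my_recoproductionjob.fcl", "w").write(text)
```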

Naming scheme

  • det_rxxxxxxxx_sxx_txx.raw - raw data in binary format
  • det_rxxxxxxxx_sxx_txx_mask_release_vx.data.daq.root - raw data in root format
  • det_rxxxxxxxx_sxx_txx_mask_release_vx.data.reco.root - reconstructed data
  • det_rxxxxxxxx_sxx_txx_mask_release_vx.data.caf.root - caf files for reconstructed data

  • det is a detector name (ndos, neardet or fardet)
  • rxxxxxxxx is a unique run number (8 digits, e.g. r00011355)
  • sxx is a subrun number (2 digits, e.g. s01)
  • txx denotes the trigger number (t followed by two digits, e.g. t02)
  • vx represents the version of this file (starting from 1)
  • the extensions denote what level of processing has been completed for this file

The trigger numbers correspond to:

  • t00 - NuMI trigger (mask = numi)
  • t01 - Booster trigger (mask = booster)
  • t02 - cosmic trigger (mask = cosmic)
  • blank - files with a global trigger only have no stream name; everything from that run is written into this file.

<RELEASE> corresponds to the software release version the files were processed with. Currently the raw->root step is processed with S13-09-04.
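The naming scheme can be decoded mechanically. Below is a minimal Python sketch. The regular expression is an assumption, written to accept both the scheme above and the variant seen in the catalog examples on this page (e.g. fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root, with an unpadded run number and "_data" instead of ".data"). Trigger numbers outside the table, such as the t05 files in the raw listing, are reported as "unknown".

```python
import re

NAME_RE = re.compile(
    r"^(?P<det>ndos|neardet|fardet)"
    r"_r(?P<run>\d+)_s(?P<subrun>\d+)"
    r"(?:_t(?P<trigger>\d+))?"
    r"(?:_(?P<mask>[a-z]+)_(?P<release>S[\d.-]+)_v(?P<version>\d+))?"
    r"[._](?P<ext>.+)$"
)

TRIGGERS = {"00": "numi", "01": "booster", "02": "cosmic"}


def parse_name(filename):
    """Split a NOvA file name into its naming-scheme fields."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized file name: {filename}")
    trigger = m["trigger"]
    return {
        "detector": m["det"],
        "run": int(m["run"]),
        "subrun": int(m["subrun"]),
        "trigger": trigger,
        # no trigger field means a global-trigger file with no stream name
        "stream": "global" if trigger is None else TRIGGERS.get(trigger, "unknown"),
        "mask": m["mask"],
        "release": m["release"],
        "version": int(m["version"]) if m["version"] else None,
        "extension": m["ext"],
    }


print(parse_name("fardet_r00012006_s62_t02.raw"))
```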

Calibration PCHitLists

Alongside the raw->root processing the Production group also produces files containing PCHitLists for the Calibration group.
These files live here:

/nova/ana/calibration/FarDet/<RELEASE>

and follow the same directory scheme as outlined above for processed data. Only the cosmic stream is processed. Three files exist for each subrun:

fardet_r000XXXXX_sXX_t02_cosmic_S13-09-04_v1.data.pchits.reco.root
fardet_r000XXXXX_sXX_t02_cosmic_S13-09-04_v1.data.pchits.hist.root
fardet_r000XXXXX_sXX_t02_cosmic_S13-09-04_v1.data.pchitsstop.reco.root

OnMon & Nearline Data

All of the files processed on the nearline machines (both the nearline-reco histogram files and the OnMon files) are copied to the FNAL machines for storage.
These files live here:

/nova/data/nearline-OnMon/FarDet/<RELEASE>
/nova/data/nearline/FarDet/<RELEASE>

You'll find more information about OnMon and the Nearline on the wiki for each:

OnMon wiki: https://cdcvs.fnal.gov/redmine/projects/novadaq-dqm-om/wiki
Nearline wiki: https://cdcvs.fnal.gov/redmine/projects/datacheck/wiki

Archiving data and access to data on tapes

Beam configuration table for NuMI beam.

Finding Data files at FNAL

ART allows you to provide the nova executable with a text file listing the files to process. There is a script which creates such a list:
https://cdcvs.fnal.gov/redmine/projects/novaart/repository/changes/Commissioning/GoodRuns.C
You can copy this script to your directory and read instructions inside the script on how to get required files list.
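If you build the list yourself instead, the file is plain text with one input path per line. Below is a minimal Python sketch (the function names are hypothetical) that drops duplicates while keeping the original order. It assumes your art version accepts such a list through the `-S`/`--source-list` option, e.g. `nova -c myjob.fcl -S files.txt`. Check `nova --help` in your release to confirm.

```python
from pathlib import Path


def format_file_list(paths):
    """Return the list text: one path per line, duplicates removed, order kept."""
    unique = dict.fromkeys(str(p) for p in paths)
    return "".join(f"{p}\n" for p in unique)


def write_file_list(paths, list_file):
    """Write the list to list_file and return the number of entries written."""
    text = format_file_list(paths)
    Path(list_file).write_text(text)
    return text.count("\n")
```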
We also create a keep-up list of goodruns that is stored here: