SAM Data Catalog¶
- Table of contents
- SAM Data Catalog
- Raw data
- Raw to Root files
- Reconstruction files and beyond
- Calibration PCHitLists
- OnMon & Nearline Data
- Archiving data and access to data on tapes
- Beam configuration table for NuMI beam.
- Finding Data files at FNAL
The NOvA Experiment uses the SAM Data Catalog system to work with its datasets. More information on SAM can be found here:
A general "cookbook" of common queries can be found here:
This page describes only the files produced with S11.07.27 or later. Descriptions of older releases can be found at old data
At present we only have files processed in release S13-09-04 or greater for FarDet data.
Raw data is stored in the Enstore Tape Library system.
To access raw data you will need to use the data catalog to locate the files you want.
Accessing the Data Catalog from a commandline¶
To access the data catalog, first set up the client tool, the "samweb" program.
Next, locate the files you want by retrieving a list of files that match your parameters. The current setup script does NOT include the experiment definition (i.e., nova), so it must be given on the command line. For example, if you want raw files from run 12006 on the far detector, you could enter:
samweb -e nova list-files "data_tier = raw AND Online.RunNumber = 12006 AND Online.Detector FarDet"
You will be returned a list that looks like:
fardet_r00012006_s00_t00.raw
fardet_r00012006_s03_t00.raw
fardet_r00012006_s04_t02.raw
fardet_r00012006_s05.raw
fardet_r00012006_s05_t05.raw
fardet_r00012006_s08_t05.raw
fardet_r00012006_s10_t00.raw
fardet_r00012006_s11.raw
fardet_r00012006_s12_t02.raw
fardet_r00012006_s15.raw
fardet_r00012006_s16_t02.raw
fardet_r00012006_s17_t02.raw
fardet_r00012006_s17_t05.raw
fardet_r00012006_s18_t00.raw
fardet_r00012006_s18_t05.raw
fardet_r00012006_s21_t02.raw
.... (256 files total)
You can refine the search further by filtering on the trigger stream or any other metadata available for a file. For example, restricting the query to stream 2 returns 64 files (stream 2 is the "calibration pulser" or "cosmic" stream; stream 0 holds the NuMI beam triggers):
samweb -e nova list-files "data_tier = raw AND Online.RunNumber = 12006 AND data_stream 2 AND Online.Detector FarDet"
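The two queries above differ only in the data_stream clause, so when scripting it can help to assemble the dimension string in one place. The following is a minimal sketch: the function name nova_raw_dims is hypothetical (it is not part of samweb), and it simply reproduces the clause spelling used in the examples above.

```shell
# Hypothetical helper (not part of samweb): build the dimension string
# used in the raw-data queries above.
#   $1 = run number, $2 = detector, $3 = optional data stream
nova_raw_dims() {
    dims="data_tier = raw AND Online.RunNumber = $1"
    if [ -n "${3:-}" ]; then
        dims="$dims AND data_stream $3"
    fi
    printf '%s AND Online.Detector %s\n' "$dims" "$2"
}

# Intended use (requires samweb to be set up):
#   samweb -e nova list-files "$(nova_raw_dims 12006 FarDet 2)"
# Here we only print the string that would be passed to samweb:
nova_raw_dims 12006 FarDet 2
```

Keeping the string in a function means the same dimensions can be handed unchanged to other commands that accept them.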
To see the complete list of metadata that is available for a file you can do:
[anorman@novagpvm03 ~]$ samweb -e nova get-metadata fardet_r00012006_s62_t02.raw
File Name: fardet_r00012006_s62_t02.raw
File Id: 4355169
File Type: importedDetector
File Format: raw
File Size: 518914708
Crc: 199724340 (adler 32 crc type)
Content Status: good
Group: nova
Data Tier: raw
Application: online datalogger 33
Event Count: 3634
First Event: 234914
Last Event: 238687
Start Time: 2013-12-15T13:51:20
End Time: 2013-12-15T13:52:54
Data Stream: 2
Online.ConfigIDX: 0
Online.DataLoggerID: 1
Online.DataLoggerVersion: 33
Online.Detector: fardet
Online.DetectorID: 2
Online.Partition: 1
Online.RunControlID: 0
Online.RunControlVersion: 0
Online.RunEndTime: 1387115574
Online.RunNumber: 12006
Online.RunSize: 129728677
Online.RunStartTime: 1387109761
Online.RunType: 0
Online.Stream: 2
Online.SubRunEndTime: 1387115574
Online.SubRunStartTime: 1387115480
Online.Subrun: 62
Online.TotalEvents: 3634
Online.TriggerCtrlID: 0
Online.TriggerListIDX: 0
Online.TriggerPrescaleListIDX: 0
Online.TriggerVersion: 0
Online.ValidTriggerTypesHigh: 0
Online.ValidTriggerTypesHigh2: 0
Online.ValidTriggerTypesLow: 0
Runs: 12006.0062 (online)
File Partition: 62
All of these fields can be selected against.
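When scripting around get-metadata, it is often enough to pull a single field out of a listing like the one above. The sketch below assumes samweb prints one "Key: value" pair per line, possibly with leading indentation; that layout is an assumption based on the example, and metadata_field is a hypothetical helper name, not a samweb feature.

```shell
# Hypothetical helper: print the value of one field from "Key: value"
# lines on stdin (assumed layout of `samweb get-metadata` output).
#   $1 = exact field name, e.g. "Event Count" or "Online.Subrun"
metadata_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            prefix = key ": "
            if (index(line, prefix) == 1) {   # line starts with "key: "
                print substr(line, length(prefix) + 1)
                exit
            }
        }'
}

# Intended use (requires samweb to be set up):
#   samweb -e nova get-metadata fardet_r00012006_s62_t02.raw | metadata_field "Event Count"
# Demonstration on two lines copied from the listing above:
printf 'Data Tier: raw\nEvent Count: 3634\n' | metadata_field "Event Count"
```

The match is on the full field name followed by ": ", so asking for "Online.Run" does not accidentally return Online.RunNumber.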
Now to find the location of a given file you can request that the replica catalog "locate" the file:
[anorman@novagpvm03 ~]$ samweb -e nova locate-file fardet_r00012006_s62_t02.raw
novadata:/nova/data/rawdata/FarDet/000120/12006/02
enstore:/pnfs/nova/rawdata/FarDet/000120/12006(495@vpe149)
In this case there are two separate copies of the file:
- One is located on the Bluearc central file system in the directory /nova/data/rawdata.... (novadata:/nova/data/rawdata/FarDet/000120/12006/02)
- The second is located on the dCache/Enstore system (enstore:/pnfs/nova/rawdata/FarDet/000120/12006/) and has a tape volume label of "495@vpe149" (don't worry about the tape label; it is only used internally for optimizing retrieval).
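Each location line has the shape system:/path, optionally followed by a tape label in parentheses. As a rough sketch of how a script might take such a line apart, the hypothetical helper below (split_location is not part of samweb or ifdh) uses only shell parameter expansion and assumes the format shown in the example output.

```shell
# Hypothetical helper: split one `samweb locate-file` line of the form
#   system:/path(label)
# into "system path label"; label is printed as "-" when absent.
split_location() {
    loc="$1"
    system="${loc%%:*}"            # text before the first ":"
    rest="${loc#*:}"               # text after the first ":"
    path="${rest%%"("*}"           # path without any "(label)" suffix
    case "$rest" in
        *"("*")") label="${rest##*"("}"; label="${label%")"}" ;;
        *)        label="-" ;;
    esac
    printf '%s %s %s\n' "$system" "$path" "$label"
}

# Demonstration on the two locations from the example above:
split_location "novadata:/nova/data/rawdata/FarDet/000120/12006/02"
split_location "enstore:/pnfs/nova/rawdata/FarDet/000120/12006(495@vpe149)"
```

In normal use ifdh_fetch chooses the copy for you, so a helper like this is only useful for bookkeeping, for example counting how many files have a disk copy.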
To get a copy of the file you can then "fetch" it to your current location using the "ifdh_fetch" command. (IFDH is the Intensity Frontier Data Handling system and is designed to retrieve/deliver files all over the world including directly to you!) You need to "setup ifdhc" first. Then, the fetch routine is smart and chooses the closest/fastest copy of a file to give you.
Note that "ifdh" only works if you have a grid computing account set up! If you don't yet, see https://cdcvs.fnal.gov/redmine/projects/novaart/wiki/Using_Condor_and_Running_on_Grid_from_IF_Cluster_Machines#Running-on-the-Grid. Then come back and try to fetch your data. You might also need to "setup fife_utils" to get the program into your path (hopefully included in your standard nova setup script, but if not...)
Example: (fetched from bluearc)
[anorman@novagpvm03 anorman]$ ifdh_fetch fardet_r00012006_s62_t02.raw
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/rawdata/FarDet/000120/12006/fardet_r00012006_s62_t02.raw" "./$f"
LOCK - Fri Feb 14 18:14:22 UTC 2014 LOCKS/LIMIT/QUEUE 5/5/281 sleeping 281
LOCK - Fri Feb 14 18:14:22 UTC 2014 queue 20140214.18:14:22.novagpvm03.15187.anorman.x509up_cp11537
LOCK - Fri Feb 14 18:14:31 UTC 2014 lock /grid/data/nova/LOCK/LOCKS/20140214.18:14:31.9.novagpvm03.15187.anorman.x509up_cp11537
LOCK - Fri Feb 14 18:14:42 UTC 2014 freed /grid/data/nova/LOCK/LOCKS/20140214.18:14:31.9.novagpvm03.15187.anorman.x509up_cp11537
[anorman@novagpvm03 anorman]$ ls -l *.raw
-rw-r--r-- 1 anorman nova 518914708 Feb 14 12:14 fardet_r00012006_s62_t02.raw
Note that IFDH does the proper locking on copies from bluearc!
You can even fetch large numbers of files (here we'll grab those 64 files from stream 2 of run 12006):
[anorman@novagpvm03 anorman]$ ifdh_fetch --dims "data_tier = raw AND Online.RunNumber = 12006 AND data_stream 2 AND Online.Detector FarDet"
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/rawdata/FarDet/000120/12006/fardet_r00012006_s04_t02.raw" "./$f"
LOCK - Fri Feb 14 18:18:57 UTC 2014 LOCKS/LIMIT/QUEUE 5/5/253 sleeping 253
LOCK - Fri Feb 14 18:18:57 UTC 2014 queue 20140214.18:18:57.novagpvm03.15661.anorman.x509up_cp11537
....
See the details of the IFDH package for other magic it can do.
All RAW data collected so far is available in
/nova/data/rawdata/FarDet
/nova/data/rawdata/NDOS
All files are read-only and owned by novaraw. Eventually these will only exist on tape.
Accessing the Data Catalog from a web interface¶
Files can also be accessed from the web by using one of the data catalog web pages. These use the same technology behind the scenes but make it a bit easier to input things like dates and times.
After performing a search, you can click on a file or list of files (or on the options displayed with a file) to get more information (e.g. metadata, locations, children, parents).
Raw to Root files¶
All files are converted from the *.raw format from DAQ into the ART-ified *.root files for Offline.
When a file is processed through this step, a new file with the Data Tier of "artdaq" (i.e. art/root format) is created. The new file has additional fields, such as "DAQ2RawDigit.base_release", which is set to the release it was converted under.
To find these files you can do the same queries to the data catalog with the inclusion of the right data tier and release information.
[anorman@novagpvm03 anorman]$ samweb -e nova list-files "data_tier = artdaq AND Online.RunNumber = 12006 and data_stream 2 AND Online.Detector = FarDet AND DAQ2RawDigit.base_release = S14-01-20 "
fardet_r12006_s43_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s21_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s54_t02_cosmic_S14-01-20_v1_data.daq.root
fardet_r12006_s12_t02_cosmic_S14-01-20_v1_data.daq.root
....
Files can be located and fetched in the same manner as raw files:
[anorman@novagpvm03 anorman]$ samweb locate-file fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
novadata:/nova/data/novaroot/fardet/S14-01-20/02
novadata:/nova/data/novaroot/fardet/S14-01-20/000120/12006/02
enstore:/pnfs/nova/production/raw2root/fardet/S14-01-20/000120/12006/02(9147@vpe183)
[anorman@novagpvm03 anorman]$ ifdh_fetch fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/production/raw2root/fardet/S14-01-20/000120/12006/02/fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root" "./$f"
Fri Feb 14 19:04:14 UTC 2014 LOCK - returned excess lock 6/5
LOCK - Fri Feb 14 19:04:14 UTC 2014 LOCKS/LIMIT/QUEUE 6/5/17 sleeping 17
LOCK - Fri Feb 14 19:04:14 UTC 2014 queue 20140214.19:04:14.novagpvm03.26888.anorman.x509up_cp11537
LOCK - Fri Feb 14 19:04:15 UTC 2014 lock /grid/data/nova/LOCK/LOCKS/20140214.19:04:15.1.novagpvm03.26888.anorman.x509up_cp11537
LOCK - Fri Feb 14 19:04:24 UTC 2014 freed /grid/data/nova/LOCK/LOCKS/20140214.19:04:15.1.novagpvm03.26888.anorman.x509up_cp11537
[anorman@novagpvm03 anorman]$ ls -l *.daq.root
-rw-r--r-- 1 anorman nova 349482306 Feb 14 13:04 fardet_r12006_s15_t02_cosmic_S14-01-20_v1_data.daq.root
Raw files that have been processed through DAQ2RawDigit are located here:
/nova/data/novaroot/FarDet/<RELEASE>/
/nova/data/novaroot/NDOS/<RELEASE>/
<RELEASE> denotes the software release of the DAQ2RawDigit version that was used.
Within these directories files are split up by leading run number and then full run number followed by a directory for the trigger type.
Example: If it is run 11250 for FarDet then the directory paths are:
/nova/data/novaroot/FarDet/S13-09-04/000112/11250/numi (NuMI-trigger data)
/nova/data/novaroot/FarDet/S13-09-04/000112/11250/cosmic (Cosmic-trigger data)
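The directory layout in this example can be written down as a small shell function. This is a sketch under one assumption that the page does not state explicitly: the "leading run number" directory is taken to be the run number divided by 100 and zero-padded to six digits, which matches both examples on this page (11250 gives 000112, 12006 gives 000120). The name novaroot_dir is hypothetical.

```shell
# Hypothetical helper: build the novaroot directory for a run.
#   $1 = detector directory (e.g. FarDet), $2 = release,
#   $3 = run number as a plain decimal without leading zeros,
#   $4 = trigger directory (e.g. numi or cosmic)
# ASSUMPTION: leading-run directory = run / 100, zero-padded to 6 digits.
novaroot_dir() {
    prefix=$(printf '%06d' $(( $3 / 100 )))
    printf '/nova/data/novaroot/%s/%s/%s/%s/%s\n' "$1" "$2" "$prefix" "$3" "$4"
}

# Reproduces the two example paths for run 11250:
novaroot_dir FarDet S13-09-04 11250 numi
novaroot_dir FarDet S13-09-04 11250 cosmic
```

The locations returned by locate-file remain the authoritative source; a constructed path like this is only a convenience for browsing.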
Reconstruction files and beyond¶
WARNING: DAR BE MONSTERS HERE! (this section is under construction and may not be up to date, like old maps of the flat world)
The reconstruction step is still maturing and is unofficial. These files were processed in tag S13-10-11.
The files live here:
Note that it is "v3" for CAF files. The previous two iterations were missing information.
Full Gain Cooled Data has been identified, and runs/subruns have been selected, for the period from run 11328 (start of 2 di-block cooling) to run 11406 (a water leak occurred after this run).
The Full Gain Cooled Data files were identified by hand by Nick and are listed in the following file:
A single hadd'ed CAF file exists for CAF analysis convenience for this period:
Note that this data has mangled channels in the FEB firmware.
The latest "best" list of runs covers the period when all the DCMs on diblocks 1 and 2 were running cold and stably (except for a handful of warm APDs), with new firmware that fixed the channel mapping issue. There are no known time-sync or other issues with these runs.
Runs were identified by hand by Kanika and are listed in the following text file: TheGoodList.txt
A single hadd'ed CAF file exists that combines all subruns from these runs in this cold, stable, unmangled period.
A total of 2069 numi trigger files from runs 11496 through 11600 are included:
Event Scanning files¶
There exist filtered files that have passed a data selection criterion. See the README in the directory where the files reside:
A modified recoproductionjob.fcl and cafproductionjob.fcl were run with the following modules visited:
*cana (currently being optimised)
Special parameters were set for the "slicer" module:
The calibration was also forced to use the MC calibration constants while we await the data calibration constants:
- det_rxxxxxxxx_sxx_txx.raw - raw data in binary format
- det_rxxxxxxxx_sxx_txx_mask_release_vx.data.daq.root - raw data in root format
- det_rxxxxxxxx_sxx_txx_mask_release_vx.data.reco.root - reconstructed data
- det_rxxxxxxxx_sxx_txx_mask_release_vx.data.caf.root - caf files for reconstructed data
- det is a detector name (ndos,neardet or fardet)
- rxxxxxxxx is a unique run number (8 digits, e.g. r00011355)
- sxx is a subrun number (2 digits, e.g. s01)
- txx denotes the trigger number (t followed by two digits, e.g. t02)
- vx represents the version of this file (starting from 1)
- the extensions denote what level of processing has been completed for this file
The trigger numbers correspond to:
- t00 - NuMI trigger (mask = numi)
- t01 - Booster trigger (mask = booster)
- t02 - cosmic trigger (mask = cosmic)
- blank - There is no stream name for files with a global trigger only -- everything from that run is written into this file.
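The naming convention and trigger table above can be applied mechanically. The sketch below splits a file name on underscores and maps the trigger field to its mask; parse_nova_name is a hypothetical helper, the label "global" for files without a trigger field is our own choice, and any trigger number not listed above is reported as "unknown".

```shell
# Hypothetical helper: print "detector run subrun mask" for a NOvA file name
# following the det_rXXXXXXXX_sXX_tXX convention described above.
parse_nova_name() {
    base="${1%%.*}"                      # drop all extensions
    IFS=_ read -r det run subrun trig rest <<EOF
$base
EOF
    case "$trig" in
        t00) mask=numi ;;
        t01) mask=booster ;;
        t02) mask=cosmic ;;
        "")  mask=global ;;              # no trigger field in the name (our label)
        *)   mask=unknown ;;             # trigger number not listed on this page
    esac
    printf '%s %s %s %s\n' "$det" "${run#r}" "${subrun#s}" "$mask"
}

# Demonstration on file names that appear on this page:
parse_nova_name fardet_r00012006_s62_t02.raw
parse_nova_name fardet_r00012006_s05.raw
```

Run and subrun are printed as they appear in the name, including any leading zeros, so they should be treated as strings rather than fed directly into shell arithmetic.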
<RELEASE> corresponds to the software release version that the files were processed within. Currently we are processing the raw->root step through S13-09-04.
Alongside the raw->root processing the Production group also produces files containing PCHitLists for the Calibration group.
These files live here:
and then subsequently follow the same directory schema as outlined above for processed data. The files are only processed through the cosmic stream. Three files exist for each subrun:
OnMon & Nearline Data¶
All of the files that are processed on the nearline machines (both the nearline-reco histogram files and the OnMon files) are copied over for storage to the FNAL machines.
These files live here:
You'll find more information about OnMon and the Nearline on the wiki for each:
Finding Data files at FNAL¶
ART allows you to provide a text file listing the files to process and pass it to the nova executable. There is a script which creates such a list
You can copy this script to your directory and read the instructions inside it on how to produce the required file list.
We also create a keep-up list of goodruns that is stored here: