THIS PAGE IS OUTDATED, DO NOT TRUST, IN DESPERATE NEED OF UPDATE¶
SAM and NOvA-ART¶
Intro¶
SAM (Sequential Access Metadata) is a data management and delivery system. It catalogues experiment data based on experiment-specific metadata and enables remote data processing by delivering data on demand from archival storage to caches worldwide, which it manages automatically. SAM is closely integrated with the Fermilab archive system (enstore) as well as with dCache, SRM and gridFTP. It provides a client-server model with a CLI, a Python client and a C++ client.
Setup SAM Tools¶
The SAM user tools are provided as a UPS product. The name of the package is "sam_web_client" and it is available from the common products area at FNAL. To set it up, first add the common products area to your UPS products path, then do a setup:
export PRODUCTS=/grid/fermiapp/products/common/db/:$PRODUCTS
setup sam_web_client
This gives you access to the "samweb" tool. The tool can be run anywhere and communicates with SAM through the SAM web server (HTTP access). By default, authentication with the server uses kx509 certificates. The SAM server automatically populates the user list for the experiment from all users whose kerberos credentials (kx509 certs) have Fermilab/nova set as their VO.
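As a quick sanity check that samweb is working, one can try listing or counting files with a dimensions query. The query below is only illustrative, built from the raw-data example later on this page:
samweb list-files "data_tier raw and run_number 11116"    # list matching files
samweb count-files "data_tier raw and run_number 11116"   # just count them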
Description of Metadata parameters¶
Top-level¶
- file_type
importedSimulated: All simulated files
importedDetector: All data files
mc_config: FHiCL files for MC production
- file_format
root:
fcl:
- data_tier
raw:
fcl:
artdaq:
reconstruction:
pidpart:
lemsum:
lempart:
pid:
caf:
- start_time
- end_time
- data_stream
- event_count
- runs
[ run_number, run_type ]
Format: list of run number, run type pairs: [ [runNum1, "runType1"], [runNum2, "runType2"], ... ]
Run type must be in the list of pre-defined values
physics: Data and realistic MC after the detector is commissioned
calibration: Special runs taken for calibration
commissioning: Data and realistic MC while the detector is being commissioned
- parents
All files used to produce this stage of processing
[ {"file_name": "<name>"} ]
where "name" base name of the parent. Multiple parents can be listed (objects separated by commas), and all parents must already be declared to SAM.
Release history¶
The release history of the processing chain is documented in several metadata parameters:
- NOVA.Release
The version of the most recently run processing step - this does not need to be set in the FHiCL file.
- FCL.Version
The version the simulation FHiCL files were produced in.
- Calibration.base_release
- DAQ2RawDigit.base_release
- Reconstructed.base_release
- Simulated.base_release
- LEMPART.base_release
- PIDPART.base_release
- PID.base_release
These tell the history of the various processing stages. The FHiCL file responsible for a given stage should set XX.base_release: "" in its metadata section. The empty quotes will automatically be filled in with the actual base release used at run time.
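To check what ended up in the catalogue, the release fields can be read back from SAM for any declared file, e.g. (using the reconstructed sample file shown later on this page):
samweb get-metadata reco_r00013121_s05_t02_cosmic_S12.02.14.root | grep -i release   # show only the release-related fields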
NOVA¶
- NOVA.Label
A Greek letter indicating the production run period (it encapsulates the production configuration and changes in the metadata scheme)
- NOVA.Special
Special configurations (systematic studies, etc); use "none" for a normal configuration
- NOVA.DetectorID
nd, fd, ndos
- NOVA.HornPolarity
fhc, rhc, off, none
- NOVA.HornConfig
string used in flux files
- NOVA.SubVersion
iteration number (distinguishes otherwise identical files)
FCL¶
- FCL.version
Stores the base release active when the .fcl file was made. Not called "base_release" as that name has special runtime behavior.
- FCL.data
- FCL.time
Simulated¶
- Simulated.base_release
- Simulated.firstRun
- Simulated.firstSubRun
- Simulated.number_of_spills
- Simulated.geometry
name of the geometry file
- Simulated.genie
Whether or not genie singles or pileup are in this file: true, false
- Simulated.cry
Whether or not cosmics singles or pileup are in this file: true, false
- Simulated.singlep
Whether or not single particles are in this file: true, false
- Simulated.volume
The volume the interactions are generated in: detector, rock, rock_detector
- Simulated.mixingType
Whether or not mixing is done
pileup: Varying numbers of interactions directly from flux
singles: Single interactions (neutrinos, cosmics or single particle)
overlay: Mixed from pileup or singles
- Genie.flavorset
none: Genie was not used
nonswap: Normal flux
swap:
tau:
- Cry.flavorset
none: Cry was not used
all: All allowed particles are created
gamma: Only produce photons
hadron: Only produce hadrons
- Singlep.flavorset
none: Singlep was not used
<particle type>: Fill in the particle generated
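Assuming these simulation parameters are registered as SAM dimensions under the same names, they can be used in queries just like the top-level fields, for example:
samweb count-files "data_tier simulated and simulated.genie true and simulated.volume rock_detector"   # illustrative query only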
Useful SAM Commands¶
Tools for Using SAM¶
Metadata module¶
The metadata module takes metadata declared inside the job fcl file and/or from input art files and adds it to the output file's internal sqlite database.
To make sure the sqlite database is written to the output file, the nova command must be called with the following command-line options:
--sam-application-family=nova                # always nova
--sam-application-version=$SRT_BASE_RELEASE  # picks up the actual release used
--sam-file-type=importedSimulated            # importedSimulated for MC / importedDetector for data
--sam-data-tier=out1:artdaq                  # out1 is the label of the output module
--sam-stream-name=out1:all                   # all should be used for MC; data uses an integer for different streams
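Put together, a purely illustrative MC job invocation could look like the following; the fcl and input file names are placeholders:
nova -c myjob.fcl \
     --sam-application-family=nova \
     --sam-application-version=$SRT_BASE_RELEASE \
     --sam-file-type=importedSimulated \
     --sam-data-tier=out1:artdaq \
     --sam-stream-name=out1:all \
     my_mc_input.root   # placeholder input file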
Extracting metadata from FHiCL Files¶
Running on the Grid¶
Copying specific files from SAM without a Dataset¶
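One way to do this by hand (a sketch only; the file name is the raw-data example from this page and <username> is a placeholder) is to ask SAM for a transfer URL and copy the file with ifdh:
samweb get-file-access-url fardet_r00011116_s05.raw    # prints one or more URLs for the file
ifdh cp <one of the URLs printed above> /nova/data/users/<username>/fardet_r00011116_s05.raw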
Running a Local Test Job on a Dataset¶
Metadata for Raw data files¶
The metadata stored for a raw file is generated using the MetaDataRunTool, a stand-alone application that runs within the DAQ environment. The application is part of the MetaDataTools package.
Sample file:
File Name       fardet_r00011116_s05.raw
File Id         3949511
File Type       importedDetector
File Format     raw
File Size       540603688
Crc             1660280520 (adler 32 crc type)
Content Status  good
Group           nova
Data Tier       raw
Application     online datalogger 33
Event Count     7464
First Event     38428
Last Event      45892
Start Time      2013-09-09T03:05:36
End Time        2013-09-09T03:08:39
Data Stream     all
Online.ConfigIDX 0
Online.DataLoggerID 1
Online.DataLoggerVersion 33
Online.Detector fardet
Online.DetectorID 2
Online.Partition 1
Online.RunControlID 0
Online.RunControlVersion 0
Online.RunEndTime 1378696119
Online.RunNumber 11116
Online.RunSize 135150922
Online.RunStartTime 1378694998
Online.RunType 0
Online.Stream all
Online.SubRunEndTime 1378696119
Online.SubRunStartTime 1378695936
Online.Subrun 5
Online.TotalEvents 7464
Online.TriggerCtrlID 0
Online.TriggerListIDX 0
Online.TriggerPrescaleListIDX 0
Online.TriggerVersion 0
Online.ValidTriggerTypesHigh 0
Online.ValidTriggerTypesHigh2 0
Online.ValidTriggerTypesLow 0
Runs            11116.0005 (online)
File Partition  5
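A listing like this can be retrieved for any declared file with:
samweb get-metadata fardet_r00011116_s05.raw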
Metadata for Reconstructed File¶
Sample file: /nova/data/novaroot/NDOS/S12.02.14/000131/13121/cosmic/reco_r00013121_s05_t02_cosmic_S12.02.14.root
file_name : "reco_r00013121_s05_t02_cosmic_S12.02.14.root"
file_types : "importedDetector"
file_formats : "root"
file_size : "`cat /nova/data/novaroot/NDOS/S12.02.14/000131/13121/cosmic/reco_r00013121_s05_t02_cosmic_S12.02.14.root | wc -c`"
crc : "`samweb file-checksum /nova/data/novaroot/NDOS/S12.02.14/000131/13121/cosmic/reco_r00013121_s05_t02_cosmic_S12.02.14.root`"
data_tiers : "artdaq"
Online.RunNumber : 13121
Online.RunStartTime : 1321236602
Online.RunEndTime : 1321258215
Online.SubRun : 5
Online.SubRunEndTime : 1321258215
Online.SubRunStartTime : 1321254609
Online.Stream : 2
Reconstructed.base-release : "S12.02.14"
NOVA.DetectorID : "ndos"
NOVA.HornConfig : "none"
NOVA.HornPolarity : "none"
NOVA.SubVersion : 1
NOVA.Label : "ndos_cosmic_reconstructed_data_S12.02.14"
parents : "ndos_r00013121_s05_t02.raw"
Datasets¶
We need to define a SAM dataset or use one that has already been defined.
The Production group will provide standard datasets for use in future, but in practice users can also define their own.
The datasets are defined using the Definition Editor http://samweb.fnal.gov:8480/sam/nova/definition_editor/
More information on how to define them can be found here: SAM datasets
The menus don't work at present, but in future one will be able to select definitions based on metadata information such as run period, horn polarity, etc.
So for now we have to type the dataset definition manually into the Data Set Definition box and hit the Submit Dataset Query button:
data_tier simulated and version 'S12-11-16' and simulated.generator cosmics and simulated.detectorID fd
One can now see all files that pass this definition. We can then save the dataset with an appropriate name, also filling in our username (kerberos principal) and group (nova), or use a predefined dataset. For this query we saved the dataset 'cosmics-mc-S12-11-16-fd'.
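If the web editor is not available, the same definition can also be created and inspected from the command line (a sketch using the query above):
samweb create-definition cosmics-mc-S12-11-16-fd "data_tier simulated and version 'S12-11-16' and simulated.generator cosmics and simulated.detectorID fd"
samweb describe-definition cosmics-mc-S12-11-16-fd
samweb count-files "defname: cosmics-mc-S12-11-16-fd"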
FHiCL requirements¶
Another acronym for your repertoire: IFDH – Intensity Frontier Data Handling
https://cdcvs.fnal.gov/redmine/projects/ifdh-art/wiki
The ifdh_art package provides ART service access to the libraries from the ifdhc package (IFDH Client tools).
We need to declare that we are using these services in the job fcl we would like to run.
NB. We will need to add these to a global .fcl, such as services.fcl, eventually.
One-liners for the fcl:
# if using ART with SAM, you need these entries, OR the --sam-* command-line options
user.services.IFDH: { IFDH_BASE_URI: "http://samweb.fnal.gov:8480/sam/minerva/api" }
user.services.CatalogInterface: { service_provider: "IFCatalogInterface" }
user.services.FileTransfer: { service_provider: "IFFileTransfer" }
outputs.out1.dataTier: "raw"
process_name: "whatever"
services.FileCatalogMetadata: {
  applicationFamily: "demo"
  applicationVersion: "1"
  fileType: "importedDetector"
}
# if you use either of IFBeam or nucondb services, include the respective entry below.
user.services.IFBeam: {}
user.services.nucondb: {}
A working example¶
First we need to set up jobsub, which one should have defined as a function in their ~/.profile file.
function setup_jobsub {
  export GROUP=nova
  export USER=<kerberos principal here>
  source /grid/fermiapp/products/nova/etc/setups.sh
  setup jobsub_tools
}
Then also set up the CVS-controlled nova software. We have datasets for tag S12-11-16 from the CVS era, and jobsub only recognises this era of software; this will be updated as a priority.
function setup_nova_cvs {
  source /grid/fermiapp/nova/novaart/novasoft/srt/srt.sh
  export EXTERNALS=/nusoft/app/externals
  source $SRT_DIST/setup/setup_novasoft.sh "$@"
}
We can then set up the ifdh_art package and run our jobsub command.
The exact commands, using the profile functions, are:
setup_jobsub
setup_nova_cvs -r S12-11-16
PRODUCTS=$EXTERNALS:$PRODUCTS
setup ifdh_art v1_0_rc1 -q debug:e2:nu
jobsub -g \
  -r S12-12-12 \
  -N 100 \
  --dataset_definition=cosmics-mc-S12-11-16-fd \
  $IFDH_ART_DIR/bin/art_sam_wrap.sh \
  -X nova \
  --dest /nova/data/users/gsdavies \
  --rename uniq \
  --limit 2 \
  -c /nova/app/users/anorman/NOVA-OFFLINE/cosmictrackjob.fcl
or easier to copy:
jobsub -g -r S12-12-12 -N 100 --dataset_definition=cosmics-mc-S12-11-16-fd $IFDH_ART_DIR/bin/art_sam_wrap.sh -X nova --dest /nova/data/users/gsdavies --rename uniq --limit 2 -c /nova/app/users/anorman/NOVA-OFFLINE/cosmictrackjob.fcl
One can follow the submitted projects here: http://samweb.fnal.gov:8480/station_monitor/nova/stations/nova/projects
An example of a running project is included.