SAM web cookbook » History » Version 13
Erica Smith, 11/07/2017 10:32 AM
SAM Web Cookbook (NOvA Edition)¶
Right now the sam_web_client is set up by default when you run setup_nova. The experiment name ("nova") is also set for you.
If you don't have a certificate, get one (based on your Kerberos credentials):
kx509
You can automate this by putting it in your login file (e.g. .bash_profile or .bashrc).
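A minimal sketch of what such a login-file entry might look like. The function name refresh_cert is made up for illustration; the snippet only assumes that kx509 is on your PATH once your environment is set up, and it does nothing otherwise.

```shell
# Hypothetical snippet for ~/.bash_profile or ~/.bashrc.
refresh_cert() {
    # Only try if the kx509 command is actually available in this shell.
    if command -v kx509 >/dev/null 2>&1; then
        # kx509 needs a valid Kerberos ticket; warn instead of aborting the login.
        kx509 >/dev/null 2>&1 || echo "kx509 failed: check your Kerberos ticket" >&2
    fi
}
refresh_cert
```

The warning goes to stderr so that a missing ticket never blocks an interactive login.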
- Table of contents
- SAM Web Cookbook (NOvA Edition)
- Glossary and Conventions
- Beginner Recipes (boiling water)
- Intermediate Recipes (Poaching eggs)
- Get a list of all currently defined fields
- Get a list of Non-DAQ data files (e.g. Laser Scans) matching a search
- Listing Files with children matching a selection
- Listing Files that match a filename pattern
- Listing Files with parents matching a selection
- Listing Files with no physical locations
- Listing Files with physical locations
- Retrieving Files with a physical location
- Verifying that your file was transferred correctly
- Finding projects run off of a dataset
- Advanced Recipes (Hollandaise sauce)
- Combining it all (Eggs Benedict)
Glossary and Conventions¶
The following conventions are used:
Anything in angle brackets denotes a placeholder, e.g. <run number> means you should substitute the run number you are interested in.
Anything with a dollar sign in front of it denotes a shell variable, e.g. $BASE_QUERY
- BASE_QUERY is the data tier and detector. It is assumed to be set like:
export BASE_QUERY="data_tier raw AND online.detector fardet"
To get help from samweb type:
samweb --help-commands
Beginner Recipes (boiling water)¶
To save some typing, set:
export BASE_QUERY="data_tier raw AND online.detector fardet"
Queries are NOT case SeNsItIvE
To save typing, some parts of earlier queries are denoted as $QUERY_XXX
Anywhere you see "list-files" you can replace it with "count-files" to return just a count instead of the actual file names.
List Files from a data tier and detector¶
samweb list-files "data_tier <tier> AND online.detector <det>"
Example:
samweb list-files "data_tier raw AND online.detector fardet"
Will return 459,532 files (today)
samweb list-files "data_tier raw AND online.detector ndos"
Will return 91,360 files (today)
From here on we use $BASE_QUERY for this.
List Files from a Run¶
samweb list-files "$BASE_QUERY and online.runnumber <runnumber>"
Or:
samweb list-files "$BASE_QUERY and run_number <runnumber>"
The first is DAQ-specific; the second is more general.
Example:
samweb list-files "$BASE_QUERY and run_number 13114"
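If you look up runs often, the query string can be built by a small shell function. This is only a sketch: run_query is a hypothetical helper name, and it assumes BASE_QUERY is set as described in the Glossary.

```shell
# Same convention as the rest of this page.
export BASE_QUERY="data_tier raw AND online.detector fardet"

# run_query (hypothetical helper): print the dimension string for one run.
run_query() {
    printf '%s and run_number %s\n' "$BASE_QUERY" "$1"
}

# Usage (requires samweb to be set up):
#   samweb list-files  "$(run_query 13114)"
#   samweb count-files "$(run_query 13114)"
```

Keeping the query in one place makes it easy to switch between list-files and count-files without retyping it.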
List Files from a Time Period¶
Files with a start time between two times:
samweb list-files "$BASE_QUERY and start_time > '<start time>' and start_time < '<end time>'"
Example:
samweb list-files "$BASE_QUERY and start_time > '2014-01-30T23:29:00' and start_time < '2014-01-31T00:30:00'"
List Files from a specific trigger stream¶
You want only a given stream.
samweb list-files "$BASE_QUERY and run_number <run_no> and data_stream <stream>"
or
samweb list-files "$BASE_QUERY and run_number <run_no> and Online.Stream <stream>"
For DAQ files only.
Stream is a number. Streams are fully configurable, but in general in early 2014 they looked like:
- 0 = NuMI
- 1 = Booster Beam
- 2 = Min Bias
Example:
samweb list-files "$BASE_QUERY and run_number 13114 and data_stream 0"
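Since the stream numbers are easy to forget, a lookup function can help. This sketch hard-codes the early-2014 mapping listed above; the function name stream_id and the short names (numi, booster, minbias) are invented for this example, and the mapping will be wrong if the streams have been reconfigured.

```shell
# stream_id (hypothetical helper): map a short stream name to its number,
# using the early-2014 configuration quoted on this page.
stream_id() {
    case "$1" in
        numi)    echo 0 ;;
        booster) echo 1 ;;
        minbias) echo 2 ;;
        *)       echo "unknown stream: $1" >&2; return 1 ;;
    esac
}

# Usage (requires samweb and BASE_QUERY):
#   samweb list-files "$BASE_QUERY and run_number 13114 and data_stream $(stream_id numi)"
```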
List Files from DAQ Partition¶
You want only a specific DAQ Partition
samweb list-files "$BASE_QUERY and Online.Partition <partno>"
Example:
samweb list-files "$BASE_QUERY and Online.Partition 1"
List Metadata associated with a file:¶
File names do not have paths, just base names (all files in SAM are unique)
samweb get-metadata <filename>
samweb get-metadata fardet_r00013114_s20_t00.raw
You get output like:
File Name: fardet_r00013114_s20_t00.raw
File Id: 4877797
File Type: importedDetector
File Format: raw
File Size: 6908296
Crc: 74650857 (adler 32 crc type)
Content Status: good
Group: nova
Data Tier: raw
Application: online datalogger 33
Event Count: 110
First Event: 171026
Last Event: 179507
Start Time: 2014-02-14T01:34:14
End Time: 2014-02-14T01:37:43
Data Stream: 0
Online.ConfigIDX: 0
Online.DataLoggerID: 1
Online.DataLoggerVersion: 33
Online.Detector: fardet
Online.DetectorID: 2
Online.Partition: 1
Online.RunControlID: 0
Online.RunControlVersion: 0
Online.RunEndTime: 1392341863
Online.RunNumber: 13114
Online.RunSize: 1727074
Online.RunStartTime: 1392337488
Online.RunType: 0
Online.Stream: 0
Online.SubRunEndTime: 1392341863
Online.SubRunStartTime: 1392341654
Online.Subrun: 20
Online.TotalEvents: 110
Online.TriggerCtrlID: 0
Online.TriggerListIDX: 0
Online.TriggerPrescaleListIDX: 0
Online.TriggerVersion: 0
Online.ValidTriggerTypesHigh: 0
Online.ValidTriggerTypesHigh2: 0
Online.ValidTriggerTypesLow: 0
Runs: 13114.0020 (online)
File Partition: 20
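To pull a single value out of this listing in a script, a small awk filter is usually enough. The sketch below assumes samweb prints one "Key: value" pair per line, possibly with leading padding; metadata_field is a hypothetical helper name, not a samweb command.

```shell
# metadata_field (hypothetical helper): read get-metadata output on stdin
# and print the value of the field named in $1.
metadata_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop any leading padding
            prefix = key ":"
            if (index(line, prefix) == 1) { # line starts with "<key>:"
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)   # drop the space after the colon
                print value
            }
        }'
}

# Usage (requires samweb to be set up):
#   samweb get-metadata fardet_r00013114_s20_t00.raw | metadata_field "Event Count"
```

Matching on the whole key plus the colon avoids confusing similar names such as Online.Subrun and Online.SubRunEndTime.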
List files with some other parameter or parameters¶
samweb list-files "$BASE_QUERY and Parameter.name_1 <value> and Parameter.name_2 <value>"
Example:
samweb list-files "$BASE_QUERY and Online.TotalEvents > 123 and Online.DataLoggerVersion = 33"
Get File locations¶
samweb locate-file <filename>
samweb locate-file ndos_r00015701_s07_cosmic.raw
Response will be a list of locations:
novadata:/nova/data/rawdata/NDOS/000157/15701/cosmic
enstore:/pnfs/nova/rawdata/NDOS/runs/000157/15701(1548@vpe048)
- Locations starting with "novadata" are BlueArc central disk.
- Locations starting with "enstore" are dCache/Enstore locations (disk cache, tape backed)
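If a script needs only the directory part of a location, the storage prefix and the trailing parenthesised suffix can be stripped with sed. This is a sketch that assumes locate-file prints one location per line in the form shown above; location_path is a hypothetical helper name.

```shell
# location_path (hypothetical helper): read location lines on stdin and
# print only the directory path.
location_path() {
    # First expression removes the "<system>:" prefix,
    # second removes a trailing "(...)" suffix if present.
    sed -e 's/^[^:]*://' -e 's/([^)]*)$//'
}

# Usage (requires samweb to be set up):
#   samweb locate-file ndos_r00015701_s07_cosmic.raw | grep '^enstore:' | location_path
```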
Get Ancestors of a File¶
samweb file-lineage <children/descendants> <filename>
Children are files derived directly from the input file
Example:
samweb file-lineage children fardet_r00013096_s14_t00.raw
fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
Parents are the files from which the input file was directly derived
samweb file-lineage <parents/ancestors> <filename>
Example:
samweb file-lineage parents fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
fardet_r00013096_s14_t00.raw
Intermediate Recipes (Poaching eggs)¶
Get a list of all currently defined fields¶
Go to:
Current Nova Experiment Dimensions
Get a list of Non-DAQ data files (e.g. Laser Scans) matching a search¶
samweb list-files "data_tier laser_scan AND laser_scan.block_number = 23 AND laser_scan.layer_number > 4"
Listing Files with children matching a selection¶
List raw files that have been processed through a different stage
samweb list-files "$BASE_QUERY and isparentof: (data_tier <stage> AND Parameter.name_1 <value>)"
samweb list-files "$BASE_QUERY and isparentof: ( data_tier artdaq AND daq2rawdigit.base_release 'S14-01-20' )"
Listing Files that match a filename pattern¶
This matches on parts of the file name; % is the wildcard.
samweb list-files "file_name like fardet%DDenergy%"
Listing Files with parents matching a selection¶
With BASE_QUERY2="data_tier artdaq AND online.detector fardet"
samweb list-files "$BASE_QUERY2 and ischildof: ( data_tier raw AND Online.Subrun < 20 )"
Listing Files with no physical locations¶
samweb list-files "$BASE_QUERY AND availability: virtual"
Listing Files with physical locations¶
samweb list-files "$BASE_QUERY AND availability: physical"
Retrieving Files with a physical location¶
You can retrieve files either individually or with a query pattern (multiple files).
Retrieve a single file¶
ifdh_fetch <filename>
ifdh_fetch fardet_r00012006_s61_t02.raw
Note: you must have a valid certificate (i.e. run kx509)
Retrieve a group of files¶
ifdh_fetch `ifdh translateConstraints <dimensions string>`
ifdh_fetch `ifdh translateConstraints "data_tier raw AND online.detector fardet and run_number 12006.51"`
Note: Here ifdh is used to do the lookup of the files and then the resulting names are passed to the fetch.
Verifying that your file was transferred correctly¶
Check the checksum against the tape copy (no json parser installed)
# From Database
samweb get-metadata fardet_r00012006_s35_t02.raw | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2
3828307205
# From file on disk
samweb file-checksum fardet_r00012006_s35_t02.raw | cut -d '"' -f 4
3828307205
If you have a json parser available, use it to parse the output instead of using "cut":
samweb get-metadata fardet_r00012006_s35_t02.raw --json | jq '.crc.crc_value'
"3828307205"
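The comparison itself can be scripted so that a mismatch produces a non-zero exit status. The following is a sketch only: checksums_match is a hypothetical helper, and the usage comment reuses the cut-based pipelines from this recipe unchanged.

```shell
# checksums_match (hypothetical helper):
#   $1 = checksum recorded in SAM, $2 = checksum computed from the file on disk.
# Returns 0 only if both are non-empty and identical.
checksums_match() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "OK: checksums match ($1)"
    else
        echo "MISMATCH: database='$1' disk='$2'" >&2
        return 1
    fi
}

# Usage (requires samweb to be set up):
#   f=fardet_r00012006_s35_t02.raw
#   db=$(samweb get-metadata "$f" | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2)
#   disk=$(samweb file-checksum "$f" | cut -d '"' -f 4)
#   checksums_match "$db" "$disk"
```

Treating an empty value as a mismatch guards against a failed samweb call being mistaken for a successful check.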
Finding projects run off of a dataset¶
If you need to determine who ran projects off a dataset, you can use:
samweb list-projects --defname=<defname>
This lists all projects run off a dataset. Most project names start with the username of the person who created them, so generally, no further work is necessary. If a project is listed whose creator is not obvious, you can use:
samweb project-summary <project name> | less
The first few lines of the output will tell you who the project creator was.
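For a dataset with many projects, the usernames can be summarised directly from the project names. This sketch relies on the convention described above (project name starts with the creator's username, followed by a hyphen) and assumes list-projects prints one project name per line; it will give wrong answers for projects that do not follow that convention. project_owners is a hypothetical helper name.

```shell
# project_owners (hypothetical helper): read project names on stdin and
# print the unique leading "username" part of each.
project_owners() {
    cut -d '-' -f 1 | sort -u
}

# Usage (requires samweb to be set up):
#   samweb list-projects --defname=<defname> | project_owners
```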
Advanced Recipes (Hollandaise sauce)¶
Recovering a whole project¶
samweb project-recovery -e nova --useFileStatus=0 --useProcessStatus=0 gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037
which yields:
(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))
Use that dimension string to create a recovery definition:
kx509
samweb create-definition <new_definition_name> "(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))"
Re-running only failed ("skipped") files from an existing project¶
samweb create-definition <project>_recovery "project_name <project> and consumed_status skipped"
Sampling/Prescaling a dataset¶
SAM provides a mechanism for deterministically sampling a dataset. To do this:
- Define the dataset
- Define a new dataset with a stride and offset
samweb create-definition my_dataset "<selection criteria>"
samweb create-definition my_dataset_oneTenth_version1 "defname: my_dataset with stride 10 offset 0"
samweb create-definition my_dataset_oneTenth_version2 "defname: my_dataset with stride 10 offset 1"
...
Each of these will create a dataset that is 1/10 the size of the original. The offset parameter specifies where to start counting from (so offset 0 starts from the first element in the list, offset 1 starts from the second).
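To split a dataset into all of its strided subsets, the dimension strings can be generated in a loop rather than typed by hand. This is a sketch: stride_dims is a hypothetical helper, and the definition names in the usage comment simply follow the naming pattern used in this recipe.

```shell
# stride_dims (hypothetical helper): print one dimension string per offset,
# from offset 0 up to stride-1.
#   $1 = existing definition name, $2 = stride
stride_dims() {
    base=$1
    stride=$2
    offset=0
    while [ "$offset" -lt "$stride" ]; do
        echo "defname: $base with stride $stride offset $offset"
        offset=$((offset + 1))
    done
}

# Usage (requires samweb to be set up): one new definition per offset.
#   i=1
#   stride_dims my_dataset 10 | while read -r dims; do
#       samweb create-definition "my_dataset_oneTenth_version$i" "$dims"
#       i=$((i + 1))
#   done
```

Together the generated subsets cover every file in the original dataset exactly once.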
Adding additional constraints to a dataset¶
If a dataset already exists which contains all the files you want, but you want to add an additional constraint (for instance, you only want a specific run range) you can:
samweb create-definition new_dataset "defname: old_dataset and run_number >= startRun and run_number <= endRun"
Constructing a Good Runs List¶
Pre-staging Data from Tape¶
If you need to prestage data from tape, do the following:
- Start a screen (or tmux) session
- Do a "prestage-dataset" with your dataset name
- Detach from the screen session
- Come back in a few hours or days...
- The command supports running parallel processes with --parallel. Recommended best practice is to do 1 dataset at a time with 4 parallel threads.
screen
samweb prestage-dataset --defname=fardet_onehertz_raw_Oct2014-May2015-10percent --parallel 4
For more information on this process see the How To Configure Production Jobs page on the production wiki.