SAM Web Cookbook (NOvA Edition)

The sam_web_client is currently set up by default when you run setup_nova. The experiment name ("nova") is also set for you.

If you don't have a certificate, get one (it is generated from your Kerberos credentials):

kx509

You can automate this by putting it in your login file (e.g. .bash_profile or .bashrc).
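
As a rough sketch of that automation, the snippet below could go in a login file so that kx509 only runs when a fresh certificate is actually needed. It assumes the certificate is written to /tmp/x509up_u<uid> (or wherever X509_USER_PROXY points) and that openssl and klist are installed; neither assumption comes from this page, so check them on your system. The helper name needs_cert is made up for this example.

```shell
# Sketch for ~/.bash_profile or ~/.bashrc: refresh the certificate only when needed.
# ASSUMPTION: kx509 writes the certificate to /tmp/x509up_u<uid> unless
# X509_USER_PROXY says otherwise. Adjust cert_file if your setup differs.
cert_file="${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"

# needs_cert <file>: succeeds when the file is missing, empty,
# or expires within the next hour (hypothetical helper).
needs_cert() {
    [ ! -s "$1" ] && return 0
    ! openssl x509 -in "$1" -noout -checkend 3600 >/dev/null 2>&1
}

if needs_cert "$cert_file" && command -v kx509 >/dev/null 2>&1; then
    # kx509 works from Kerberos credentials, so skip it when there is no ticket.
    if klist -s 2>/dev/null; then
        kx509 >/dev/null 2>&1 || echo "kx509 failed; run it by hand" >&2
    fi
fi
```

If kx509 is not on the PATH (for example before setup_nova has run), the snippet does nothing.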

Glossary and Conventions

The following conventions are used:

Anything in angle brackets denotes a placeholder, e.g. <run number> means you should insert the run number you are interested in.

Anything with a dollar sign in front of it denotes a shell variable, e.g. $BASE_QUERY.

  • BASE_QUERY is the data tier and detector. It is assumed to be set like:
export BASE_QUERY="data_tier raw AND online.detector fardet" 

To get help from samweb type:

samweb --help-commands

Beginner Recipes (boiling water)

To save some typing, set BASE_QUERY first:

export BASE_QUERY="data_tier raw AND online.detector fardet"

Queries are NOT case SeNsItIvE

To save typing, some parts of earlier queries are denoted as $QUERY_XXX.

Anywhere you see "list-files" you can replace it with "count-files" to return just a count instead of the actual file names.
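
The list/count swap can be wrapped in a small shell function. This is only a sketch: sam_files and the SAMWEB override variable are names invented here, not part of samweb. SAMWEB exists so the function can be dry-run with echo.

```shell
export BASE_QUERY="data_tier raw AND online.detector fardet"

# sam_files <list|count> ["<extra dimensions>"]   (hypothetical helper)
# Runs list-files or count-files on $BASE_QUERY plus optional extra dimensions.
sam_files() {
    case "$1" in
        list|count) ;;
        *) echo "usage: sam_files <list|count> [extra dimensions]" >&2; return 2 ;;
    esac
    # Append " and <extra>" only when a second argument was given.
    query="$BASE_QUERY${2:+ and $2}"
    # SAMWEB is an assumed override for testing, e.g. SAMWEB=echo for a dry run.
    ${SAMWEB:-samweb} "$1-files" "$query"
}
```

Usage: sam_files count "run_number 13114" returns the count, and sam_files list "run_number 13114" returns the names for the same selection.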

List Files from a data tier and detector

samweb list-files "data_tier <tier> AND online.detector <det>"

Example:

samweb list-files "data_tier raw AND online.detector fardet" 

Will return 459,532 files (today)

samweb list-files "data_tier raw AND online.detector ndos" 

Will return 91,360 files (today)

From here on we use $BASE_QUERY for this.

List Files from a Run

samweb list-files "$BASE_QUERY and online.runnumber <runnumber>"

Or:

samweb list-files "$BASE_QUERY and run_number <runnumber>"

The first form is DAQ-specific; the second is more general.

Example:

samweb list-files "$BASE_QUERY and run_number 13114" 
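
To see how many files each run in a range has, the per-run query can be looped. A minimal sketch, assuming count-files prints a single number; count_runs and the SAMWEB override are hypothetical names introduced for this example.

```shell
# Default BASE_QUERY if the caller has not set one.
: "${BASE_QUERY:=data_tier raw AND online.detector fardet}"

# count_runs <first run> <last run>   (hypothetical helper)
# Prints "<run> <file count>" for every run in the inclusive range.
count_runs() {
    run="$1"
    while [ "$run" -le "$2" ]; do
        # SAMWEB is an assumed override so the loop can be tested without samweb.
        n=$(${SAMWEB:-samweb} count-files "$BASE_QUERY and run_number $run") || return 1
        printf '%s %s\n' "$run" "$n"
        run=$((run + 1))
    done
}
```

Usage: count_runs 13110 13114. Each run costs one query, so keep the range modest.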

List Files from a Time Period

Files created between two times:

samweb list-files "$BASE_QUERY and start_time > '<start time>' and start_time < '<end time>'"

Example:

samweb list-files "$BASE_QUERY and start_time > '2014-01-30T23:29:00' and start_time < '2014-01-31T00:30:00'" 

List Files from a specific trigger stream

You want only a given stream.

samweb list-files "$BASE_QUERY and run_number <run_no> and data_stream <stream>"

or

samweb list-files "$BASE_QUERY and run_number <run_no> and Online.Stream <stream>"

For DAQ files only.

Stream is a number. Streams are fully configurable, but in general in early 2014 they looked like:

  • 0 = NuMI
  • 1 = Booster Beam
  • 2 = Min Bias

As of Feb 2019, the streams are:
  • There is no stream name for files with a global trigger only -- everything from that run is written into this file.
  • 0 = NuMI trigger.
  • 1 = Booster trigger.
  • 2 = cosmic trigger.
  • 4 = calibration mode.

Example:

samweb list-files "$BASE_QUERY and run_number 13114 and data_stream 0"
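
Since stream numbers are easy to mix up, a lookup from trigger name to number can help when building queries. The sketch below hard-codes the Feb 2019 list from above; the short names (numi, booster, cosmic, calibration) and both function names are illustrative choices, and the mapping would be wrong for epochs with a different stream configuration.

```shell
# stream_number <name>: map a trigger name to its stream number,
# using the Feb 2019 list (hypothetical helper, case-insensitive).
stream_number() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        numi)        echo 0 ;;
        booster)     echo 1 ;;
        cosmic)      echo 2 ;;
        calibration) echo 4 ;;
        *) echo "unknown stream: $1" >&2; return 1 ;;
    esac
}

# stream_query <run> <name>: dimension string for one run and one stream.
stream_query() {
    n=$(stream_number "$2") || return 1
    echo "${BASE_QUERY:-data_tier raw AND online.detector fardet} and run_number $1 and data_stream $n"
}
```

Usage: samweb list-files "$(stream_query 13114 numi)"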

List Files from DAQ Partition

You want only a specific DAQ Partition

samweb list-files "$BASE_QUERY and Online.Partition <partno>"

Example:

samweb list-files "$BASE_QUERY and Online.Partition 1"

List Metadata associated with a file:

File names do not have paths, just base names (all files in SAM are unique)

samweb get-metadata <filename>

samweb get-metadata fardet_r00013114_s20_t00.raw

You get output like:

                    File Name: fardet_r00013114_s20_t00.raw
                      File Id: 4877797
                    File Type: importedDetector
                  File Format: raw
                    File Size: 6908296
                          Crc: 74650857 (adler 32 crc type)
               Content Status: good
                        Group: nova
                    Data Tier: raw
                  Application: online datalogger 33
                  Event Count: 110
                  First Event: 171026
                   Last Event: 179507
                   Start Time: 2014-02-14T01:34:14
                     End Time: 2014-02-14T01:37:43
                  Data Stream: 0
             Online.ConfigIDX: 0
          Online.DataLoggerID: 1
     Online.DataLoggerVersion: 33
              Online.Detector: fardet
            Online.DetectorID: 2
             Online.Partition: 1
          Online.RunControlID: 0
     Online.RunControlVersion: 0
            Online.RunEndTime: 1392341863
             Online.RunNumber: 13114
               Online.RunSize: 1727074
          Online.RunStartTime: 1392337488
               Online.RunType: 0
                Online.Stream: 0
         Online.SubRunEndTime: 1392341863
       Online.SubRunStartTime: 1392341654
                Online.Subrun: 20
           Online.TotalEvents: 110
         Online.TriggerCtrlID: 0
        Online.TriggerListIDX: 0
Online.TriggerPrescaleListIDX: 0
        Online.TriggerVersion: 0
 Online.ValidTriggerTypesHigh: 0
Online.ValidTriggerTypesHigh2: 0
  Online.ValidTriggerTypesLow: 0
                         Runs: 13114.0020 (online)
               File Partition: 20
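
When only one field is needed, the text output can be filtered. The helper below is a sketch that assumes the "Field Name: value" layout shown in the listing; metadata_field is a hypothetical name, and unlike SAM queries the lookup is case-sensitive.

```shell
# metadata_field <field name>   (hypothetical helper)
# Reads "samweb get-metadata" text on stdin and prints the value of one field.
metadata_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop the right-alignment padding
            if (index(line, key ": ") == 1) {   # exact field name followed by ": "
                print substr(line, length(key) + 3)
                exit
            }
        }'
}
```

Usage: samweb get-metadata fardet_r00013114_s20_t00.raw | metadata_field "Online.Subrun"

Matching on the full name plus ": " keeps Online.ValidTriggerTypesHigh from also matching Online.ValidTriggerTypesHigh2.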

List files with some other parameter or parameters

samweb list-files "$BASE_QUERY and Parameter.name_1 <value> and Parameter.name_2 <value>"

Example:

samweb list-files "$BASE_QUERY and Online.TotalEvents > 123 and Online.DataLoggerVersion = 33"

Get File locations

samweb locate-file <filename>

samweb locate-file ndos_r00015701_s07_cosmic.raw

Response will be a list of locations:
novadata:/nova/data/rawdata/NDOS/000157/15701/cosmic
enstore:/pnfs/nova/rawdata/NDOS/runs/000157/15701(1548@vpe048)
  • Locations starting with "novadata" are on the BlueArc central disk.
  • Locations starting with "enstore" are dCache/Enstore locations (disk cache, tape backed)
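
To use a location in a script it usually has to be split into its storage system and its path. A minimal sketch based only on the two sample lines above: the helper names are invented, and the meaning of the trailing parenthesised part of the enstore line is not covered here, so it is simply stripped.

```shell
# location_kind <location>: classify a location by its prefix (hypothetical helper).
location_kind() {
    case "$1" in
        novadata:*) echo "bluearc" ;;
        enstore:*)  echo "enstore" ;;
        *)          echo "other" ;;
    esac
}

# location_path <location>: drop the "<system>:" prefix and any trailing "(...)".
location_path() {
    printf '%s\n' "$1" | sed -e 's/^[^:]*://' -e 's/([^()]*)$//'
}
```

Usage:

samweb locate-file ndos_r00015701_s07_cosmic.raw | while IFS= read -r loc; do echo "$(location_kind "$loc") $(location_path "$loc")"; done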

Get Ancestors and Descendants of a File

samweb file-lineage <children/descendants> <filename>

Children are files derived directly from the input file

samweb file-lineage children fardet_r00013096_s14_t00.raw
fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root

samweb file-lineage <parents/ancestors> <filename>

samweb file-lineage parents fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
fardet_r00013096_s14_t00.raw

Intermediate Recipes (Poaching eggs)

Get a list of all currently defined fields

Go to:
Current Nova Experiment Dimensions

Get a list of Non-DAQ data files (e.g. Laser Scans) matching a search

samweb list-files "data_tier laser_scan AND laser_scan.block_number = 23 AND laser_scan.layer_number > 4"

Listing Files with children matching a selection

List raw files that have been processed through a different stage:

samweb list-files "$BASE_QUERY and isparentof: (data_tier <stage> AND Parameter.name_1 <value>)"

samweb list-files "$BASE_QUERY and isparentof: ( data_tier artdaq AND daq2rawdigit.base_release 'S14-01-20' )" 

Listing Files that match a filename pattern

This matches on parts of the file name:

samweb list-files "file_name like fardet%DDenergy%"

Listing Files with parents matching a selection

With BASE_QUERY2="data_tier artdaq AND online.detector fardet"

samweb list-files "$BASE_QUERY2 and ischildof: ( data_tier raw AND Online.Subrun < 20 )"

Listing Files with no physical locations

samweb list-files "$BASE_QUERY AND availability: virtual"

Listing Files with physical locations

samweb list-files "$BASE_QUERY AND availability: physical"

Retrieving Files with a physical location

You can retrieve files either individually or with a query pattern (multiple files).

Retrieve a single file

ifdh_fetch <filename>

ifdh_fetch fardet_r00012006_s61_t02.raw

Note: you must have a valid certificate (i.e. run kx509)

Retrieve a group of files

ifdh_fetch `ifdh translateConstraints <dimensions string>`

ifdh_fetch `ifdh translateConstraints "data_tier raw AND online.detector fardet and run_number 12006.51"`

Note: Here ifdh is used to do the lookup of the files and then the resulting names are passed to the fetch.
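
For larger selections it can be more robust to fetch one file at a time and report the failures. The sketch below assumes that ifdh translateConstraints prints one file name per line; fetch_all and the FETCH_CMD override are hypothetical names used only here.

```shell
# fetch_all   (hypothetical helper)
# Reads file names on stdin, fetches each one, and reports any failures.
# Returns non-zero if at least one fetch failed.
fetch_all() {
    failed=0
    while IFS= read -r f; do
        [ -n "$f" ] || continue                  # skip blank lines
        # FETCH_CMD is an assumed override for testing; the default is ifdh_fetch.
        # stdin is redirected so the fetch cannot swallow the rest of the list.
        if ! ${FETCH_CMD:-ifdh_fetch} "$f" </dev/null; then
            echo "FAILED: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Usage: ifdh translateConstraints "data_tier raw AND online.detector fardet and run_number 12006.51" | fetch_all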

Verifying that your file was transferred correctly

Check the checksum of your local copy against the one recorded for the tape copy (no JSON parser needed):

# From Database
samweb get-metadata fardet_r00012006_s35_t02.raw | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2
3828307205
# From file on disk
samweb file-checksum fardet_r00012006_s35_t02.raw | cut -d '"' -f 4
3828307205
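
The two lookups above can be combined into a single pass/fail check. This sketch reuses the exact pipelines shown above and so inherits their assumptions about the output layout; verify_checksum is a hypothetical name, and the file is expected to be in the current directory under its SAM name.

```shell
# verify_checksum <filename>   (hypothetical helper)
# Compares the checksum stored in SAM with the checksum of the local copy.
verify_checksum() {
    # Value recorded in the database.
    db=$(samweb get-metadata "$1" | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2)
    # Value computed from the file on disk.
    disk=$(samweb file-checksum "$1" | cut -d '"' -f 4)
    if [ -n "$db" ] && [ "$db" = "$disk" ]; then
        echo "OK $1 $db"
    else
        echo "MISMATCH $1 database=$db disk=$disk" >&2
        return 1
    fi
}
```

Usage: verify_checksum fardet_r00012006_s35_t02.raw

An empty database value counts as a failure, so a failed lookup is not mistaken for a match.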

If you have a JSON parser available, use it to parse the output instead of "cut":

samweb get-metadata fardet_r00012006_s35_t02.raw --json | jq '.crc.crc_value'
"3828307205" 

Finding projects run off of a dataset

If you need to determine who ran projects off a dataset, you can use:

samweb list-projects --defname=<defname>

This lists all projects run off a dataset. Most project names start with the username of the person who created them, so generally, no further work is necessary. If a project is listed whose creator is not obvious, you can use:

samweb project-summary <project name> | less

The first few lines of the output will tell you who the project creator was.
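
Because most project names begin with the creator's username, a quick tally per user can be made from the project list. This is a sketch under two assumptions: list-projects prints one project name per line, and usernames contain no hyphen. The function name project_creators is made up for this example.

```shell
# project_creators   (hypothetical helper)
# Reads project names on stdin and prints "<presumed user> <project count>".
project_creators() {
    # Everything before the first '-' is taken to be the username.
    cut -d '-' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: samweb list-projects --defname=<defname> | project_creators

Any entry that does not look like a username is a candidate for the project-summary check described above.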

Advanced Recipes (Hollandaise sauce)

Recovering a whole project

samweb project-recovery -e nova --useFileStatus=0 --useProcessStatus=0 gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037

which yields:

(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))

kx509
samweb create-definition <new_definition_name> "(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))" 
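
The two steps can be chained so that the dimension string is never copied by hand. A minimal sketch, assuming project-recovery prints only the dimension string on standard output as in the example above; recover_project and the SAMWEB override are hypothetical names.

```shell
# recover_project <project name> <new definition name>   (hypothetical helper)
# Asks SAM for the recovery dimensions and stores them as a new definition.
recover_project() {
    # SAMWEB is an assumed override so the function can be tested without samweb.
    dims=$(${SAMWEB:-samweb} project-recovery -e nova --useFileStatus=0 --useProcessStatus=0 "$1") || return 1
    if [ -z "$dims" ]; then
        echo "no recovery dimensions returned for $1" >&2
        return 1
    fi
    ${SAMWEB:-samweb} create-definition "$2" "$dims"
}
```

Usage: recover_project gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 <new_definition_name>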

Re-running only failed ("skipped") files from an existing project

samweb create-definition <project>_recovery "project_name <project> and consumed_status skipped" 

Sampling/Prescaling a dataset

SAM provides a mechanism for deterministically sampling a dataset. To do this:

  • Define the dataset
  • Define a new dataset with a stride and offset

samweb create-definition my_dataset "<selection criteria>" 
samweb create-definition my_dataset_oneTenth_version1 "defname: my_dataset with stride 10 offset 0" 
samweb create-definition my_dataset_oneTenth_version2 "defname: my_dataset with stride 10 offset 1" 
...

Each of these will create a dataset that is 1/10 the size of the original. The offset parameter specifies where to start counting from (so offset 0 starts from the first element in the list, offset 1 starts from the second).
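
Writing out every offset by hand gets tedious, so the commands can be generated. The sketch below only prints the create-definition commands (a dry run) so they can be inspected first; the function name and the _stride<N>_offset<K> naming scheme are illustrative choices, not a convention from this page.

```shell
# stride_definitions <dataset> <stride>   (hypothetical helper)
# Prints one create-definition command per offset, from 0 to stride-1.
stride_definitions() {
    offset=0
    while [ "$offset" -lt "$2" ]; do
        printf 'samweb create-definition %s_stride%s_offset%s "defname: %s with stride %s offset %s"\n' \
            "$1" "$2" "$offset" "$1" "$2" "$offset"
        offset=$((offset + 1))
    done
}
```

Usage: stride_definitions my_dataset 10 prints ten commands; pipe the output to sh once it looks right. Going by the description above, the offsets 0 through 9 should together cover the original dataset with no file repeated.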

Adding additional constraints to a dataset

If a dataset already exists which contains all the files you want, but you want to add an additional constraint (for instance, you only want a specific run range) you can:

samweb create-definition new_dataset "defname: old_dataset and run_number >= <startRun> and run_number <= <endRun>" 

Constructing a Good Runs List

Pre-staging Data from Tape

To prestage data from tape:

  • Start a screen (or tmux) session
  • Do a "prestage-dataset" with your dataset name
  • Detach from the screen session
  • Come back in a few hours or days...
  • The command supports running parallel processes with --parallel. Recommended best practice is to prestage one dataset at a time with 4 parallel processes.
screen
samweb prestage-dataset --defname=fardet_onehertz_raw_Oct2014-May2015-10percent --parallel 4
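
When several datasets need prestaging, the one-at-a-time recommendation can be scripted as a simple sequential loop. This is a sketch: prestage_all and the SAMWEB override are hypothetical names, and the dataset names in the usage line are placeholders.

```shell
# prestage_all <defname>...   (hypothetical helper)
# Prestages the given datasets one after another, 4 parallel processes each.
prestage_all() {
    for defname in "$@"; do
        echo "prestaging $defname" >&2
        # SAMWEB is an assumed override for testing; stop at the first failure.
        ${SAMWEB:-samweb} prestage-dataset --defname="$defname" --parallel 4 || return 1
    done
}
```

Usage (inside the screen session): prestage_all <dataset 1> <dataset 2>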

For more information on this process see the How To Configure Production Jobs page on the production wiki.

Combining it all (Eggs Benedict)