SAM: Finding data files (and data-like files)

Log in and set up the software:

source /grid/fermiapp/lariat/setup_lariat.sh
setup lariatsoft v06_30_00 -q e10:prof
kx509

SAM is a database which keeps track of data files. (Also MC files.)

  • You can use SAM interactively, on the command line (below), to find out what is available to process in an analysis.
  • You can also use SAM in your batch jobs to process a group of files which match your requirements.
  • There's a python API: More here
    import samweb_client
    print(dir(samweb_client))
  • Searching the datasets online: http://samweb.fnal.gov:8480/sam/lariat/definition_editor/
    • From this site, you can filter all of the LArIAT runs by run number, current setting, etc. It also lets you create a dataset from your conditions. However, some variables are named differently in the data than in the search. Example: the run number is saved as "Runs" within the dataset, but you would search for "run_number". Also, some variables may not be searchable, so some trial and error is necessary. Still, this provides a web-based search method as an alternative to the terminal samweb commands used elsewhere on this page. It allows the use of "and", "or", and the wildcard "%".
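
The Python API can be driven much like the command line. Here is a minimal sketch, assuming the client object offers countFiles() and listFiles() methods taking a dimensions string (as samweb_client.SAMWebClient is expected to; check the output of dir(samweb_client) in your own setup). The helper name preview_query is hypothetical.

```python
def preview_query(client, dimensions, limit=5):
    """Return (total count, first few file names) for a SAM dimensions query.

    `client` is assumed to provide countFiles(dims) and listFiles(dims).
    Any object with those two methods will do.
    """
    total = client.countFiles(dimensions)
    # "with limit N" is the same clause used on the command line on this page.
    sample = list(client.listFiles("{0} with limit {1}".format(dimensions, limit)))
    return total, sample


# Possible usage on a gpvm node after setup and kx509 (assumed constructor):
#   import samweb_client
#   samweb = samweb_client.SAMWebClient(experiment="lariat")
#   print(preview_query(samweb, "data_tier digits and run_number 6326"))
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at the real database.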

SAM knows the file names and their locations, but also a long list of other important details: (partial list)

create_date                               Date the file was added to the database
data_stream                               Datastream name
data_tier                                 Data tier
dataset_def_id                            All files in snapshots made by this defintion ID
dataset_def_name                          All files in snapshots made by this defintion name
dataset_def_name_newest_snapshot          Files in the newest snapshot for this definition name
defname: <definition name>                Include the existing definition with this name
detector.cathode_voltage                  Parameter (true_float)
detector.collection_voltage               Parameter (true_float)
detector.induction_voltage                Parameter (true_float)
detector.pmt_etl                          Parameter (string)
detector.pmt_ham                          Parameter (string)
detector.shield_voltage                   Parameter (true_float)
detector.sipm_ham                         Parameter (string)
detector.sipm_sensl                       Parameter (string)
end_time                                  File end time
event_count                               Event count
fcl.name                                  Parameter (string)
fcl.version                               Parameter (string)
file_format                               File format
file_name                                 File name
file_size                                 File size in bytes
file_type                                 File type
filter.name                               Parameter (string)
first_event                               First event number
full_path                                 Full path of file location
isancestorof: ( <dimensions> )            Returns files that are an ancestor of any file matching the given sub-query
ischildof: ( <dimensions> )               Returns files that are the immediate child of any file matching the given sub-query
isdescendantof: ( <dimensions> )          Returns files that are an descendant of any file matching the given sub-query
isparentof: ( <dimensions> )              Returns files that are the immediate parent of any file matching the given sub-query
lariat_project.name                       Parameter (string)
lariat_project.stage                      Parameter (string)
lariat_project.version                    Parameter (string)
last_event                                Last event number
physical_datastream_name                  alias for data_stream
project_description                       Files projects matching this description have seen
project_id                                Files this project has seen
project_name                              Files this project has seen
run.period                                Parameter (string)
run_number                                Run number and optionally subrun number
run_type                                  Run type
secondary.intensity                       Parameter (true_float)
secondary.momentum                        Parameter (true_float)
secondary.polarity                        Parameter (string)
snapshot_id                               All files in this snapshot
start_time                                File start time
tertiary.beam_counters                    Parameter (string)
tertiary.cherenkov1                       Parameter (string)
tertiary.cherenkov2                       Parameter (string)
tertiary.cosmic_counters                  Parameter (string)
tertiary.DSTOF                            Parameter (string)
tertiary.halo_paddle                      Parameter (string)
tertiary.magnet_current                   Parameter (true_float)
tertiary.magnet_polarity                  Parameter (string)
tertiary.muon_range_stack                 Parameter (string)
tertiary.MWPC1                            Parameter (string)
tertiary.MWPC2                            Parameter (string)
tertiary.MWPC3                            Parameter (string)
tertiary.MWPC4                            Parameter (string)
tertiary.number_MuRS                      Parameter (true_int)
tertiary.punch_through                    Parameter (string)
tertiary.USTOF                            Parameter (string)

Additionally, there are about 400 more metadata dimensions from the DAQ configuration (by run) and the data-taking conditions (by sub-run).

Data_tier: Don't ignore this file parameter!

The SAM parameter data_tier is very important and should be included in every samweb query or dataset.

data_tier   usage meaning
raw         No meaningful Events. Needs to be processed by the EventAssembler (aka the Slicer). Everything from the same sub-run is jumbled into one Event per raw file.
digits      Ready to be reconstructed. Events contain the digitized signals from all detectors and instruments which sent data in response to the same trigger signal.
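
One way to avoid forgetting data_tier is to let a small helper add it for you. This is only a sketch: the function name with_data_tier and the default tier are illustrative choices, and it assumes parentheses may be used for grouping, as in the isparentof: ( ... ) style dimensions listed above.

```python
import re


def with_data_tier(dimensions, data_tier="digits"):
    """Prepend a data_tier constraint unless the query already names one.

    Caveat: add any "with limit N" clause after calling this helper,
    not before, so the limit does not end up inside the parentheses.
    """
    if re.search(r"\bdata_tier\b", dimensions):
        return dimensions
    # Parentheses keep an "or" in the original query from escaping the tier cut.
    return "data_tier {0} and ({1})".format(data_tier, dimensions)


print(with_data_tier("run_number 6326"))
print(with_data_tier("data_tier raw and run_number 6326"))
```

The first call gains a "data_tier digits" cut, and the second is returned unchanged because it already names a tier.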

Here's the complete list of parameters, dimensions, conditions, and configuration elements:

http://samweb.fnal.gov:8480/sam/lariat/api/files/list/dimensions

...and the syntax rules for them: https://cdcvs.fnal.gov/redmine/projects/sam-web/wiki/Dimension_Syntax

Finding and getting data

DCache, Enstore, and the SAM Database

Raw data lives in DCache under /pnfs/lariat/raw/. These directories should be readable for any member of the lariat group. Locating spill files by their metadata (run_number, which systems were on, date of spill, etc.) is done through the LArIAT SAM database.

Getting samweb (command line tools) set up

ssh -XY lariatgpvm01.fnal.gov    # or any of lariatgpvm02, 03, 04

source /grid/fermiapp/lariat/setup_lariat.sh
setup lariatsoft v06_24_00 -q e10:prof
kx509

samweb help-commands

Available commands:
  Data file commands:
    add-file-location
    count-files
    declare-file
    file-lineage
    get-file-access-url
    get-metadata
    list-files
    locate-file
    modify-metadata
    remove-file-location
    retire-file
    validate-metadata

  Definition commands:
    count-definition-files
    create-definition
    delete-definition
    describe-definition
    list-definition-files
    list-definitions
    modify-definition
    take-snapshot

  Project commands:
    find-project
    get-next-file
    list-projects
    prestage-dataset
    project-recovery
    project-summary
    release-file
    run-project
    set-process-file-status
    set-process-status
    start-process
    start-project
    stop-process
    stop-project

  Utility commands:
    file-checksum
    server-info

  Admin commands:
    add-application
    add-data-disk
    add-parameter
    add-user
    add-value
    describe-user
    list-applications
    list-data-disks
    list-parameters
    list-users
    list-values
    modify-user

What are all the native SAM parameters?

 samweb list-files --help-dimensions 

What are all the values this parameter ever took?

--> Only works for native SAM parameters, not the full list of dimensions. (So not the DAQ configurations, and not the other beam/detector conditions)

 $ samweb list-parameters secondary.polarity
Positive
Unknown
Negative

Count (or list) just the files meeting some description:

 
$ samweb count-files "run_number > 7000 and run_type physics and tertiary.cherenkov1 On" 
358115
$ samweb list-files "run_number > 7000 and run_type physics and tertiary.cherenkov1 On" with limit 5
lariat_r007996_sr0001.root
lariat_r007996_sr0002.root
lariat_r007996_sr0003.root
lariat_r007996_sr0004.root
lariat_r007996_sr0005.root
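
The file names in these listings encode the run and sub-run. A small sketch for pulling them back out, assuming every name follows the lariat_rNNNNNN_srNNNN.root pattern seen in the examples on this page (the function names are made up for illustration):

```python
import re

# Pattern inferred from the example file names on this page (an assumption).
_NAME = re.compile(r"^lariat_r(\d+)_sr(\d+)\.root$")


def run_subrun(file_name):
    """Return (run, subrun) as integers for one file name."""
    match = _NAME.match(file_name.strip())
    if match is None:
        raise ValueError("unexpected file name: {0!r}".format(file_name))
    return int(match.group(1)), int(match.group(2))


def group_by_run(file_names):
    """Map run number -> sorted list of sub-run numbers."""
    runs = {}
    for name in file_names:
        run, subrun = run_subrun(name)
        runs.setdefault(run, []).append(subrun)
    for subruns in runs.values():
        subruns.sort()
    return runs


print(group_by_run(["lariat_r007996_sr0002.root", "lariat_r007996_sr0001.root"]))
```

Names that do not fit the pattern raise an error rather than being silently skipped, which seems the safer default when building a file list for an analysis.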

With the file names you get, you can then do

 samweb locate-file lariat_rXXXXXXX_srYYYY.root 

The output contains the file location, plus a bit in parentheses (something@which tape) that you can ignore.

Details about each file:

You can see some details about each file:

samweb get-metadata lariat_r006326_sr0024.root

                  File Name: lariat_r006326_sr0024.root
                    File Id: 718366
                Create Date: 2015-06-26T22:26:01+00:00
                       User: lariatraw
                Update Date: 2016-08-27T03:25:47+00:00
                Update User: randy
                  File Size: 30531400
                   Checksum: enstore:162237809
             Content Status: good
                  File Type: data
                File Format: artroot
                      Group: lariat
                  Data Tier: raw
                Event Count: 1
                First Event: 24
                 Last Event: 24
                 Start Time: 2015-06-26T22:23:37+00:00
                   End Time: 2015-06-26T22:24:16+00:00
   detector.cathode_voltage: 23554.0220023
detector.collection_voltage: 336.875
 detector.induction_voltage: -18.4375
           detector.pmt_etl: Off
           detector.pmt_ham: On
    detector.shield_voltage: -299.0625
          detector.sipm_ham: Unknown
        detector.sipm_sensl: Unknown
                 run.period: Run1
        secondary.intensity: 77069.0
         secondary.momentum: 64.0253829956
         secondary.polarity: Negative
     tertiary.beam_counters: On
        tertiary.cherenkov1: On
        tertiary.cherenkov2: Off
   tertiary.cosmic_counters: Unknown
             tertiary.DSTOF: On
       tertiary.halo_paddle: On
    tertiary.magnet_current: 99.8474121094
   tertiary.magnet_polarity: Negative
  tertiary.muon_range_stack: Partial
             tertiary.MWPC1: On
             tertiary.MWPC2: On
             tertiary.MWPC3: On
             tertiary.MWPC4: On
       tertiary.number_MuRS: 10
     tertiary.punch_through: On
             tertiary.USTOF: On
                       Runs: 6326.0024 (physics)
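
If you capture this text output in a script, it is easy to turn into a dictionary. The sketch below assumes the "key: value" layout shown above and a single "run.subrun (type)" entry in the Runs field; parse_metadata and parse_runs are hypothetical helper names.

```python
def parse_metadata(text):
    """Turn the text printed by `samweb get-metadata` into a dict of strings."""
    meta = {}
    for line in text.splitlines():
        # Split on the first ": " only, since timestamps and checksums contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def parse_runs(value):
    """Split a Runs value such as '6326.0024 (physics)' into (run, subrun, run_type)."""
    number, _, run_type = value.partition(" ")
    run, _, subrun = number.partition(".")
    return int(run), int(subrun), run_type.strip("()")


sample = """
                  File Name: lariat_r006326_sr0024.root
                   Checksum: enstore:162237809
                  Data Tier: raw
   detector.cathode_voltage: 23554.0220023
                       Runs: 6326.0024 (physics)
"""
meta = parse_metadata(sample)
print(meta["Data Tier"], parse_runs(meta["Runs"]))
```

All values stay strings, so convert them (for example with float()) before comparing voltages or momenta.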

Tons more about specific runs and sub-runs/spills at the Run Summary Page

--> If you want to keep the list of files you get for use in batch jobs, you want to make a new dataset.

SAM datasets: A primer on Finding, Making, Describing, and Using them

SAM lets you create dataset definitions using the values of the metadata:

Making datasets and snapshots (for use by batch jobs)

What SAM already knows about each sub-run

Each data file corresponds to one spill plus some cosmic-taking time, together called a "sub-run." Here is the complete list of things SAM knows about each file: http://samweb.fnal.gov:8480/sam/lariat/api/files/list/dimensions

  • Parameters - the metadata fields which are native to SAM. Their values can be listed with samweb.
  • Dimensions - all metadata fields accessible to SAM. LArIAT's DAQ config and Conditions databases provide dimensions which are not parameters.

Such are the terms we're using here...

These can be used to make a dataset of any or all files matching a given description. The syntax is like this: https://cdcvs.fnal.gov/redmine/projects/sam-web/wiki/Dimension_Syntax.

Examples:

samweb create-definition GoodSecondaryBeamDigits "data_tier digits and lariat_end_f_mc7sc1 > 5000" 
samweb create-definition Good64GevPosSecondaryDigits "defname: GoodSecondaryBeamDigits and secondary.momentum > 60 and secondary.momentum < 68 and secondary.polarity Positive" 
samweb create-definition Good64GevNegSecondaryDigits "defname: GoodSecondaryBeamDigits and secondary.momentum > 60 and secondary.momentum < 68 and secondary.polarity Negative" 

samweb create-definition BothTOF_OnAndReadOutDigits "data_tier digits and tertiary.USTOF on and tertiary.DSTOF on and lariat_v1751_config_caen_enablereadout = 1" 
samweb create-definition AllMWPC_OnAndReadOutDigits "data_tier digits and tertiary.MWPC1 on and tertiary.MWPC2 on and tertiary.MWPC3 on and tertiary.MWPC4 on and lariat_tdc_config_tdc_enablereadout 1 and lariat_tdc_config_tdc_pulserenable = 0" 

samweb create-definition TPC_voltages_nominal "detector.cathode_voltage > 23000 and detector.collection_voltage > 320 and detector.collection_voltage < 350 and detector.induction_voltage < -10 and detector.induction_voltage > -20 and detector.shield_voltage < -290 and  detector.shield_voltage > -310" 
samweb create-definition TPC_nominal_read_out "lariat_larasic_config_larasic_enablereadout = 1 and lariat_larasic_config_larasic_pulseron = 0 and lariat_larasic_config_larasic_channelscan = 0" 
samweb create-definition TPC_MaxGainAndFilter "lariat_larasic_config_larasic_collection_filter = 3 and lariat_larasic_config_larasic_collection_gain = 3 and lariat_larasic_config_larasic_induction_filter = 3 and lariat_larasic_config_larasic_induction_gain = 3" 
samweb create-definition TPC_OnAndReadOutNominalDigits "data_tier digits and defname: TPC_voltages_nominal and defname: TPC_MaxGainAndFilter and defname: TPC_nominal_read_out" 

samweb create-definition end_of_april "data_tier digits and create_date > '2015-04-30T00:00:00-05:00' and create_date < '2015-04-30T23:59:00-05:00' " 

samweb create-definition ARTfile file_format artroot
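
Long dimension strings like the ones above are easy to mistype, so it can help to assemble them from pieces. A minimal sketch follows; the helper names between, all_of and create_definition_argv are invented here, and nothing in it contacts SAM.

```python
def between(param, low, high):
    """Window cut written the same way as the examples: param > low and param < high."""
    return "{0} > {1} and {0} < {2}".format(param, low, high)


def all_of(*clauses):
    """Join clauses with 'and'."""
    return " and ".join(clauses)


def create_definition_argv(name, dimensions):
    """Argument list for the create-definition command, e.g. for subprocess.check_call."""
    return ["samweb", "create-definition", name, dimensions]


dims = all_of(
    "defname: GoodSecondaryBeamDigits",
    between("secondary.momentum", 60, 68),
    "secondary.polarity Positive",
)
print(create_definition_argv("Good64GevPosSecondaryDigits", dims))
```

Building an argument list rather than one shell string means the dimensions need no extra quoting when handed to subprocess.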

Listing the datasets which exist already:

You can always list the dataset definitions:

  samweb list-definitions
BatchTestRun6326_10Events
BatchTestRun6326_200Events
BatchTestRun6326

Describing details of a dataset:

You can see how this dataset was defined:

 samweb describe-definition BatchTestRun6326_10Events
Definition Name: BatchTestRun6326_10Events
  Definition Id: 61
  Creation Date: 2015-07-10T18:04:26.302235+00:00
       Username: stjohn
          Group: lariat
     Dimensions: run_number 6326 and lariat_end_f_mc7sc1 > 2000 with limit 10

You can even see the files in each:

samweb list-definition-files BatchTestRun6326_10Events
lariat_r006326_sr0024.root
lariat_r006326_sr0026.root
lariat_r006326_sr0028.root
lariat_r006326_sr0032.root
lariat_r006326_sr0025.root
lariat_r006326_sr0030.root
lariat_r006326_sr0034.root
lariat_r006326_sr0036.root
lariat_r006326_sr0027.root
lariat_r006326_sr0029.root

Using SAM datasets in batch jobs:

Running grid jobs on a SAM dataset is explained here: https://cdcvs.fnal.gov/redmine/projects/lardbt/wiki/Running_Grid_Jobs_using_projectpy

Having trouble?

Troubleshooting page for all batch submission and data processing.