THIS PAGE IS OUTDATED, DO NOT TRUST, IN DESPERATE NEED OF UPDATE

SAM and NOvA-ART

Intro

SAM (Sequential Access via Metadata) is a data management and delivery system. It catalogues experiment data using experiment-specific metadata. It enables remote data processing by delivering data on demand from archival storage to caches worldwide, and it manages those caches automatically. SAM is closely integrated with the Fermilab archive system (Enstore) as well as with dCache, SRM and GridFTP. It follows a client-server model, with a command-line interface (CLI), a Python client and a C++ client.

Setup SAM Tools

The SAM user tools are provided as a UPS product. The package is named "sam_web_client" and is available from the common products area at FNAL. To set it up, first add the common products area to your UPS products path, then run setup:

export PRODUCTS=/grid/fermiapp/products/common/db/:$PRODUCTS
setup sam_web_client

This gives you access to the "samweb" tool, which can be run anywhere and communicates with SAM through the SAM web server over HTTP. By default, the client authenticates to the server with a kx509 certificate. The SAM server automatically populates the experiment's user list with everyone whose kx509 certificate (derived from their Kerberos credentials) has Fermilab/nova set as its VO.
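Once the product is set up, most everyday work is a handful of samweb subcommands. A minimal sketch, assuming sam_web_client is set up and you hold a valid kx509 certificate: the wrapper name nsam is hypothetical (it is not part of sam_web_client) and only forwards its arguments to samweb with the global -e nova option, so every query goes to the NOvA instance. The dimension string in the comments is illustrative.

```bash
# Hypothetical wrapper, not shipped with sam_web_client: forward all
# arguments to samweb, selecting the NOvA experiment with -e.
nsam() {
    samweb -e nova "$@"
}

# Example calls, run interactively after "setup sam_web_client":
#   nsam get-metadata fardet_r00011116_s05.raw   # print a file's metadata
#   nsam locate-file  fardet_r00011116_s05.raw   # list where copies are stored
#   nsam list-files   "data_tier raw and run_number 11116"
```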

Description of Metadata parameters

Top-level

  • file_type

importedSimulated: All simulated files
importedDetector: All data files
mc_config: FHiCL files for MC production

  • file_format

root:
fcl:

  • data_tier

raw:
fcl:
artdaq:
reconstruction:
pidpart:
lemsum:
lempart:
pid:
caf:

  • start_time
  • end_time
  • data_stream
  • event_count
  • runs
    [ run_number, run_type ]

Format: a list of [run number, run type] pairs, e.g. [ [runNum1, "runType1"], [runNum2, "runType2"], ... ]
The run type must be one of the pre-defined values:
physics: Data and realistic MC after the detector is commissioned
calibration: Special runs taken for calibration
commissioning: Data and realistic MC while the detector is being commissioned

  • parents

All files used to produce this stage of processing, given as a list of objects [ {"file_name": "<name>"} ], where <name> is the base name of the parent file. Multiple parents can be listed (objects separated by commas), and all parents must already be declared to SAM.
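The runs and parents values are JSON-style lists, which are easy to get subtly wrong by hand. The helpers below are a hypothetical sketch (make_runs_json and make_parents_json are not part of any SAM tool) that build both strings, rejecting run types outside the three pre-defined values listed above. Since this page is outdated, treat that list as an assumption and check it against the current SAM configuration.

```bash
# Run types accepted by make_runs_json, copied from the list on this page.
RUN_TYPES="physics calibration commissioning"

# make_runs_json RUN:TYPE [RUN:TYPE ...]
# Print the "runs" value as a list of [run_number, "run_type"] pairs.
make_runs_json() {
    local out="" pair run rtype t known
    for pair in "$@"; do
        run=${pair%%:*}
        rtype=${pair#*:}
        # Run numbers must be plain non-negative integers.
        if ! [[ $run =~ ^[0-9]+$ ]]; then
            echo "bad run number: $run" >&2
            return 1
        fi
        # Run types must be one of RUN_TYPES.
        known=0
        for t in $RUN_TYPES; do
            if [[ $rtype == "$t" ]]; then known=1; fi
        done
        if (( ! known )); then
            echo "bad run type: $rtype" >&2
            return 1
        fi
        # 10# forces base 10, so leading zeros (e.g. 00013121) are dropped.
        out+="${out:+, }[$((10#$run)), \"$rtype\"]"
    done
    printf '[%s]\n' "$out"
}

# make_parents_json FILE [FILE ...]
# Print the "parents" value; only the base name of each parent is kept.
make_parents_json() {
    local out="" f
    for f in "$@"; do
        out+="${out:+, }{\"file_name\": \"$(basename "$f")\"}"
    done
    printf '[%s]\n' "$out"
}
```

For example, make_runs_json 13121:physics prints [[13121, "physics"]], and passing a full path to make_parents_json keeps only the file name, as SAM expects.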

Release history

The release history of the processing chain is documented in several metadata parameters:

  • NOVA.Release

The version of the most recently run processing step - this does not need to be set in the FHiCL file.

  • FCL.Version

The release in which the simulation FHiCL files were produced.

  • Calibration.base_release
    DAQ2RawDigit.base_release
    Reconstructed.base_release
    Simulated.base_release
    LEMPART.base_release
    PIDPART.base_release
    PID.base_release

These tell the history of various processing stages. The FHiCL responsible for that stage should set XX.base_release: "" in the metadata section. The empty quotes will automatically be filled in with the actual base release used at run time.

NOVA

  • NOVA.Label

A Greek letter indicating the production run period (it encapsulates the production configuration and changes in the metadata scheme)

  • NOVA.Special

Special configurations (systematic studies, etc.); use "none" for a normal configuration

  • NOVA.DetectorID

nd, fd, ndos

  • NOVA.HornPolarity

fhc, rhc, off, none

  • NOVA.HornConfig

string used in flux files

  • NOVA.SubVersion

iteration number (distinguishes otherwise identical files)

FCL

  • FCL.version

Stores the base release active when the .fcl file was made. Not called "base_release" as that name has special runtime behavior.

  • FCL.date
  • FCL.time

Simulated

  • Simulated.base_release
  • Simulated.firstRun
  • Simulated.firstSubRun
  • Simulated.number_of_spills
  • Simulated.geometry

name of the geometry file

  • Simulated.genie

Whether or not genie singles or pileup are in this file: true, false

  • Simulated.cry

Whether or not cosmics singles or pileup are in this file: true, false

  • Simulated.singlep

Whether or not single particles are in this file: true, false

  • Simulated.volume

The volume the interactions are generated in: detector, rock, rock_detector

  • Simulated.mixingType

The type of mixing used:
pileup: Varying numbers of interactions directly from flux
singles: Single interactions (neutrinos, cosmics or single particle)
overlay: Mixed from pileup or singles

  • Genie.flavorset

none: Genie was not used
nonswap: Normal flux
swap:
tau:

  • Cry.flavorset

none: Cry was not used
all: All allowed particles are created
gamma: Only produce photons
hadron: Only produce hadrons

  • Singlep.flavorset

none: Singlep was not used
<particle type>: Fill in the particle generated
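Several of the NOVA and Simulated fields above only accept a fixed set of values, so a typo is worth catching before a file is declared. As a hypothetical pre-declaration check (the check_enum helper and its table are illustrative, not a SAM feature), the allowed values from this page can be kept in a bash associative array (bash 4 or later) and looked up per field. Open-ended fields such as NOVA.HornConfig or Singlep.flavorset are deliberately left out and pass unchecked.

```bash
# Allowed values per metadata field, copied from the lists on this page.
declare -A ALLOWED=(
    [NOVA.DetectorID]="nd fd ndos"
    [NOVA.HornPolarity]="fhc rhc off none"
    [Simulated.genie]="true false"
    [Simulated.cry]="true false"
    [Simulated.singlep]="true false"
    [Simulated.volume]="detector rock rock_detector"
    [Simulated.mixingType]="pileup singles overlay"
    [Genie.flavorset]="none nonswap swap tau"
    [Cry.flavorset]="none all gamma hadron"
)

# check_enum FIELD VALUE: succeed if VALUE is allowed for FIELD.
check_enum() {
    local field=$1 value=$2 v
    # Fields with no entry in ALLOWED are not checked.
    if [[ -z ${ALLOWED[$field]+set} ]]; then
        return 0
    fi
    for v in ${ALLOWED[$field]}; do
        if [[ $value == "$v" ]]; then
            return 0
        fi
    done
    echo "invalid $field: '$value' (allowed: ${ALLOWED[$field]})" >&2
    return 1
}
```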

Useful SAM Commands

Tools for Using SAM

Metadata module

The metadata module takes metadata declared inside the job fcl file and/or from input art files and adds it to the output file's internal sqlite database.

To make sure the SQLite metadata is written to the output file, the nova command must be called with the following command-line options:

--sam-application-family=nova                   #always nova
--sam-application-version=$SRT_BASE_RELEASE     #picks up the actual release used
--sam-file-type=importedSimulated               #importedSimulated for MC/importedDetector for data
--sam-data-tier=out1:artdaq                     #out1 is the label of the output module. 
--sam-stream-name=out1:all                      #all should be used for MC. Data uses an integer for different streams
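These options can be wrapped in a small shell function so they are not retyped for every job. This is a sketch under stated assumptions: the function name nova_sam and its argument order are hypothetical, it assumes a single output module labelled out1 (as in the comments above) and that the job is launched with nova -c <fcl>, and it only covers the MC case where the stream name is "all".

```bash
# nova_sam FCL FILE_TYPE DATA_TIER [extra nova args...]
# Run nova with the --sam-* options needed for the metadata to be written.
nova_sam() {
    if (( $# < 3 )); then
        echo "usage: nova_sam FCL FILE_TYPE DATA_TIER [nova args...]" >&2
        return 1
    fi
    local fcl=$1 file_type=$2 tier=$3
    shift 3
    # The application version is taken from the release that is set up.
    if [[ -z ${SRT_BASE_RELEASE:-} ]]; then
        echo "SRT_BASE_RELEASE is not set; set up a release first" >&2
        return 1
    fi
    nova -c "$fcl" \
        --sam-application-family=nova \
        --sam-application-version="$SRT_BASE_RELEASE" \
        --sam-file-type="$file_type" \
        --sam-data-tier="out1:$tier" \
        --sam-stream-name=out1:all \
        "$@"
}

# Example (MC): nova_sam cosmicjob.fcl importedSimulated artdaq
```

Refusing to run without SRT_BASE_RELEASE avoids silently writing an empty application version into the metadata.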

Extracting metadata from FHiCL Files

Running on the Grid

Copying specific files from SAM without a Dataset

Running a Local Test Job on a Dataset

Metadata for Raw data files

The metadata stored for a raw file is generated by the MetaDataRunTool, a standalone application that runs within the DAQ environment. The application is part of the MetaDataTools package.

Sample file:

File Name    fardet_r00011116_s05.raw
File Id    3949511
File Type    importedDetector
File Format    raw
File Size    540603688
Crc    1660280520 (adler 32 crc type)
Content Status    good
Group    nova
Data Tier    raw
Application    online datalogger 33
Event Count    7464
First Event    38428
Last Event    45892
Start Time    2013-09-09T03:05:36
End Time    2013-09-09T03:08:39
Data Stream    all
Online.ConfigIDX    0
Online.DataLoggerID    1
Online.DataLoggerVersion    33
Online.Detector    fardet
Online.DetectorID    2
Online.Partition    1
Online.RunControlID    0
Online.RunControlVersion    0
Online.RunEndTime    1378696119
Online.RunNumber    11116
Online.RunSize    135150922
Online.RunStartTime    1378694998
Online.RunType    0
Online.Stream    all
Online.SubRunEndTime    1378696119
Online.SubRunStartTime    1378695936
Online.Subrun    5
Online.TotalEvents    7464
Online.TriggerCtrlID    0
Online.TriggerListIDX    0
Online.TriggerPrescaleListIDX    0
Online.TriggerVersion    0
Online.ValidTriggerTypesHigh    0
Online.ValidTriggerTypesHigh2    0
Online.ValidTriggerTypesLow    0
Runs    11116.0005 (online)
File Partition    5
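Several fields in the sample above can be cross-checked against each other. The file name encodes detector, run and subrun. Runs shows them as <run>.<subrun padded to four digits>. Start Time and End Time equal Online.SubRunStartTime and Online.SubRunEndTime (Unix seconds) printed in UTC. The helpers below are a hypothetical sketch of those relationships, inferred from this single sample; epoch_to_sam_time assumes GNU date.

```bash
# parse_raw_name PATH: print "detector run subrun" for a raw file named
# <detector>_r<run>_s<subrun>.raw (pattern inferred from the sample).
parse_raw_name() {
    local re='^([a-z]+)_r([0-9]+)_s([0-9]+)\.raw$'
    local name
    name=$(basename "$1")
    if ! [[ $name =~ $re ]]; then
        echo "unrecognised raw file name: $name" >&2
        return 1
    fi
    # 10# strips leading zeros without octal surprises (e.g. s08).
    printf '%s %d %d\n' "${BASH_REMATCH[1]}" \
        "$((10#${BASH_REMATCH[2]}))" "$((10#${BASH_REMATCH[3]}))"
}

# runs_label RUN SUBRUN: the <run>.<subrun> form shown under "Runs".
runs_label() {
    printf '%d.%04d\n' "$1" "$2"
}

# epoch_to_sam_time SECONDS: UTC time in the Start/End Time format.
# Uses GNU date's -d @SECONDS syntax.
epoch_to_sam_time() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%S
}
```

For the sample file, epoch_to_sam_time 1378695936 gives 2013-09-09T03:05:36, matching Start Time.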

Metadata for Reconstructed File

Sample file: /nova/data/novaroot/NDOS/S12.02.14/000131/13121/cosmic/reco_r00013121_s05_t02_cosmic_S12.02.14.root

  file_name                  : "reco_r00013121_s05_t02_cosmic_S12.02.14.root" 
  file_types                 : "importedDetector" 
  file_formats               : "root" 
  file_size                  : "`cat /nova/data/novaroot/NDOS/S12.02.14/000131/13121/cosmic/reco_r00013121_s05_t02_cosmic_S12.02.14.root | wc -c`" 
  crc                        : "`samweb file-checksum /nova/data/novaroot/NDOS/S12.02.14/000131/13121/cosmic/reco_r00013121_s05_t02_cosmic_S12.02.14.root`" 
  data_tiers                 : "artdaq" 
  Online.RunNumber           : 13121
  Online.RunStartTime        : 1321236602
  Online.RunEndTime          : 1321258215
  Online.SubRun              : 5
  Online.SubRunEndTime       : 1321258215
  Online.SubRunStartTime     : 1321254609
  Online.Stream              : 2
  Reconstructed.base-release : "S12.02.14" 
  NOVA.DetectorID            : "ndos" 
  NOVA.HornConfig            : "none" 
  NOVA.HornPolarity          : "none" 
  NOVA.SubVersion            : 1
  NOVA.Label                 : "ndos_cosmic_reconstructed_data_S12.02.14" 
  parents                    : "ndos_r00013121_s05_t02.raw" 
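The directory and file name of the reconstructed sample follow a visible pattern. This sketch rebuilds both from their parts. The rules (zero-padding widths, and the 000131 directory being the run number divided by 100, padded to six digits) are inferred from this one example and are assumptions, as are the helper names. In the sample the t02 field matches Online.Stream (2).

```bash
# reco_file_name RUN SUBRUN T LABEL RELEASE (hypothetical helper)
reco_file_name() {
    local run=$((10#$1)) subrun=$((10#$2)) t=$((10#$3))
    printf 'reco_r%08d_s%02d_t%02d_%s_%s.root\n' \
        "$run" "$subrun" "$t" "$4" "$5"
}

# reco_dir BASE DETECTOR RELEASE RUN LABEL (hypothetical helper)
# The fourth path component groups runs in blocks of 100 (inferred).
reco_dir() {
    local base=$1 det=$2 release=$3 run=$((10#$4)) label=$5
    printf '%s/%s/%s/%06d/%d/%s\n' \
        "$base" "$det" "$release" "$((run / 100))" "$run" "$label"
}
```

Together, reco_dir /nova/data/novaroot NDOS S12.02.14 13121 cosmic and reco_file_name 13121 5 2 cosmic S12.02.14 reproduce the sample path.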

Datasets

We need to define a SAM dataset or use one that has already been defined.
The Production group will provide standard datasets in the future, but users can also define their own.
Datasets are defined using the Definition Editor: http://samweb.fnal.gov:8480/sam/nova/definition_editor/

More information on how to define them can be found here: SAM datasets

The menus do not work at present, but in the future one will be able to select definitions based on metadata such as run period or horn polarity.

So for now we have to type the dataset definition into the Data Set Definition box manually and press the Submit Dataset Query button:

data_tier simulated and version 'S12-11-16' and simulated.generator cosmics and simulated.detectorID fd

One can now see all files that pass this definition. We can now save the dataset with an appropriate name, also filling in our username (kerberos principal) and group (nova). Or one can use a predefined dataset. For this query we saved the dataset 'cosmics-mc-S12-11-16-fd'.
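The same definition can also be made from the command line with samweb, which is handy for scripting. A minimal sketch, assuming sam_web_client is set up and your certificate is valid: define_dataset is a hypothetical helper, while count-files and create-definition are samweb subcommands. It prints the number of matching files before saving, so an over- or under-selective query is caught early, and it refuses to save a query that matches nothing.

```bash
# define_dataset NAME DIMENSIONS (hypothetical helper)
define_dataset() {
    local name=$1 dims=$2 n
    n=$(samweb -e nova count-files "$dims") || return 1
    echo "$n files match: $dims"
    if [[ $n == 0 ]]; then
        echo "refusing to save an empty definition" >&2
        return 1
    fi
    samweb -e nova create-definition "$name" "$dims"
}

# Example:
#   define_dataset cosmics-mc-S12-11-16-fd \
#     "data_tier simulated and version 'S12-11-16' and simulated.generator cosmics and simulated.detectorID fd"
```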

FHiCL requirements

Another acronym for your repertoire: IFDH – Intensity Frontier Data Handling

https://cdcvs.fnal.gov/redmine/projects/ifdh-art/wiki

The ifdh_art package provides art services that give access to the libraries in the ifdhc package (IFDH client tools).
We need to declare these services in the fcl of the job we want to run.

NB: these will eventually need to be added to a global .fcl, such as services.fcl.

One-liners for the fcl:

# if using ART with SAM, you need these entries, OR the --sam-* command-line options
user.services.IFDH: {IFDH_BASE_URI: "http://samweb.fnal.gov:8480/sam/nova/api"}
user.services.CatalogInterface: { service_provider: "IFCatalogInterface" }
user.services.FileTransfer: { service_provider: "IFFileTransfer" }
outputs.out1.dataTier: "raw" 
process_name: "whatever" 
services.FileCatalogMetadata: { applicationFamily: "demo" 
                                applicationVersion: "1" 
                                fileType: "importedDetector" 
                              } 
# if you use either of IFBeam or nucondb services, include the respective
# entry below.
user.services.IFBeam: {}
user.services.nucondb: {}
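Before submitting, a quick pre-flight check can confirm that the job fcl contains these one-liners. This is a hypothetical sketch (the check_sam_fcl name and approach are illustrative). It only recognises the dotted form shown here, not the same settings written as nested FHiCL tables, so a reported miss is a prompt to look rather than proof that the entry is absent.

```bash
# check_sam_fcl FILE: report which SAM/IFDH one-liners FILE lacks.
check_sam_fcl() {
    local fcl=$1 key missing=0
    if [[ ! -r $fcl ]]; then
        echo "cannot read $fcl" >&2
        return 2
    fi
    for key in services.IFDH services.CatalogInterface \
               services.FileTransfer services.FileCatalogMetadata; do
        # -F: match a fixed string, so the dots are not regex wildcards.
        if ! grep -qF "$key" "$fcl"; then
            echo "missing: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```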

A working example

First we need to set up jobsub, which one should have defined as a function in one's ~/.profile file:

function setup_jobsub
{
   export GROUP=nova
   export USER=<kerberos principal here>
   source /grid/fermiapp/products/nova/etc/setups.sh
   setup jobsub_tools
}

Then also set up the CVS-controlled NOvA software. Our datasets are for tag S12-11-16, from the CVS era, and jobsub currently only recognises this era of software. This will be updated as a priority.

function setup_nova_cvs
{
   source /grid/fermiapp/nova/novaart/novasoft/srt/srt.sh
   export EXTERNALS=/nusoft/app/externals
   source $SRT_DIST/setup/setup_novasoft.sh "$@" 
}

We can then set up the ifdh_art package and run our jobsub command.
The exact commands, using the profile functions, are:

setup_jobsub
setup_nova_cvs -r S12-11-16
PRODUCTS=$EXTERNALS:$PRODUCTS
setup ifdh_art v1_0_rc1 -q debug:e2:nu

jobsub -g \
       -r S12-12-12 \
       -N 100 \
       --dataset_definition=cosmics-mc-S12-11-16-fd \
       $IFDH_ART_DIR/bin/art_sam_wrap.sh \
       -X nova \
       --dest /nova/data/users/gsdavies \
       --rename uniq \
       --limit 2 \
       -c /nova/app/users/anorman/NOVA-OFFLINE/cosmictrackjob.fcl

or easier to copy:

jobsub -g -r S12-12-12 -N 100 --dataset_definition=cosmics-mc-S12-11-16-fd $IFDH_ART_DIR/bin/art_sam_wrap.sh -X nova --dest /nova/data/users/gsdavies --rename uniq --limit 2 -c /nova/app/users/anorman/NOVA-OFFLINE/cosmictrackjob.fcl

One can follow the submitted projects here: http://samweb.fnal.gov:8480/station_monitor/nova/stations/nova/projects
An example of a running project is included.