SAM metadata facilities within the art suite

The art suite's metadata facilities comprise two main categories of functionality: features for adding metadata to an art-ROOT data file for later extraction and upload to SAM; and features for extracting those metadata from such a file for post-processing and upload to SAM. The former category is provided by and within art; the latter is provided by the external application sam_metadata_dumper.

Metadata uploaded to SAM must be in JSON format. This must be borne in mind at all times, since it affects the way that certain information must be specified.

art's facilities for entry of metadata

Several global metadata items known to SAM may be specified using art executable command line options. These options are:

  • --process-name
  • --sam-process-id
  • --sam-application-family or --sam-app-family
  • --sam-application-version or --sam-app-version
  • --sam-group
  • --sam-file-type
  • --sam-run-type

The following options allow the command-line specification of per-stream metadata:

  • --sam-stream-name
  • --sam-data-tier

In addition, one may request that the file-type and run-type SAM metadata from an input file serve as metadata for the current process:

  • --sam-inherit-metadata
  • --sam-inherit-file-type
  • --sam-inherit-run-type

Any of the metadata-inheritance options may conflict with an explicit --sam-file-type or --sam-run-type; the conflicting combinations are listed under "Inheriting SAM metadata" below.

Other information interesting to SAM is known at various points in the execution by different parts of the art system.

Global metadata collection is handled by the art service art::FileCatalogMetadata [1], via its member functions addMetadata() and addMetadataString(), which each take two strings, representing respectively a key and a value. If addMetadataString() is used, the value is automatically converted to a canonical JSON string (surrounded by double quotes (") and with suitably escaped characters where necessary) prior to insertion in the metadata database for each file. The value provided to addMetadata(), on the other hand, is stored unaltered.

[1] The command art --print-description FileCatalogMetadata will print the allowed FHiCL configuration for this service.
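
To illustrate the difference between the two calls, the following Python sketch mimics what happens to a value in each case. It is not art code: the functions add_metadata and add_metadata_string and the dict standing in for the metadata database are hypothetical, and json.dumps is used as a stand-in for art's own canonicalization, whose exact escaping rules may differ.

```python
import json

def add_metadata_string(db, key, value):
    """Store value as a canonical JSON string (quoted and escaped)."""
    db[key] = json.dumps(value)

def add_metadata(db, key, value):
    """Store value verbatim; the caller is responsible for JSON legality."""
    db[key] = value

db = {}  # hypothetical stand-in for the per-file metadata database
add_metadata_string(db, "comment", 'say "hi"')  # quoted and escaped
add_metadata(db, "event_count", "42")           # stored as given: a JSON number
add_metadata(db, "runs", "[1, 2, 3]")           # stored as given: a JSON array

for key, value in db.items():
    print(key, value)
```

In this sketch, a plain word such as MC passed to add_metadata would be stored without quotes and would therefore not be a legal JSON value; passing it to add_metadata_string avoids that.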

Stream-specific metadata are written only to the output file to which they apply. Some (such as the file format) are added by art itself; others may be added by means of user-supplied plugins (FileCatalogMetadataPlugin) specified in the configuration for a given output stream.

One may optionally specify to the FileCatalogMetadata service (via the FHiCL parameter services.FileCatalogMetadata.checkSyntax) that incoming metadata be checked for JSON syntax compliance. This is done by composing a JSON fragment for each key / value pair, viz:

{ "key" : value }
and passing it to the RapidJSON utility for examination. In order to pass this test, value must be a legal JSON value: a number, a string, one of the literals true, false or null, or a complex value represented by JSON objects ({ ... }) or arrays ([ ... ]). A parse error will result in the throwing of an exception of type art::Exception, category art::errors::DataCorruption. This will occur at the point of call of art::FileCatalogMetadata::addMetadata() or art::FileCatalogMetadata::addMetadataString() for global metadata, and at plugin processing time for user-provided stream-specific plugins.
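
The check described above can be approximated outside art. The sketch below is an assumption-laden stand-in: it uses Python's json module rather than RapidJSON, the function name check_syntax is hypothetical, and it returns a boolean where art would throw an exception. The two parsers may also disagree on corner cases.

```python
import json

def _reject_constant(name):
    # Python's json module accepts NaN and Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")

def check_syntax(key, value):
    """Return True if the fragment { "key" : value } parses as JSON."""
    fragment = "{ %s : %s }" % (json.dumps(key), value)
    try:
        json.loads(fragment, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

print(check_syntax("run_type", '"MCChallenge"'))  # quoted string
print(check_syntax("run_type", "MCChallenge"))    # bare word
print(check_syntax("runs", "[1, 2"))              # unterminated array
```

A helper like this can be useful for testing values before handing them to addMetadata(), which stores them unaltered.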

Note that this check is the only point at which JSON-legal input is guaranteed; if services.FileCatalogMetadata.checkSyntax is not set to true, no checking is done on the data inserted into the metadata database.

Inheriting SAM metadata

The file_type and run_type SAM metadata as stored in an input ROOT file may be "reused" as the metadata for the FileCatalogMetadata service, without the user having to explicitly specify the metadata values. This can be done at the command line or via the FHiCL configuration.

Command-line option

The file_type and run_type metadata can be inherited from an input file in two ways:

art -c config.fcl -s input.root -o output.root --sam-inherit-file-type --sam-inherit-run-type
art -c config.fcl -s input.root -o output.root --sam-inherit-metadata   # equivalent to above

To inherit only one of the two metadata fields, specify the other explicitly:

art -c config.fcl -s input.root -o output.root --sam-inherit-file-type --sam-run-type="MCChallenge" 
art -c config.fcl -s input.root -o output.root --sam-inherit-run-type --sam-file-type="MC" 

Each of the following command-line invocations is an error:

art -c config.fcl -s input.root -o output.root --sam-inherit-file-type --sam-file-type="MC"         # Error - conflicting options
art -c config.fcl -s input.root -o output.root --sam-inherit-run-type  --sam-run-type="MCChallenge" # Error - conflicting options
art -c config.fcl -s input.root -o output.root --sam-inherit-metadata  --sam-file-type="MC"         # Error - conflicting options
art -c config.fcl -s input.root -o output.root --sam-inherit-metadata  --sam-run-type="MCChallenge" # Error - conflicting options

Note that additional program options may need to be specified depending on which combinations of SAM program options are used.
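
For wrapper scripts that assemble an art command line, the conflicting combinations listed above can be checked before art is invoked. The following Python sketch encodes only the four conflicts shown in this section; the function find_conflicts is hypothetical and does not reproduce art's own option handling or error messages.

```python
def find_conflicts(options):
    """Return (inherit_option, explicit_option) pairs that conflict."""
    conflicts_with = {
        "--sam-inherit-file-type": ["--sam-file-type"],
        "--sam-inherit-run-type": ["--sam-run-type"],
        "--sam-inherit-metadata": ["--sam-file-type", "--sam-run-type"],
    }
    # Strip any "=value" suffix so "--sam-file-type=MC" matches "--sam-file-type".
    names = {opt.split("=", 1)[0] for opt in options if opt.startswith("--")}
    return [
        (inherit, explicit)
        for inherit, explicits in conflicts_with.items()
        if inherit in names
        for explicit in explicits
        if explicit in names
    ]

args = ["-c", "config.fcl", "--sam-inherit-metadata", "--sam-file-type=MC"]
for inherit, explicit in find_conflicts(args):
    print(f"conflicting options: {inherit} and {explicit}")
```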

FHiCL configuration

One can specify in a configuration file that SAM metadata should be inherited from the input file. The following are valid configurations (where only the parameters relevant to metadata reuse are specified):

# 1. Inherit both "fileType" and "runType" 
services.FileCatalogMetadata: {
   metadataFromInput: ["fileType", "runType"]
}
# 2. Inherit only "fileType" 
services.FileCatalogMetadata: {
   metadataFromInput: ["fileType"]
   runType: "MCChallenge" 
}
# 3. Inherit only "runType" 
services.FileCatalogMetadata: {
   metadataFromInput: ["runType"]
   fileType: "MC" 
}
# 4. Inherit nothing
services.FileCatalogMetadata: {
   metadataFromInput: []
   fileType: "MC" 
   runType: "MCChallenge" 
}

Just as with the command-line options, it is an error to specify conflicting FHiCL parameters. For example, the following results in an ambiguity that triggers a configuration error:

# Error example: Cannot specify 'fileType' when it is being inherited 
#                as specified in the 'metadataFromInput' line
services.FileCatalogMetadata: {
   metadataFromInput: ["fileType", "runType"]
   fileType: "MC" # Error
}

For the full set of specifiable FHiCL parameters, type art --print-description FileCatalogMetadata.

Command-line vs. FHiCL configuration

Command-line options take precedence over FHiCL parameters. It is therefore not an error for a command-line option to conflict with the configuration; in such a case, the command-line value replaces the FHiCL specification.

Conflicting input metadata

If a particular SAM metadata field is inherited from the input, and the value of that field in the second input file differs from its value in the first input file, an exception is thrown.
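
The consistency requirement can be pictured with the following Python sketch. It is only an illustration: the function check_inherited and the (filename, metadata) pairs are hypothetical, ValueError stands in for whatever exception art throws, and the sketch assumes the same rule applies to every input file after the first.

```python
def check_inherited(inherited_fields, per_file_metadata):
    """Return the inherited values, or raise if the input files disagree.

    per_file_metadata is a sequence of (filename, metadata_dict) pairs.
    """
    reference = {}  # field -> (first filename, first value)
    for filename, metadata in per_file_metadata:
        for field in inherited_fields:
            value = metadata.get(field)
            if field not in reference:
                reference[field] = (filename, value)
            elif reference[field][1] != value:
                first_file, first_value = reference[field]
                raise ValueError(
                    f"{field}: {filename} has {value!r}, "
                    f"but {first_file} has {first_value!r}"
                )
    return {field: value for field, (_, value) in reference.items()}

inputs = [
    ("first.root", {"file_type": "MC", "run_type": "MCChallenge"}),
    ("second.root", {"file_type": "MC", "run_type": "MCChallenge"}),
]
print(check_inherited(["file_type", "run_type"], inputs))
```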

sam_metadata_dumper

The sam_metadata_dumper application (provided by art_root_io as of art 3.02.00) will read an art-ROOT format file, and extract the information for possible post-processing and upload to SAM. It has two output modes, JSON (default) and human-readable (--hr, -H, --human-readable).

JSON output

Please note that the JSON output has to make various assumptions about metadata as found in the file's database which are not guaranteed to be correct in all circumstances:

  • All keys will be converted to canonicalized strings.
  • If the value string starts with a `{' or `[', it is assumed to be a valid JSON object or array (respectively) and is not touched.
  • If the value string starts and ends with a `"', it is assumed to be a valid canonical JSON string and is not touched.
  • If the value string can be completely converted to a long double, it is assumed to be a number and is not touched.
  • Otherwise, the value string will be converted to a canonicalized string.

N.B. If the data were not verified to be JSON-legal at the time they were put into the database (see the description of checkSyntax, above), then sam_metadata_dumper cannot guarantee that its JSON output is actually legal JSON.
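
The rules listed above for values can be sketched in Python as follows. This is an approximation rather than the sam_metadata_dumper source: the function render_value is hypothetical, json.dumps stands in for the tool's canonicalization, and Python's float() stands in for the long double conversion (it accepts a few spellings, such as 1_000, that a C conversion would not).

```python
import json

def render_value(raw):
    """Apply the value rules to one string read from the metadata database."""
    if raw.startswith("{") or raw.startswith("["):
        return raw  # assumed to be a JSON object or array; not verified
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw  # assumed to be a canonical JSON string; not verified
    try:
        float(raw)  # stand-in for the long double conversion
        return raw  # treated as a number and left untouched
    except ValueError:
        return json.dumps(raw)  # everything else is quoted and escaped

for raw in ["42", "MC", '"MC"', '{"a": 1}', "[1, 2"]:
    print(repr(raw), "->", render_value(raw))
```

The last input shows why the guarantee cannot be given: a value that merely starts with [ is passed through untouched even though it is not a legal JSON array.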

Further processing of the output may be necessary in order to produce input acceptable to SAM. One particular example is the pair application_family and application_version, which must be combined outside art with another value to produce the JSON array that the SAM metadata system expects:

[ family, name, version ]
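
A post-processing step of this kind might look like the Python sketch below. Several things here are assumptions made for illustration: the metadata for one file are taken to be a flat JSON object, the output key "application" and the name value "reco" are hypothetical, and the function combine_application is not part of art. The actual layout of the sam_metadata_dumper output should be checked before adapting it.

```python
import json

def combine_application(metadata, name):
    """Return a copy of metadata with family and version folded into one array."""
    result = dict(metadata)
    family = result.pop("application_family")
    version = result.pop("application_version")
    # "application" is a hypothetical key name for the combined array.
    result["application"] = [family, name, version]
    return result

# Hypothetical metadata for a single file, as parsed from JSON text.
dumped = json.loads(
    '{"application_family": "art",'
    ' "application_version": "v3_02_00",'
    ' "file_type": "MC"}'
)
print(json.dumps(combine_application(dumped, "reco"), indent=2))
```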

Human-readable output

If human-readable output is selected for sam_metadata_dumper, then output is produced in the form:

row-num: key value
with formatting to line up the columns. The value is printed exactly as it has been read from the metadata database, including any double quotes if they were present. Note that, as explained above, any strings added by art are canonicalized and will therefore be printed with surrounding quotes; user-supplied values were added verbatim and may or may not be JSON-legal strings or other values.

Summary

Given the difficulties of parsing human-readable output reliably, or of guessing the intent behind a possibly-ambiguous value, we strongly recommend activating checkSyntax, using sam_metadata_dumper's default JSON output, and doing any post-processing with one of the many available JSON parsers such as those listed on the main JSON page.