SAM Interactions

Earlier conversations about SAM-related metadata can be found at Organization of metadata.

This page reflects our current best idea of the shape of the SAM interaction system. Please see the page history for earlier versions.

We had two meetings with the SAM people and some of the experiment people to discuss how art will interact with the data handling system (SAM in this case), and to discuss the information that needs to be recorded in each output file so that the file can be correctly identified in the SAM catalog.

Parts of this project include:

  1. file-scope metadata collection and storage
  2. operation and integration of the SAM web protocol
  3. specifications for and implementation of the utility functions that SAM needs to provide to simplify processing of the data received from the SAM protocol

One of the issues addressed here is the maintenance of this code and its relationship with art and art's external dependencies. We prefer most of the code for the items above to be modified and maintained outside the mainline art project. One reason for this is to keep the releases separate, so that art does not require a version increase due to changes in protocol operation or in the number of metadata items that need to be tracked in the art output files. Of course, the experiment code that uses the SAM "plugin" code will need to keep up with versions of this package.

Interaction with the SAM web service.

  1. art shall provide a new CatalogInterface class.
    class art::CatalogInterface {
    public:
      void configure(std::vector<std::string> const & items);
      int  getNextFileURI(std::string & uri, double & waitTime);
      void updateStatus(std::string const & uri, FileDisposition status);
      void outputFileClosed(std::string const & module_label,
                            std::string const & fileFQname);
      void outputFileOpened(std::string const & module_label);
      void outputModuleInitiated(std::string const & module_label,
                                 fhicl::ParameterSet const & pset);
      void eventSelected(std::string const & module_label,
                         EventID const & event_id,
                         HLTGlobalStatus const & acceptance_info);
      bool isSearchable();
      void rewind();
      virtual ~CatalogInterface() = default;
    private:
      // Classes inheriting this interface must provide the following methods:
      virtual void doConfigure(std::vector<std::string> const & item) = 0;
      virtual int  doGetNextFileURI(std::string & uri, double & waitTime) = 0;
      virtual void doUpdateStatus(std::string const & uri, FileDisposition status) = 0;
      virtual void doOutputFileOpened(std::string const & module_label) = 0;
      virtual void doOutputModuleInitiated(std::string const & module_label,
                                           fhicl::ParameterSet const & pset) = 0;
      virtual void doOutputFileClosed(std::string const & module_label,
                                      std::string const & fileFQname) = 0;
      virtual void doEventSelected(std::string const & module_label,
                                   EventID const & event_id,
                                   HLTGlobalStatus const & acceptance_info) = 0;
      virtual bool doIsSearchable() = 0;
      virtual void doRewind() = 0;
    };
  2. REX will provide a service to implement the CatalogInterface interface (e.g. IFFileCatalog). As a service, it must implement a constructor (ParameterSet const&, ActivityRegistry&) in addition to implementing the do... functions as required by the interface.
  3. Note that the list of items (file names or SAM process IDs) will be passed to the interface via the configure(...) function; it should not be obtained via the service's parameter set.
  4. It is suggested that REX use the enumerated class art::FileDeliveryStatus to communicate the result of getNextFileURI().
  5. The service shall advertise its ability to support an iterative search for files with the doIsSearchable() function. If this returns false (e.g. for SAM), doRewind() should be implemented trivially, since it shall never be called.
  6. If a catalog is searchable, the doRewind() function shall reset the catalog such that the next call to getNextFileURI(...) returns a URI corresponding to the first item in the list provided with configure(...).
  7. See below for requirements on the art service system.
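
As an illustration of items 2 to 6, here is a minimal sketch of a searchable, in-memory implementation. It is not the REX service. FileDisposition, the FileDeliveryStatus enumerators, and the ListCatalog class are stand-ins invented for this example so that it compiles without art, and only the file-delivery subset of the interface is reproduced. The service constructor (ParameterSet const&, ActivityRegistry&) is omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for art types (assumed values, not art's definitions).
enum class FileDisposition { TRANSFERRED, CONSUMED, SKIPPED };
enum class FileDeliveryStatus { SUCCESS = 0, NO_MORE_FILES = 1 };

// Reduced copy of the interface above: file-delivery calls only.
// Public non-virtual functions forward to private virtual do...() hooks.
class CatalogInterface {
public:
  void configure(std::vector<std::string> const& items) { doConfigure(items); }
  int getNextFileURI(std::string& uri, double& waitTime) {
    return doGetNextFileURI(uri, waitTime);
  }
  void updateStatus(std::string const& uri, FileDisposition status) {
    doUpdateStatus(uri, status);
  }
  bool isSearchable() { return doIsSearchable(); }
  void rewind() { doRewind(); }
  virtual ~CatalogInterface() = default;

private:
  virtual void doConfigure(std::vector<std::string> const& items) = 0;
  virtual int doGetNextFileURI(std::string& uri, double& waitTime) = 0;
  virtual void doUpdateStatus(std::string const& uri, FileDisposition status) = 0;
  virtual bool doIsSearchable() = 0;
  virtual void doRewind() = 0;
};

// Hypothetical catalog that serves a fixed list of local file names.
class ListCatalog : public CatalogInterface {
private:
  void doConfigure(std::vector<std::string> const& items) override {
    items_ = items;  // the list arrives via configure(), not the parameter set
    next_ = 0;
  }
  int doGetNextFileURI(std::string& uri, double& waitTime) override {
    waitTime = 0.0;  // nothing to wait for in an in-memory list
    if (next_ >= items_.size()) {
      return static_cast<int>(FileDeliveryStatus::NO_MORE_FILES);
    }
    uri = "file://" + items_[next_++];
    return static_cast<int>(FileDeliveryStatus::SUCCESS);
  }
  void doUpdateStatus(std::string const&, FileDisposition) override {}
  bool doIsSearchable() override { return true; }
  void doRewind() override { next_ = 0; }  // next URI is the first item again

  std::vector<std::string> items_;
  std::size_t next_ = 0;
};
```

A non-searchable catalog such as a SAM-backed one would instead return false from doIsSearchable() and leave doRewind() empty.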

File Transfer

  1. art shall provide a FileTransfer interface:
    class art::FileTransfer {
    public:
      int translateToLocalFilename(std::string const & uri,
                                   std::string & fileFQname);
    
      // Remaining boilerplate:
      virtual ~FileTransfer() = default;
    
    private:
      // Classes inheriting this interface must provide the following method:
      virtual 
      int 
      doTranslateToLocalFilename(std::string const & uri,
                                 std::string & fileFQname) = 0;
    };
  2. REX will provide a service to implement the FileTransfer interface. As a service, it must implement a constructor (ParameterSet const&, ActivityRegistry&) in addition to doTranslateToLocalFilename(...) as described above.
  3. It is suggested that REX use the enumerated class art::FileTransferStatus to communicate the result of translateToLocalFilename().
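
The following sketch shows one possible shape for an implementation of this interface, assuming the simplest case where the URI already names a local file. The FileTransferStatus enumerators and the LocalFileTransfer class are hypothetical stand-ins, and the service constructor is omitted so that the example compiles without art.

```cpp
#include <string>

// Stand-in for art::FileTransferStatus (assumed enumerators).
enum class FileTransferStatus { SUCCESS = 0, UNSUPPORTED_SCHEME = 1 };

// Copy of the interface above, with the forwarding call written out.
class FileTransfer {
public:
  int translateToLocalFilename(std::string const& uri, std::string& fileFQname) {
    return doTranslateToLocalFilename(uri, fileFQname);
  }
  virtual ~FileTransfer() = default;

private:
  virtual int doTranslateToLocalFilename(std::string const& uri,
                                         std::string& fileFQname) = 0;
};

// Hypothetical implementation: accepts only file:// URIs and strips the scheme.
class LocalFileTransfer : public FileTransfer {
private:
  int doTranslateToLocalFilename(std::string const& uri,
                                 std::string& fileFQname) override {
    static std::string const scheme = "file://";
    if (uri.compare(0, scheme.size(), scheme) != 0) {
      // Leave fileFQname untouched when the URI cannot be handled.
      return static_cast<int>(FileTransferStatus::UNSUPPORTED_SCHEME);
    }
    fileFQname = uri.substr(scheme.size());
    return static_cast<int>(FileTransferStatus::SUCCESS);
  }
};
```

A real implementation would presumably copy or stage remote files before returning the local name; that step is outside the scope of this sketch.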

Metadata collection and storage.

Required metadata (per July 24 meeting):

Per art execution:
  1. application family: command line parameter --sam-application-family=something
  2. application name: this is the art “process”; --process-name=something will be a new command line parameter.
  3. application version: command line parameter --sam-application-version=something
  4. file type: command line parameter --sam-file-type=something
Per output module:
  1. data tier: command line parameter: --sam-data-tier=module:tier-spec
  2. stream name: command line parameter: --sam-stream-name=module:stream-name. If the value is lacking in the configuration of the output module, then the module label is used as the stream name. The default value forbids two modules using the same stream name.
  3. file format (e.g. “artroot”): provided by the output module.
  4. file format era (as specified by art; generally not backwards compatible)
  5. file format version (as specified by art; generally guarantees backwards compatibility)

Service

  1. art shall provide a service, FileCatalogMetadata with the following interface:
    class art::FileCatalogMetadata {
    public:
      typedef std::vector<std::pair<std::string, std::string>> collection_type;
      typedef typename collection_type::value_type value_type;
    
      FileCatalogMetadata(ParameterSet const&, ActivityRegistry&);
    
      // Add a new value to the metadata store.
      void addMetaData(std::string const & key, std::string const & value);
      void getMetadata(collection_type &) const; // Dump stored metadata into the provided container.
    };
  2. The service shall be responsible for marshaling the required metadata.
  3. The service's addMetaData() method shall be callable from anywhere, including other services.

Command-line / Configuration interface

Since in art, command-line options interpreted by the wider art system are passed by injection into the FHiCL configuration, each command-line option has a corresponding parameter in a specified location.

SAM File Delivery.

The following command-line options will activate SAM file delivery:
  • --sam-web-uri=<sam-web-uri>
    services.CatalogInterface: { service_provider: IFCatalogInterface
                                 webURI: "<sam-web-uri>" 
                               }
  • --sam-process-id=<sam-process-id>
    source.fileNames: [ "<sam-process-id>" ]

Both must be specified, or neither. If they are specified, then the metadata items listed below must also be specified.

Metadata specification

  • --process-name=<art-process-name>
     process_name: "<art-process-name>"
    Obtain with
    art::ServiceHandle<art::TriggerNamesService>()->getProcessName()
  • --sam-application-family=<sam-app-family>
    services.FileCatalogMetadata.applicationFamily: "<sam-app-family>"
  • --sam-application-version=<sam-app-version>
    services.FileCatalogMetadata.applicationVersion: "<sam-app-version>"
  • --sam-file-type=<file-type>
    services.FileCatalogMetadata.fileType: "<file-type>"
  • --sam-data-tier=<module-label>:<tier-spec>
    outputs.<module-label>.dataTier: "<tier-spec>"
    This should be specified per output module. A value given without the <module-label>: prefix shall be treated as providing a default for output modules which do not have a specified data tier.
  • --sam-stream-name=<module-label>:<stream-name>
    outputs.<module-label>.streamName: "<stream-name>"
    This should be specified per-output module, and default to the module label if not specified. If a stream name is specified without a <module-label>: prefix this shall be used as the default in preference to the output module's label.
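
The per-module options above could be handled along the following lines. This is only a sketch of the rules as stated: the names PerModuleSetting, parsePerModuleOption, fhiclKey and resolveStreamName are hypothetical, and splitting at the first colon is an assumption about how the prefix is recognised.

```cpp
#include <string>

// Result of parsing "<module-label>:<value>" or a bare "<value>".
struct PerModuleSetting {
  std::string moduleLabel;  // empty: the value is a default for all output modules
  std::string value;
};

// Split an option value at the first ':' (assumed separator rule).
PerModuleSetting parsePerModuleOption(std::string const& arg) {
  std::string::size_type const colon = arg.find(':');
  if (colon == std::string::npos) {
    return {"", arg};
  }
  return {arg.substr(0, colon), arg.substr(colon + 1)};
}

// FHiCL location into which a prefixed value is injected.
std::string fhiclKey(std::string const& moduleLabel, std::string const& parameter) {
  return "outputs." + moduleLabel + "." + parameter;
}

// Stream-name precedence: per-module value, then unprefixed default,
// then the module label itself.
std::string resolveStreamName(std::string const& moduleLabel,
                              std::string const& perModule,
                              std::string const& unprefixedDefault) {
  if (!perModule.empty()) return perModule;
  if (!unprefixedDefault.empty()) return unprefixedDefault;
  return moduleLabel;
}
```

With these helpers, --sam-data-tier=out1:reconstructed maps to the key outputs.out1.dataTier, while a bare --sam-data-tier=raw yields an empty module label and is kept as the default.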

Interactions between services and art infrastructure.

  1. The command-line processor shall inject parameters into the correct parameter sets according to the specifications above.
  2. At open input file time, the RootInputFile shall pass the file name to the service implementing CatalogInterface to obtain a URI, pass that URI to the service implementing the FileTransfer interface and notify the CatalogInterface service of the correct status.
  3. At open output file and close output file time, the OutputModule base class shall notify the CatalogInterface service of these operations.
  4. The OutputWorker at construction time shall notify the CatalogInterface service of the module's construction as it has access to the module's label and parameter set through the ModuleDescription.
  5. At input file close time, the RootInputFile shall notify the CatalogInterface service that the file has been consumed.
  6. At output file close time, the OutputModule base class shall:
    1. Invoke the FileCatalogMetadata service to obtain the collection of metadata.
    2. Add the data tier and stream name to the metadata list.
    3. Ask the specific output module via virtual interface for the file format, the file format version and the file format era, and add them to the metadata list.
    4. Invoke the virtual writeFileCatalogMetadata() function with the metadata collection for the specific output module to write to persistent storage (e.g. DB table as TKey).
  7. The input and output modules, as appropriate, should notify the CatalogInterface service of relevant events in the case of an error (e.g. by passing "skipped" to updateStatus).
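
The input-side sequence in items 2 and 7 can be sketched as below. The two services are represented by std::function members so that the example stands alone; InputServices and nextLocalFile are hypothetical names, the FileDisposition enumerators are stand-ins, and treating a return value of 0 as success is an assumption.

```cpp
#include <functional>
#include <string>

// Stand-in for art's FileDisposition (assumed enumerators).
enum class FileDisposition { TRANSFERRED, SKIPPED };

// The three service calls used when opening an input file.
struct InputServices {
  std::function<int(std::string&, double&)> getNextFileURI;
  std::function<int(std::string const&, std::string&)> translateToLocalFilename;
  std::function<void(std::string const&, FileDisposition)> updateStatus;
};

// Returns true and fills localName when a file is ready to be opened.
bool nextLocalFile(InputServices const& svc, std::string& localName) {
  std::string uri;
  double waitTime = 0.0;
  // Step 1: ask the catalog for the next URI.
  if (svc.getNextFileURI(uri, waitTime) != 0) {
    return false;  // nothing delivered, so there is no status to report
  }
  // Step 2: ask the transfer service for a local file name.
  if (svc.translateToLocalFilename(uri, localName) != 0) {
    svc.updateStatus(uri, FileDisposition::SKIPPED);  // error path (item 7)
    return false;
  }
  // Step 3: report the outcome back to the catalog.
  svc.updateStatus(uri, FileDisposition::TRANSFERRED);
  return true;
}
```

The waitTime value is ignored here; a real input source would presumably use it to decide when to retry.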

Service registration system.

The art service system:

  1. shall be capable of allowing a service to register itself as an implementation of an interface, e.g. by:
    DEFINE_ART_SERVICE(IFFileCatalog,CatalogInterface);
    
    This macro permits the following inside of the art system:
    ServiceHandle<CatalogInterface> sh;
    
  2. shall allow a service to be configured by specifying the interface as the name of the parameter set and explicitly setting the service_type parameter to the concrete type of the service. This allows the configuration to be overridden (e.g. by the command-line processor). It shall be an error to attempt to configure a service implementing an interface using the concrete type as the parameter set name.
  3. shall allow a service to be accessed via a handle to its base or a handle to its concrete type. However, both types of handle shall access the same service instance.
  4. shall check for duplicate configuration (simultaneous configuration of two services implementing the same interface) and throw an exception.
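
Requirements 1, 3 and 4 can be illustrated with a small registry sketch. It does not model the DEFINE_ART_SERVICE macro or art's actual service machinery: ServiceRegistrySketch, its add() and get() functions, and the empty CatalogInterface, IFFileCatalog and OtherCatalog types are all hypothetical stand-ins.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <type_traits>
#include <typeindex>
#include <typeinfo>

struct CatalogInterface { virtual ~CatalogInterface() = default; };
struct IFFileCatalog : CatalogInterface {};
struct OtherCatalog : CatalogInterface {};

class ServiceRegistrySketch {
public:
  // Register svc as the implementation of Interface.
  template <typename Interface, typename Concrete>
  void add(std::shared_ptr<Concrete> svc) {
    static_assert(std::is_base_of<Interface, Concrete>::value,
                  "Concrete must derive from Interface");
    // Convert to the interface type first so the stored pointer can later be
    // cast back to exactly the type it was stored as.
    std::shared_ptr<Interface> asInterface = svc;
    bool const inserted =
      services_.emplace(std::type_index(typeid(Interface)),
                        std::shared_ptr<void>(asInterface)).second;
    if (!inserted) {
      throw std::runtime_error("duplicate configuration for interface");
    }
    services_[std::type_index(typeid(Concrete))] = svc;
  }

  // Handle lookup by interface type or by concrete type.
  template <typename T>
  std::shared_ptr<T> get() const {
    auto const it = services_.find(std::type_index(typeid(T)));
    if (it == services_.end()) {
      throw std::runtime_error("service not configured");
    }
    return std::static_pointer_cast<T>(it->second);
  }

private:
  std::map<std::type_index, std::shared_ptr<void>> services_;
};
```

Both lookups return pointers that share ownership of one object, so a handle to the interface and a handle to the concrete type reach the same service instance, and a second implementation of the same interface is rejected with an exception.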