Interfaces to Services for SAM Metadata in Root Files for art

The purpose of this note is to describe the entities involved in getting
the actual SAM metadata into files produced by art in RootOutput_module.
These entities include three services, one of which (SAMProtocol) was
already present in the SAM_Interactions page. This note will specify the
interfaces to each of these services.

Entities Involved

The following entities participate in the flow of information leading to the SAM
metadata. These are all implemented as classes. In some cases, a new class is
discussed in several places in the design documentation (including here) with
different names assigned to the same concept. A brief section below relates the
names we have chosen to previously used names.


There are 8 entities involved.  The first three are services:

SamMetadata (service):  
  Responsibility is to figure out (from Pset and other art and process-related 
  info) the set of items which every output module should place into its SAM
  metadata, and to convey that set of items when asked.  Written by art
  developers.  

GeneralFileTransfer (service):
  Base class: art::FileTransfer
  Responsibility is to copy a file specified by a URI into an accessible
  local file, or return a status saying why that could not be done.  Written
  by SAM developers.  This would be of value outside any relation to SAM.

SAMProtocol (service):
  Base class: art::FileDelivery
  Primary responsibility is to provide, when asked, a URI specifying the next 
  input file in a data set.  This is also the file catalog - it is informed
  when output files become available.  Written by SAM developers.

IntermediatePostProcessor:
  This existing class modifies the apparent job PSet.  In particular, it 
  injects parameters that were specified on the command line.  The agreed
  interface for SAM metadata deals with quite a few new command-line-settable
  parameters.  Modified by art developers.  See appendix I for a list of 
  all command-line-settable  configuration parameters relevant to SAM metadata.

InputModule (base class):  
  This existing class, which among other things gets input file names, will
  need to be modified to also deal with files coming from SAM data sets via 
  the SAMProtocol/FileTransfer route.  We include the DRISI class in 
  this entity.  This class may not require much modification, as we are
  strategically inclined to place the necessary logic into two new methods 
  of RootInputModule.  Eventually, when DRISI goes away and InputModule treats
  the file access state machine consistently, those two new methods will
  migrate to a position of greater generality.  Modified (if needed) by art 
  developers.

RootInputModule:
  This existing class will need to be modified to recognize the usage of 
  SAM input (rather than input via a file list) and, when SAM is used, will
  need two new methods (which rely on the new services):
    a) getURIfromSAM:  Obtain the identity of the next file to open
    b) getFileFromURI: Get a local copy, ready to open, based on that URI
  These methods should avoid using class data specific to RootInputModule,
  so that when the time comes they can be moved to some more general usage.
  Modified and new methods written by art developers.
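
A minimal sketch of how the two methods might stay free of RootInputModule
state: they are written here as free functions that receive the services as
arguments. The two interface structs only stand in for art::FileDelivery and
art::FileTransfer; the value of SUCCESS, the stub classes, and the example URI
are assumptions made for illustration.

```cpp
#include <string>

namespace input_sketch {

constexpr int SUCCESS = 0;  // assumed value; the real one comes from the status enums

struct FileDelivery {  // plays the role of art::FileDelivery
  virtual ~FileDelivery() = default;
  virtual int getNextFileURI(std::string& uri, double& waitTime) = 0;
};

struct FileTransfer {  // plays the role of art::FileTransfer
  virtual ~FileTransfer() = default;
  virtual int copyToScratch(std::string const& uri, std::string& file) = 0;
};

// a) Obtain the identity of the next file to open.
// The service is an argument, so no RootInputModule data is touched.
inline int getURIfromSAM(FileDelivery& delivery, std::string& uri, double& waitTime) {
  return delivery.getNextFileURI(uri, waitTime);
}

// b) Get a local copy, ready to open, based on that URI.
inline int getFileFromURI(FileTransfer& transfer, std::string const& uri,
                          std::string& localFile) {
  return transfer.copyToScratch(uri, localFile);
}

// Hypothetical stubs, present only to exercise the two functions.
struct OneFileDelivery : FileDelivery {
  int getNextFileURI(std::string& uri, double& waitTime) override {
    waitTime = 0.0;
    uri = "sam://example/file1.root";
    return SUCCESS;
  }
};

struct EchoTransfer : FileTransfer {
  int copyToScratch(std::string const& uri, std::string& file) override {
    // Pretend the copy worked; name the local file after the last URI component.
    file = "/scratch/" + uri.substr(uri.find_last_of('/') + 1);
    return SUCCESS;
  }
};

}  // namespace input_sketch
```

Because neither function refers to module state, moving them to InputModule
later would only change where they are declared.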

OutputModule (base class):
  This existing class will need further functionality:  To form the desired
  SAM metadata by requesting some general data from a service, and adding some
  module-specific items.  It also will need to modify its doCloseFile()
  method, sending information to the SAMProtocol service when the
  file is closed.  Modified by art developers.

RootOutputModule:
  This existing class already is placing a SAM metadata item into the output
  file.  It will need to replace the simplistic metadata it is currently
  creating, with a set of metadata obtained from its base class.  Modified by
  art developers.
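
As a rough illustration of the division of labor between the two output
classes, the base class could combine the general items obtained from the
service with items supplied by the concrete module. The function name
formSAMMetadata and the ordering (general items first) are assumptions, and
KeyValuePair is taken to be a std::pair of strings, one of the forms suggested
later in this note.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace output_sketch {

using KeyValuePair = std::pair<std::string, std::string>;
using Metadata = std::vector<KeyValuePair>;

// Combine the general items (as returned by the SamMetadata service) with
// items only the concrete output module knows. General items come first.
inline Metadata formSAMMetadata(Metadata const& general, Metadata const& moduleSpecific) {
  Metadata result(general);
  result.insert(result.end(), moduleSpecific.begin(), moduleSpecific.end());
  return result;
}

}  // namespace output_sketch
```

RootOutputModule would then write the returned set into the file in place of
the single item it creates today.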

Names of Classes and Types

In various documents, various of the above entities were named differently.
Here is a table relating the names used. The names after = signs are names
which appeared in earlier documents and which are no longer intended to be used.

If people object to the name settled on here, please let us know soon.


SAMProtocol         = SAMFileDelivery
art::FileDelivery   = (interface for) SAMProtocol  = art::CatalogInterface

GeneralFileTransfer = SAMFileTransfer
art::FileTransfer   = (interface for) GeneralFileTransfer

SamMetadata

In addition, two named status enums are a useful aid in understanding int returns
of methods in the interface definitions:

art::FileTransferStatus
art::FileDeliveryStatus

Services

Each service is defined by one or more functions that can be invoked.

The flow of the SAM metadata (and also data file information related to SAM)
is described by six "requests" (interactions, implemented by function calls)
between various non-service entities and one of the three services. Each
of these requests will be detailed when specifying the various services; in
general terms, the flow is:

Request 0:  usingSAMdataset -- RootInputModule tells SAMProtocol service
                   that it will want files from this dataset.
                   Once, near beginning of job.

Request 1:  getNextFileURI  -- RootInputModule asks SAMProtocol service
                   where the next input file is. The heart of 
                   getURIfromSAM.  Once per input file.

Request 2:  copyToScratch   -- RootInputModule tells FileTransfer
                   service that it needs a local copy of this file.
                   The heart of getFileFromURI.
                   Once per input file.

Request 3:  URIobtained     -- RootInputModule tells SAMProtocol service
                   that it has obtained and successfully opened a
                   local copy of this file; or possibly that it
                   cannot do so, and why.  Once per input file.

Request 4:  getMetadata     -- OutputModule tells SamMetadata service
                   that it needs to know the metadata which
                   is common to all output files.  Once per 
                   job per output file (though it returns 
                   identical results each time).

Request 5: outputFileClosed -- OutputModule tells SAMProtocol service                   
                   that its output file has been written and
                   successfully closed, and can be harvested.
                   Once per job per output file.

In addition, there are several other information-provision requests that
OutputModule uses to keep the FileDelivery service (SAMProtocol) up to date:

Request 5a: outputModuleInitiated
Request 5b: outputFileOpened
Request 5c: eventSelected

Note that the concrete implementation of the FileDelivery service may, but is
not obligated to, react to these calls. In particular, SAMProtocol clearly
cannot afford to communicate the identity of each accepted event
over the web to the actual SAM server. However, this interface provides the
information in case the service wants to collect and later deliver it.
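
One way a concrete service could honor these calls without contacting the
server for every event is to buffer the information locally and hand it over
when the file closes. In the sketch below only the method names come from the
requests above. This note does not give signatures for requests 5a to 5c, so
the argument lists, the EventID type, keying by a module label, and the count
returned by outputFileClosed (declared void, taking a file name, in the
interface below) are all assumptions.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace delivery_sketch {

using EventID = unsigned long;  // hypothetical event identifier

class BufferingFileDelivery {
public:
  void outputModuleInitiated(std::string const& moduleLabel) {
    selected_[moduleLabel];  // create an empty buffer for this module
  }

  void outputFileOpened(std::string const& moduleLabel) {
    selected_[moduleLabel].clear();  // new file, start a fresh buffer
  }

  void eventSelected(std::string const& moduleLabel, EventID id) {
    selected_[moduleLabel].push_back(id);  // local bookkeeping only, no network traffic
  }

  // Returns how many buffered events would be reported in one message for
  // this file; a real service would send them at this point.
  std::size_t outputFileClosed(std::string const& moduleLabel) {
    std::size_t const n = selected_[moduleLabel].size();
    selected_[moduleLabel].clear();
    return n;
  }

private:
  std::map<std::string, std::vector<EventID>> selected_;
};

}  // namespace delivery_sketch
```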

SamMetadata


  std::vector<KeyValuePair> getMetadata () const;

    Note:  KeyValuePair can be std::pair<std::string, std::string>,
    or perhaps struct{std::string key; std::string value;} or whatever
    similar form we decide.

  See appendix II for details of what KeyValuePairs getMetadata () returns.     

  ** The following method was NOT agreed to but might serve to satisfy,
  ** in an easy-to-implement way, the desire to let the user add custom
  ** items to the database:

  void addToMetadata (std::string const & key, std::string const & value);

    Note: If this is done, then SamMetadata ought to prepare its basic delivery 
    at an early stage, but OutputModule should only getMetadata() when
    it is about to close the file, so that user additions made at any time 
    can be part of the data.  Of course, misuse of this feature will be a
    matter for the experiment to deal with.

ctor or initialization:

  SamMetadata (ParameterSet const& p, ProcessID_t id);
      // All that SamMetadata needs to know can be deduced from the PSet
      // and *perhaps* the process id for the job is also needed.
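
A minimal sketch of the service, including the optional addToMetadata method,
may help show why OutputModule should call getMetadata() as late as possible.
To stay self-contained the constructor here takes the basic items directly
rather than a ParameterSet and process id; that substitution and the key names
used in the usage example are assumptions.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace metadata_sketch {

using KeyValuePair = std::pair<std::string, std::string>;

class SamMetadata {
public:
  // The real constructor would deduce these items from the PSet (and perhaps
  // the process id); here they are passed in ready-made.
  explicit SamMetadata(std::vector<KeyValuePair> basicItems)
    : items_(std::move(basicItems)) {}

  // Items common to every output file, plus any user additions made so far.
  std::vector<KeyValuePair> getMetadata() const { return items_; }

  // The optional (not agreed) extension: append a user-supplied item.
  void addToMetadata(std::string const& key, std::string const& value) {
    items_.emplace_back(key, value);
  }

private:
  std::vector<KeyValuePair> items_;
};

}  // namespace metadata_sketch
```

With this shape, a call to addToMetadata("user_tag", "calib-v2") made in the
middle of the job is included in the result only if getMetadata() is called
after it, which is the reason for deferring the call until the file is about
to close.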

GeneralFileTransfer

This class implements the art::FileTransfer interface.


  int copyToScratch (std::string const & URI,std::string & file); 

    Note: the return value is intended to match one of the values of the
    enum art::FileTransferStatus.  There is one special status SUCCESS
    that indicates that the file has been obtained and copied, and that
    the file argument now contains a string with a fully qualified path.
    Otherwise, file comes back as an empty string. 

    Note:  One designer prefers the method name getFile().

  See appendix III for status codes that may be returned by copyToScratch().

ctor or initialization:

  GeneralFileTransfer (ParameterSet const&, ActivityRegistry&);
      Note: It is possible that this service needs nothing other than that which
            is provided by the requests, because it can decide on its own about
            where to place the delivered files.  In that case, the ctor is
            simplified.  The signature shown allows for explicitly setting the
            scratch area or taking that information from the ParameterSet.
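
The copyToScratch contract (status returned, file filled only on SUCCESS) can
be illustrated with a sketch that handles nothing but file:// URIs. The class
name LocalFileTransfer, the status values, the "copy_of_" naming scheme, and
passing the scratch directory as a plain string are all assumptions; the real
status codes are those of appendix III.

```cpp
#include <fstream>
#include <string>
#include <utility>

namespace transfer_sketch {

// Assumed values; the real ones belong to art::FileTransferStatus.
enum FileTransferStatus {
  SUCCESS = 0, UNSUPPORTED_SCHEME = 1, SOURCE_UNREADABLE = 2, SCRATCH_UNWRITABLE = 3
};

class LocalFileTransfer {
public:
  // scratchDir should be an absolute path if callers rely on getting a
  // fully qualified file name back.
  explicit LocalFileTransfer(std::string scratchDir)
    : scratchDir_(std::move(scratchDir)) {}

  int copyToScratch(std::string const& uri, std::string& file) const {
    file.clear();  // contract: file stays empty unless SUCCESS is returned
    std::string const scheme = "file://";
    if (uri.compare(0, scheme.size(), scheme) != 0) return UNSUPPORTED_SCHEME;

    std::string const source = uri.substr(scheme.size());
    std::ifstream in(source, std::ios::binary);
    if (!in) return SOURCE_UNREADABLE;

    std::string const target =
      scratchDir_ + "/copy_of_" + source.substr(source.find_last_of('/') + 1);
    std::ofstream out(target, std::ios::binary);
    if (!out) return SCRATCH_UNWRITABLE;

    // Copy in chunks; the final partial chunk is written via gcount().
    char buffer[4096];
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
      out.write(buffer, in.gcount());
    }
    out.flush();
    if (!out) return SCRATCH_UNWRITABLE;

    file = target;
    return SUCCESS;
  }

private:
  std::string scratchDir_;
};

}  // namespace transfer_sketch
```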

SAMProtocol

  SamFileStatus usingSAMdataset 
         (SAMDatasetDefinition const & datasetDefinition);
    // Tells SAMProtocol which dataset this job wants files from. 

   *** Note:  It may be better to place the SAMDatasetDefinition into the
   ***        ctor of SAMProtocol.  If the information is available at 
   ***        an early enough time, that is certainly the better approach.
   ***        In that case, this "request" becomes moot and there are only 5 
   ***        requests we need to specify.   

  SamFileStatus getNextFileURI (std::string & uri, double & waitTime);

    Note:  getNextFileURI, from the processID and base URL, constructs the URL
    corresponding to SAMWeb's "getNextFile", and performs the HTTP POST 
    request.

    Note: SamFileStatus is an enum.  There is one special status SUCCESS
    that indicates that a valid URI for a file has been delivered, and that
    the uri argument now contains a string with a URI.  Otherwise, uri
    comes back as an empty string.  

    Note: There is a second special status   NO_MORE_FILES = 204, which 
    indicates that the file pool has been exhausted; RootInputModule should 
    treat this as it would when reaching the end of a file list provided in 
    the .fcl file.  There is another status TRY_AGAIN_LATER = 202, which
    sets waitTime to a suggested time, for the service to try again by 
    repeating the POST call.

  See appendix IV for other status codes that may be returned by getNextFileURI().
    In general, codes 400-499 represent fatal errors, and 500-599 recoverable
    errors; in the latter case, art should retry a number of times (perhaps with
    some prescribed delays) before promoting this to a fatal error.
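
The status handling described above might look roughly like the following on
the caller's side. Only the values 202 and 204 and the two code ranges come
from this note; the value of SUCCESS, the Outcome type, the maxRetries
parameter, and the choice not to count TRY_AGAIN_LATER against the retry limit
are assumptions. The request is passed in as a callable with the shape of
getNextFileURI so that the sketch does not depend on a real service.

```cpp
#include <functional>
#include <string>

namespace retry_sketch {

// SUCCESS = 200 is an assumed value; 202 and 204 are given in this note.
enum SamFileStatus { SUCCESS = 200, TRY_AGAIN_LATER = 202, NO_MORE_FILES = 204 };

enum class Outcome { GotFile, EndOfInput, Fatal };

inline bool isFatal(int status)       { return status >= 400 && status <= 499; }
inline bool isRecoverable(int status) { return status >= 500 && status <= 599; }

// request:    any callable shaped like getNextFileURI(uri, waitTime).
// maxRetries: hypothetical limit before a recoverable error becomes fatal.
inline Outcome nextFile(std::function<int(std::string&, double&)> const& request,
                        int maxRetries, std::string& uri) {
  int recoverableFailures = 0;
  for (;;) {
    double waitTime = 0.0;
    int const status = request(uri, waitTime);
    if (status == SUCCESS) return Outcome::GotFile;
    if (status == NO_MORE_FILES) return Outcome::EndOfInput;  // like end of a file list
    if (status == TRY_AGAIN_LATER) {
      // A real caller would sleep for waitTime seconds before repeating.
      continue;
    }
    if (isRecoverable(status) && ++recoverableFailures <= maxRetries) continue;
    return Outcome::Fatal;  // 4xx, retries exhausted, or an unrecognized code
  }
}

}  // namespace retry_sketch
```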

  void updateStatus (std::string const & uri, FileTransferStatus status);
    // Tells SAMProtocol that the relevant file was obtained.
    // Status might be some code other than SUCCESS, indicating a problem.

Note: Since in principle SAMProtocol can tell which process is responding
with updateStatus, and will have remembered which URI it last supplied,
in principle the URI argument could be obviated. It is felt, however, that
the redundancy provided by specifying the URI may be helpful under some
circumstances.

  void outputFileClosed (std::string const & outputFileName);

    Note: if SAMProtocol needs to know anything else about the file,
    it can get it by reading the SAM metadata contained in the file.  But
    we will not be iconoclasts about this point; if it is awkward to learn
    to read the database, and some essential item of information is needed
    by SAMProtocol when the file has been closed, we can agree to modify
    the signature of this "request" so as to provide it from the OutputModule.

ctor or initialization:

    SAMProtocol ( SAMDatasetDefinition const & datasetDefinition,
            SAMProcessID_t const & samProcessID);

    *** See note above about SAMDatasetDefinition

Steps Needed to Implement this in art

The steps needed to implement this SAM metadata handling, on the part of the
art developers, involve modest enhancements to six of the eight entities mentioned
above. The steps are outlined in SAMmetadataImplementation.

Appendix

Appendix I: Configuration Parameters Relevant to SAM Metadata

Appendix II: KeyValuePairs in the SAM Metadata

Appendix III: Status Codes Returned by copyToScratch()

Appendix IV: Status Codes Returned by getNextFileURI()

Appendix V: Notes taken at meeting of July 24 on this subject