Testing SAM functionality

Jump to the Processing Templates section to see the template scripts and get started!

Normal tests

  • Basic setup
    . /grid/fermiapp/products/common/etc/
    setup sam_web_client
    export SAM_EXPERIMENT=annie
  • Test that SAM is alive and print basic server information:
    samweb server-info
    > SAMWeb API for annie
    > SAMWeb version: 2.7.1
    > Connected to:
    > Cherrypy version: 3.8.0
    > HTTP User-Agent: SAMWebClient/v2_0 (samweb) python/2.6.6
    > User information:
    > Untrusted identity:
    > Unauthenticated
    > Roles: None...
  • Test that SAM responds to queries, first by looking up a file known not to exist (please don't create a file called foo!):
    samweb locate-file foo
    > File 'foo' not found
  • Then test that SAM responds to a valid query for a known test file:
    samweb locate-file onesmall.root
    > dcache:/pnfs/annie/persistent/anniepro
  • Retrieve the metadata associated with a file. The response is returned in human-readable format; note that the dimension names shown here are not the same as those used when declaring a file! Request JSON format instead if you want the raw metadata to use as a template:
    samweb get-metadata onesmall.root
    >      File Name: onesmall.root
    >        File Id: 751
    >    Create Date: 2017-02-20T17:03:59+00:00
    >           User: kreymer
    >      File Size: 20191035
    >       Checksum: adler32:04a5c023
    > Content Status: good
    >      File Type: unknown
    >    File Format: root
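The human-readable layout above is aimed at people, so scripts are better served by the JSON form (for example `samweb get-metadata --json onesmall.root`, assuming your samweb version offers that option). Where only the text output is at hand, a small awk filter can pull out a single field. The sketch below assumes the `Label: value` layout shown above; the function name metadata_field is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one "Label: value" line read from stdin,
# e.g. from the human-readable output of `samweb get-metadata onesmall.root`.
metadata_field() {
  awk -v label="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)               # drop the alignment padding
      if (index(line, label ": ") == 1) {    # line starts with "Label: "
        print substr(line, length(label) + 3)
        exit
      }
    }'
}

# Demo on part of the onesmall.root output shown above (no SAM access needed).
sample='     File Name: onesmall.root
     File Size: 20191035
      Checksum: adler32:04a5c023'
metadata_field "File Size" <<<"$sample"    # prints 20191035
```

In practice the input would come straight from SAM, e.g. `samweb get-metadata onesmall.root | metadata_field Checksum`. A label that does not appear produces no output.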

SAM Project Overview

Projects and Datasets

Files are declared to SAM by their location and JSON metadata. The database of metadata can then be queried to retrieve the locations of, and other metadata about, files whose metadata meets certain criteria.
Standard metadata tags are described here:
ANNIE's implemented metadata usage is described here:

  • A "Definition" is a set of criteria for selecting the desired files (effectively an SQL query).
  • A "Dataset" is a saved Definition that may be referred to by name.
  • A "Snapshot" is the set of files that matched a given Definition at the time the snapshot was taken.
  • A "Project" is a Snapshot, together with an interface for retrieving those files and tracking their processing.
    • A Project may be created from a Dataset name, a Definition, or a Snapshot ID. If either of the first two is used, a snapshot is implicitly taken when the Project is created.
  • To list the existing Datasets, use:
    $ samweb list-definitions
  • To take a snapshot of a Definition, for example the Dataset mwm_test_9, use:
    $ samweb take-snapshot mwm_test_9
  • To list existing Projects, use:
    $ samweb list-projects
  • Note that Projects must have a unique name, and will expire after a given period of inactivity. Once the Project has ended, it cannot be revived (although a new Project with the same Snapshot may of course be created).
  • When submitting jobs to the grid, there may be some delay before your processes are started. In order to prevent Projects from expiring before your job has started, a grid-wrapper is available that will start a SAM project if none exists, or else attach to it if one does.
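Since Project names must be unique and an ended Project cannot be revived, scripts usually generate a fresh name for each new Project. The sketch below builds names in the same style as the skeleton script's samweb_test_project_YYYYMMDDHHMMSS names; the helper name make_project_name, the prefix, and the PID suffix are illustrative choices, not SAM requirements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: <prefix>_<UTC timestamp>_<PID>.
# The PID suffix separates two names generated in the same second on one host.
make_project_name() {
  local prefix=${1:-${USER:-anon}_project}
  printf '%s_%s_%s\n' "$prefix" "$(date -u +%Y%m%d%H%M%S)" "$$"
}

PROJECT=$(make_project_name mwm_test_9)
echo "$PROJECT"    # prints mwm_test_9_<YYYYMMDDHHMMSS>_<pid>
```

For grid submission the name would presumably be generated once, at submission time, and passed to every job, so that the grid-wrapper starts the Project in whichever job runs first and the others attach to it. A Project could then be started from the Dataset with something like `samweb start-project --defname=mwm_test_9 "$PROJECT"`; check the start-project help for the options your samweb version supports.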


  • A "Consumer" is a process that retrieves and operates on files.
  • Consumer commands come in two versions: a samweb version for running interactively and an ifdh version for running in a grid environment.
  • Each Consumer must register itself with the Project, via the `samweb start-process` or `ifdh establishProcess` commands. A list of Consumers is maintained by the Project.
  • A Consumer attaches to a Project via its Project URL. This URL is returned when the Project is created or, for a running Project, may be retrieved via the `ifdh findProject` command.
  • Multiple Consumers can (and normally will) attach to the same Project. The Project distributes files sequentially to Consumers upon request, until no files remain.
  • A Consumer requests a file via the `samweb get-next-file` or `ifdh getNextFile` commands. When a Consumer requests a file, the request exit value specifies:
    • 200, if it has returned a file
    • 204, if there are no more files
    • 202, if the Project has more files, but they are not currently accessible. In this case the Consumer should wait, then re-submit the request.
  • The request return string provides more information, such as the file location if a file has been fetched, or how many seconds to wait before re-submitting a request, if appropriate.
  • After processing a file, the consumer must use the `samweb release-file` or `ifdh updateFileStatus` commands to release the file from SAM and report whether the processing was successful.
  • When no further files are available to process, as a final action each Consumer should use the `???` or `ifdh setStatus` command to report its final status as "complete" before terminating.
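The request loop described above can be sketched as a shell function that dispatches on the three status values. The request itself is stubbed out: request_next_file and its canned responses are hypothetical stand-ins for `samweb get-next-file` or `ifdh getNextFile`, and the stub leaves the reply text (a file location, or a number of seconds to wait) in the variable reply.

```shell
#!/usr/bin/env bash
# Status values from the list above: 200 = file delivered,
# 202 = files remain but are not yet accessible, 204 = no more files.

# Hypothetical stub standing in for the real request; replays canned codes.
responses=(200 202 200 204)
next=0
request_next_file() {
  local code=${responses[next]}
  next=$((next + 1))
  case $code in
    200) reply="file_${next}.root" ;;   # real reply: the file location
    202) reply=1 ;;                     # real reply: seconds to wait
    *)   reply="" ;;
  esac
  return "$code"
}

consume_all() {
  local status
  processed=()
  while :; do
    status=0
    request_next_file || status=$?
    case $status in
      200) processed+=("$reply") ;;   # process the file here, then release it
      202) sleep "$reply" ;;          # wait as instructed, then ask again
      204) break ;;                   # done: report final status "complete"
      *)   echo "unexpected status $status" >&2; return 1 ;;
    esac
  done
}

consume_all
printf 'processed: %s\n' "${processed[@]}"
```

Treating any other value as an error, rather than retrying, keeps a misbehaving request from looping forever.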

Processing Templates

The general overview of processing using a SAM Project is as follows:
  1. Generate unique project name.
  2. Start project.
  3. Start consumer process.
  4. File loop.
    1. Get location (uri) of next file.
    2. Copy file to scratch disk.
    3. Process file.
    4. Release file.
    5. Delete file from scratch disk.
  5. Stop consumer process.
  6. Stop project.
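If a job fails between steps 4.2 and 4.5, its scratch copy can be left behind. A minimal sketch of one file-loop iteration that always deletes its scratch directory is shown below. COPY_CMD, process_file and release_file are hypothetical placeholders: on the grid the copy would typically be `ifdh cp` and the release `samweb release-file` or `ifdh updateFileStatus`, but here they default to local stand-ins so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# One iteration of the file loop (steps 4.2 to 4.5) with guaranteed cleanup.
# Placeholders (hypothetical): COPY_CMD, process_file, release_file.
COPY_CMD=${COPY_CMD:-cp}                        # e.g. "ifdh cp" on the grid

process_file() { wc -c < "$1" > /dev/null; }    # stand-in for real processing
release_file() { echo "release $1 status=$2"; } # stand-in for the SAM release

handle_one() {
  local uri=$1 scratch local_copy status
  scratch=$(mktemp -d) || return 1
  local_copy="$scratch/$(basename "$uri")"
  # 4.2 copy and 4.3 process; any failure marks the file as bad.
  if $COPY_CMD "$uri" "$local_copy" && process_file "$local_copy"; then
    status=ok
  else
    status=bad
  fi
  release_file "$(basename "$uri")" "$status"   # 4.4 report the outcome
  rm -rf "$scratch"                             # 4.5 always delete scratch
  [ "$status" = ok ]
}

# Demo with a local file standing in for one delivered by SAM.
demo=$(mktemp)
echo "demo data" > "$demo"
handle_one "$demo"
rm -f "$demo"
```

Releasing the file even when the copy or processing fails means SAM still learns the outcome, instead of the file silently staying checked out to this Consumer.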
  • A skeleton script, provided by Art, is located at /annie/app/users/anniepro/sam_test_project. The code is well commented, so read it to see the commands being run. Its expected output is shown below.
    $ /annie/app/users/anniepro/sam_test_project annie st-onesmall
    >             sam_test_web 20140225
    >     running  samweb_test_project_20170220172427
    >     station  annie
    >     dbserver 
    >     dataset  st-onesmall
    >     files    1/1
    > FILE gsi
    > Project: samweb_test_project_20170220183044 (83)
    > Project status: ended complete
    > Project start time: 2017-02-20T18:30:45.672413+00:00
    > Project end time  : 2017-02-20T18:30:46.621161+00:00
    > Project description: 
    > Snapshot id: 21
    > Dataset definition: st-onesmall (41)
    > Station: annie (1)
    > Username: kreymer (8)
    > Group: annie (1)
    > Number of consumer processes: 1
    >    Consumer process id: 83
    >       Process status          : finished
    >       Application             : test test version: 1 (62)
    >       Node name               :
    >       Process description     :  
    >       Start time / end time   : 2017-02-20T18:30:45.971209+00:00 / 2017-02-20T18:30:46.519453+00:00
    >       Number of files consumed: 1
    >       Number of files failed  : 0
    >       Last file               : onesmall.root
    >       Last file status        : consumed
    > Number of files in snapshot: 1
    > Number of files consumed   : 1
    > Number of files failed     : 0

Additional Notes

Advanced functionality

. /grid/fermiapp/products/common/etc/
setup sam_web_client
export SAM_EXPERIMENT=annie
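A script that depends on the setup lines above may want to confirm they took effect before calling samweb. The guard below is a sketch that assumes only that a working setup exports SAM_EXPERIMENT and puts `samweb` on PATH; the function name check_sam_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the setup above took effect before using samweb.
# Assumes only that a working setup exports SAM_EXPERIMENT and puts samweb on PATH.
check_sam_env() {
  local rc=0
  if [ -z "${SAM_EXPERIMENT:-}" ]; then
    echo "SAM_EXPERIMENT is not set (e.g. export SAM_EXPERIMENT=annie)" >&2
    rc=1
  fi
  if ! command -v samweb >/dev/null 2>&1; then
    echo "samweb is not on PATH (did 'setup sam_web_client' succeed?)" >&2
    rc=1
  fi
  return "$rc"
}

# Example: only query the server once the environment looks right.
# check_sam_env && samweb --version
```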

samweb --version


samweb server-info

openssl x509 -in /tmp/x509up_u`id -u` -noout -subject -dates
> subject= /DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Arthur Kreymer/CN=UID:kreymer
> notBefore=Feb 15 16:17:02 2017 GMT
> notAfter=Feb 22 16:22:02 2017 GMT
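The `-dates` output above has to be read by eye. For scripts, `openssl x509 -checkend N` exits with status 0 only if the certificate will still be valid N seconds from now. The sketch below wraps it; the function name proxy_valid_for, the one-hour default, and letting X509_USER_PROXY override the /tmp/x509up_u<uid> path are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the grid proxy exists and stays valid
# for at least $1 seconds (default 3600). Returns 2 if no proxy file is found.
proxy_valid_for() {
  local seconds=${1:-3600}
  local proxy=${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}
  if [ ! -r "$proxy" ]; then
    echo "no readable proxy at $proxy" >&2
    return 2
  fi
  # -checkend N exits 0 if the certificate will NOT expire within N seconds.
  openssl x509 -in "$proxy" -noout -checkend "$seconds" >/dev/null
}

# Example:
# proxy_valid_for 3600 || echo "proxy missing or expiring within the hour"
```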

# Make an X.509 proxy (samweb should do this by default, as needed)


samweb locate-file SAMTEST
> File 'SAMTEST' not found
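The validate-metadata and declare-file commands below read a JSON metadata file whose path is held in ${SM}, which these notes do not define. One way to create such a file for the SAMTEST record, using the values that get-metadata reports further down, is sketched here. The key names (file_name, file_size, checksum, file_type, file_format) and writing the checksum as a list are assumptions based on SAM's standard metadata; confirm them against the standard metadata description and run validate-metadata before declaring.

```shell
#!/usr/bin/env bash
# Hypothetical example: write declaration metadata for SAMTEST and point SM at it.
# Key names are assumptions; values match the get-metadata output in these notes.
SM=$(mktemp)
cat > "$SM" <<'EOF'
{
  "file_name": "SAMTEST",
  "file_size": 123,
  "checksum": ["adler32:666"],
  "file_type": "unknown",
  "file_format": "unknown"
}
EOF
cat "$SM"

# Then, as in these notes:
#   samweb validate-metadata ${SM}
#   samweb declare-file ${SM}
```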


samweb validate-metadata ${SM}

samweb declare-file ${SM}

samweb get-metadata SAMTEST
>      File Name: SAMTEST
>        File Id: 601
>    Create Date: 2017-02-20T15:57:12+00:00
>           User: kreymer
>      File Size: 123
>       Checksum: adler32:666
> Content Status: good
>      File Type: unknown
>    File Format: unknown

samweb retire-file  SAMTEST


Users are regularly cloned from the VOMS/GUMS database.
They can be viewed, added, and modified by users in the admin_role group:

samweb list-users | sort

samweb describe-user kreymer
>      Username: kreymer
>       User Id: 8
>        Status: active
>        Groups: annie
>              : admin_role
> Grid Subjects:

samweb add-user --first-name=Mayly --last-name=Sanchez --groups="annie,admin_role" msanchez