Interface definitions

Base URL

The base URL for all interfaces should include the experiment name. This reduces the risk of confusion about which database is being modified when using or supporting multiple experiments. For example:

http://hostname:port/sam/minerva/api/...

Operation response formats

Some operations are capable of returning their results in multiple formats. The format is determined by an explicit format=<type> parameter, or by the Accept header sent with the request. See each individual operation for details on which content types it supports.
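A minimal curl sketch of the two ways to choose a format, using the metadata operation described below as the target. SAMWEB_BASE and the function names are hypothetical. Only the format parameter and the Accept header come from this page, and each operation's own description says which formats it supports.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# Choose the format with an explicit query parameter.
# usage: get_with_format PATH FORMAT
get_with_format() {
    curl --silent --show-error -G --data-urlencode "format=$2" "$SAMWEB_BASE/$1"
}

# Choose the format with the Accept header instead.
# usage: get_with_accept PATH MIME_TYPE
get_with_accept() {
    curl --silent --show-error -H "Accept: $2" "$SAMWEB_BASE/$1"
}

# Examples (not run here):
#   get_with_format files/name/myfile.root/metadata json
#   get_with_accept files/name/myfile.root/metadata application/json
```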

Error handling

Invalid requests return HTTP 4xx or 5xx error codes. The content type of the body of the error response depends on the Accept header sent with the request. Possible types are text/plain, application/json, and text/html; the default is text/plain. A JSON error response is an object containing at least two fields: error, a string giving the specific error name, and message, a plain-text description of the error.

Most error codes have the same general meaning as in the HTTP/1.1 standard. The server gives some of them more specific meanings:

400 Bad Request -- Operation was called with incorrect arguments
403 Forbidden -- Unknown user or insufficient privileges
404 Not Found -- Target of the operation doesn't exist (or incorrect URL path)
409 Conflict -- Requested operation conflicts with existing data (for example the file name already exists)
410 Gone -- The operation cannot be completed because the target no longer exists (for example the project already ended)
500 Internal Server Error -- This normally means a bug in the server. Contact support.
501 Not Implemented -- The requested operation is not implemented.
502 Bad Gateway -- There is an issue with the server infrastructure. If the condition persists, contact support.
503 Service Unavailable -- An upstream server is unavailable, for example because the database is down. If the condition persists, contact support.
504 Gateway Timeout -- A request to an upstream service timed out. If the condition persists, contact support.

Error codes 502 and above indicate transient conditions, so clients may wish to retry these operations after a suitable interval.
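As a sketch of this retry advice, the functions below retry a GET with exponential backoff while the status is 502 or above. The function names, attempt count and delays are arbitrary choices for illustration, not values recommended by the service.

```sh
# Succeeds if the HTTP status is one this page treats as transient
# (502 and above); non-numeric input counts as not transient.
is_transient_status() {
    [ "$1" -ge 502 ] 2>/dev/null
}

# GET a URL, retrying with exponential backoff while the status is transient.
# usage: retry_get URL [MAX_ATTEMPTS] [FIRST_DELAY_SECONDS]
# Prints the body; returns 0 on 2xx, 2 on any other non-transient status,
# and 1 if every attempt was transient.
retry_get() {
    url=$1 max=${2:-5} delay=${3:-10} attempt=1
    while :; do
        # Append the status code on its own line after the body.
        resp=$(curl --silent --show-error --write-out '\n%{http_code}' "$url")
        code=$(printf '%s\n' "$resp" | tail -n 1)
        if ! is_transient_status "$code"; then
            printf '%s\n' "$resp" | sed '$d'
            case $code in 2??) return 0 ;; *) return 2 ;; esac
        fi
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```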

Datasets

List Definitions

GET  http://hostname:port/sam/<experiment>/api/definitions/list

The possible query parameters are defname, user, group, after and before. The wildcard characters % and ? are accepted. Dates can be in YYYY-MM-DD or DD-MON-YYYY format.
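For example, the following curl sketch lists definitions matching a name pattern with an after date. SAMWEB_BASE and list_definitions are hypothetical names; the query parameters are the ones listed above.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# -G sends the --data-urlencode pairs as a query string on a GET request;
# the encoding turns the % wildcard into %25 so it reaches the server intact.
# usage: list_definitions NAME_PATTERN AFTER_DATE
list_definitions() {
    curl --silent --show-error -G \
        --data-urlencode "defname=$1" \
        --data-urlencode "after=$2" \
        "$SAMWEB_BASE/definitions/list"
}

# Example (not run here):
#   list_definitions 'mc_%' 2013-01-01
```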

Create Dataset Definition

POST http://hostname:port/sam/<experiment>/api/definitions/create

Required arguments:

name Dataset name
dims Dimension string
user User name

Optional argument:

group Group -- Defaults to experiment name
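A hedged sketch of the create call as a form-encoded POST, in the same style as the curl examples later on this page. SAMWEB_BASE and create_definition are hypothetical, and the dimension string in the usage comment is a placeholder.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# usage: create_definition NAME DIMS USER [GROUP]
# Sending data makes curl POST a form-encoded body. The group pair is only
# added when a fourth argument is given, so the server default applies otherwise.
create_definition() {
    curl --silent --show-error \
        --data-urlencode "name=$1" \
        --data-urlencode "dims=$2" \
        --data-urlencode "user=$3" \
        ${4:+--data-urlencode} ${4:+"group=$4"} \
        "$SAMWEB_BASE/definitions/create"
}

# Example (not run here):
#   create_definition my_def 'DIMENSION STRING' jdoe
```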

Delete Dataset Definition

POST  http://hostname:port/sam/<experiment>/api/definitions/name/<dataset name>/delete

Describe Dataset Definition

GET  http://hostname:port/sam/<experiment>/api/definitions/name/<dataset name>/describe

Optionally accepts a format parameter. The default value is plain; the only other allowed value is json.

List files by dataset

GET http://hostname:port/sam/<experiment>/api/definitions/name/<definition name>/files/list
GET http://hostname:port/sam/<experiment>/api/definitions/name/<definition name>/files/count
GET http://hostname:port/sam/<experiment>/api/definitions/name/<definition name>/files/summary

The list method returns all the files matching the definition, the count method returns only the number of files, and the summary method returns the number of files and their total size.

Create snapshot

POST http://hostname:port/sam/<experiment>/api/definitions/name/<definition name>/snapshot

File information

List files by dimensions

GET or POST http://hostname:port/sam/<experiment>/api/files/list
GET or POST http://hostname:port/sam/<experiment>/api/files/count
GET or POST http://hostname:port/sam/<experiment>/api/files/summary

Required arguments:

dims The dimensions to query

Optional arguments:

fileinfo If present, return extended information about the files

Dimension strings can be very long, so the POST method is also available; it allows the dimensions to be sent in the request body rather than in the URL.
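A minimal sketch of the POST form with curl, reading a long dimension string from a local file. SAMWEB_BASE and count_files_from are hypothetical names; the same call should work for list and summary by changing the last path component.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# "dims@FILE" tells curl to read the value from FILE and URL-encode it;
# sending data makes curl use POST, so the string travels in the body.
# usage: count_files_from DIMS_FILE
count_files_from() {
    curl --silent --show-error --data-urlencode "dims@$1" "$SAMWEB_BASE/files/count"
}

# Example (not run here):
#   count_files_from my_dims.txt
```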

File locations

GET http://hostname:port/sam/<experiment>/api/files/name/<filename>/locations

Optionally accepts a format parameter. The default value is plain; the only other allowed value is json.

Returns a list of locations for the file.

PUT https://hostname:port/sam/<experiment>/api/files/name/<filename>/locations

Modifies the locations for a file. The arguments are:

add new location to add
remove existing location to remove
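A sketch of a location change with curl. It assumes that add and remove are accepted as form-encoded body parameters and that both may be given in one request. SAMWEB_BASE, SAM_PROXY and move_location are hypothetical, and the proxy path mirrors the curl examples on this page.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-https://hostname:port/sam/minerva/api}
# Grid proxy used as the client certificate, as in this page's curl examples.
SAM_PROXY=${SAM_PROXY:-/tmp/x509up_u$(id -u)}

# Replace one location of a file with another in a single PUT request.
# usage: move_location FILENAME NEW_LOCATION OLD_LOCATION
move_location() {
    curl --silent --show-error --cert "$SAM_PROXY" --key "$SAM_PROXY" \
        -X PUT \
        --data-urlencode "add=$2" \
        --data-urlencode "remove=$3" \
        "$SAMWEB_BASE/files/name/$1/locations"
}
```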

Get metadata

GET http://hostname:port/sam/<experiment>/api/files/name/<filename>/metadata

Optionally accepts a format parameter. The default value is plain; the only other allowed value is json.

Returns the metadata for the file. The plain text format is not intended to be easily parsable; use format=json for this.

File lineage

GET http://hostname:port/sam/<experiment>/api/files/name/<filename>/lineage/<lineage type>

Valid lineage types are parents, children and rawancestors.

Optionally accepts a format parameter. The default value is plain; the only other allowed value is json.

Returns the parents, children, or raw ancestors of the given file.

Creating and modifying file metadata

Add new file metadata

POST http://hostname:port/sam/<experiment>/api/files

Adds the metadata to the database.

The request body must have content type 'application/json'. See Metadata format for the required format.

Returns response code 201 if the metadata was successfully added. The Location header contains the URL for the new file, and the response body contains the new file ID. Invalid metadata causes a 400 response; syntactically valid metadata that can't be declared because of other constraints causes a 409 response.
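The sketch below declares metadata from a JSON file and prints the new file's URL from the Location header. SAMWEB_BASE, extract_location and declare_metadata are hypothetical names; the status codes and the header are the ones described above.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# Pull the Location value out of a file of dumped response headers.
# Header names are case-insensitive, and HTTP header lines end in CR LF.
extract_location() {
    grep -i '^location:' "$1" | sed 's/^[^:]*:[[:space:]]*//' | tr -d '\r'
}

# Declare metadata from a JSON file; on 201, print the new file's URL.
# usage: declare_metadata METADATA_JSON_FILE
declare_metadata() {
    hdrs=$(mktemp) || return 1
    code=$(curl --silent --show-error \
        -H 'Content-Type: application/json' --data-binary "@$1" \
        --dump-header "$hdrs" --output /dev/null --write-out '%{http_code}' \
        "$SAMWEB_BASE/files")
    if [ "$code" = 201 ]; then
        extract_location "$hdrs"
        rm -f "$hdrs"
        return 0
    fi
    rm -f "$hdrs"
    echo "declaration failed: HTTP $code (400 = invalid, 409 = constraint conflict)" >&2
    return 1
}
```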

Validate metadata

POST http://hostname:port/sam/<experiment>/api/files/validate_metadata

Checks the validity of the supplied metadata, without modifying the database.

The request body must have content type 'application/json'. See Metadata format for the required format.

Returns response code 204 if the metadata is valid. Invalid metadata will produce an error code (usually 400 or 409).
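A small sketch that uses this operation as a pre-check before declaring, relying only on the 204 response described above. SAMWEB_BASE and validate_metadata are hypothetical names.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# Returns 0 on 204 (valid); otherwise prints the status and the error
# body to stderr and returns 1.
# usage: validate_metadata METADATA_JSON_FILE
validate_metadata() {
    body=$(mktemp) || return 1
    code=$(curl --silent --show-error \
        -H 'Content-Type: application/json' --data-binary "@$1" \
        --output "$body" --write-out '%{http_code}' \
        "$SAMWEB_BASE/files/validate_metadata")
    rc=0
    if [ "$code" != 204 ]; then
        echo "metadata rejected: HTTP $code" >&2
        cat "$body" >&2
        rc=1
    fi
    rm -f "$body"
    return "$rc"
}
```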

Modify existing metadata

PUT https://hostname:port/sam/<experiment>/api/files/name/<file name>/metadata
PUT https://hostname:port/sam/<experiment>/api/files/id/<file id>/metadata

This operation is restricted to admin users only.

Modifies the metadata of an existing file. The request body must have content type 'application/json' and contain a JSON object with the fields to be modified (see Metadata format for the metadata format). Setting a value to null deletes it or resets it to its default value. Fields that are not listed are not modified. Not all metadata fields can be modified.
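A sketch of a partial update with curl. field_a and field_b are placeholder names rather than real SAM metadata fields, and SAMWEB_BASE, SAM_PROXY and update_metadata are hypothetical.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-https://hostname:port/sam/minerva/api}
# Grid proxy used as the client certificate, as in this page's curl examples.
SAM_PROXY=${SAM_PROXY:-/tmp/x509up_u$(id -u)}

# Send a partial JSON object: listed fields change, null deletes or resets
# a field, and unlisted fields are left alone.
# usage: update_metadata FILENAME JSON_OBJECT
update_metadata() {
    curl --silent --show-error --cert "$SAM_PROXY" --key "$SAM_PROXY" \
        -X PUT -H 'Content-Type: application/json' \
        --data-binary "$2" \
        "$SAMWEB_BASE/files/name/$1/metadata"
}

# Example (not run here; requires admin privileges):
#   update_metadata myfile.root '{"field_a": "new value", "field_b": null}'
```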

Modify file content status

PUT https://hostname:port/sam/<experiment>/api/files/name/<file name>/content_status
PUT https://hostname:port/sam/<experiment>/api/files/id/<file id>/content_status

Required arguments:

status New content status
comment Comment explaining why the content status was changed

Set the file content status.

Listing and modifying data values

List users

GET http://hostname:port/sam/<experiment>/api/users

Allowed query parameters:

username filter by username (the % and ? wildcards are accepted)
status filter by user status

Add new user

POST https://hostname:port/sam/<experiment>/api/users

The body must be a JSON-formatted object with the following required field:

username new username

and the following optional fields:

first_name First name
last_name Last name
email User email
groups list of group names

List user information

GET http://hostname:port/sam/<experiment>/api/users/name/<username>
GET http://hostname:port/sam/<experiment>/api/users/id/<userid>

Update existing user

PUT https://hostname:port/sam/<experiment>/api/users/name/<username>
PUT https://hostname:port/sam/<experiment>/api/users/id/<userid>

The body must be a JSON-formatted object with any of the following fields:

addgroups list of group names to add the user to
status active or inactive
email user's email
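Both user operations take a JSON body, so one hypothetical helper covers them. SAMWEB_BASE, SAM_PROXY and sam_json are illustrative names, the user, e-mail and group values are placeholders, and sending Content-Type: application/json is an assumption.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-https://hostname:port/sam/minerva/api}
# Grid proxy used as the client certificate, as in this page's curl examples.
SAM_PROXY=${SAM_PROXY:-/tmp/x509up_u$(id -u)}

# Send a JSON body with the given method to a path under the base URL.
# usage: sam_json METHOD PATH JSON
sam_json() {
    curl --silent --show-error --cert "$SAM_PROXY" --key "$SAM_PROXY" \
        -X "$1" -H 'Content-Type: application/json' \
        --data-binary "$3" "$SAMWEB_BASE/$2"
}

# Examples (not run here):
#   sam_json POST users '{"username": "jdoe", "email": "jdoe@example.com", "groups": ["mygroup"]}'
#   sam_json PUT users/name/jdoe '{"addgroups": ["othergroup"], "status": "inactive"}'
```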

List and add metadata parameters

To list existing parameters:

GET http://hostname:port/sam/<experiment>/api/values/parameters

To add a new parameter:

POST https://hostname:port/sam/<experiment>/api/values/parameters

Required arguments are:

name parameter name
category category name
data_type currently type or string

List values for a parameter

To list all values for a given parameter (be careful: this could be a long list):

GET http://hostname:port/sam/<experiment>/api/values/parameters/<category>.<name>

List and modify simple database values

To list existing values:

GET http://hostname:port/sam/<experiment>/api/values/<value type>

To add new values:

POST https://hostname:port/sam/<experiment>/api/values/<value type>

Adding new values takes one or more value parameters, each specifying a new value to add.

Valid values for <value type> can be obtained using the generic list operation described below.

Doing this with curl looks like:

 curl --insecure --cert /tmp/x509up_u$UID --key /tmp/x509up_u$UID \
    --data "value=newtier" \
    https://samweb.fnal.gov:8483/sam/minerva/dev/api/values/data_tiers

or for parameters/categories:

 curl --insecure --cert /tmp/x509up_u$UID --key /tmp/x509up_u$UID \
    --data "name=someparam&param_category=somecategory&data_type=string" \
    https://samweb.fnal.gov:8483/sam/minerva/dev/api/values/category_params

The available generic categories can be obtained by doing:

GET http://samweb.fnal.gov:8480/sam/samdev/api/values?list=generic

(Currently the only supported value for list is generic. Others may be supported in the future.)

List applications

GET  http://hostname:port/sam/<experiment>/api/values/applications

Allowed arguments:

family Filter by application family (wildcards allowed)
name Filter by application name (wildcards allowed)
version Filter by version (wildcards allowed)

Add application

POST http://hostname:port/sam/<experiment>/api/values/applications

Required arguments:

family
name
version

Station information

Dump station

GET http://hostname:port/sam/<experiment>/api/dumpStation

Required arguments:

station The name of the station to use

Optional arguments:

dump The type of information to return (all, projects, disks, groups). The default is "all".

This is a straightforward translation of the existing sam dump station command. It would be nice to supersede it with a more structured version of the information it provides, but this has the benefit of being available now.

Running projects

Start project

POST http://hostname:port/sam/<experiment>/api/startProject

Required arguments:

name The project name
station The name of the station to use

One of these must be given to specify the dataset:

defname Definition name
def_id Definition ID
snapshot_id Snapshot ID

One of these must be given to specify the user:

username The SAM user name
subject The user's certificate subject string

Optional arguments:

group The SAM work group. The web service should use a configurable default if none is specified.

The return value for a successfully started project is a URL for the project, like

http://hostname:port/sam/<experiment>/api/projects/<stationname>/<projectname>

The reason for returning a new URL is that we'd like to be able to have projects handled by a different web server, for instance one embedded in the station itself. That would only benefit high-load stations, so we don't need to implement it now, but it seems a good idea to design the interface in a way that keeps the option available.

The CORBA interface returns a lot of information about the project. However, it's largely useless, so there doesn't seem to be any point in returning it to the client.
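A sketch of starting a project with curl and keeping the returned URL for later calls. It assumes the URL is returned as the plain response body; SAMWEB_BASE, start_project and the project naming scheme in the usage comment are hypothetical.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# Start a project on a definition and print the project URL.
# --fail makes curl exit non-zero on an HTTP error instead of printing the error body.
# usage: start_project PROJECT STATION DEFNAME USERNAME
start_project() {
    curl --silent --show-error --fail \
        --data-urlencode "name=$1" \
        --data-urlencode "station=$2" \
        --data-urlencode "defname=$3" \
        --data-urlencode "username=$4" \
        "$SAMWEB_BASE/startProject"
}

# Example (not run here); the timestamp just keeps project names unique:
#   PROJECT_URL=$(start_project "$USER-$(date +%Y%m%d%H%M%S)" mystation mydef "$USER") || exit 1
```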

Find project

GET http://hostname:port/sam/<experiment>/api/findProject

Required arguments:

name The project name
station The name of the station to use

This returns the same URL that startProject did, so that worker jobs that know only the project name can find that URL, for example to establish a consumer process.

Project status

GET <project url>/status

Returns the current status of the project.
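In a worker job, findProject and the status call might be combined as in this sketch. SAMWEB_BASE, find_project and project_status are hypothetical names.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# Recover the project URL from the project and station names.
# usage: find_project PROJECT STATION
find_project() {
    curl --silent --show-error --fail -G \
        --data-urlencode "name=$1" \
        --data-urlencode "station=$2" \
        "$SAMWEB_BASE/findProject"
}

# Query the status of a project given its URL.
# usage: project_status PROJECT_URL
project_status() {
    curl --silent --show-error --fail "$1/status"
}

# Example (not run here):
#   PROJECT_URL=$(find_project myproject mystation) && project_status "$PROJECT_URL"
```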

Establish consumer process

POST <project url>/establishProcess

Required arguments:

appname Application name
appversion Application version
deliverylocation The location to which files should be delivered. This is called node in the existing API, but the usage has become very stretched.

One of these must be given to specify the user:

username The SAM user name
subject The user's certificate subject string

Optional arguments:

appfamily The application family. This is only needed if the name alone is ambiguous.
description Description of the process.
filelimit The maximum number of files to deliver to this process.
schemas A comma-separated list of preferred URL schemas. This requires a v9 station; otherwise it is ignored.

Returns the new process ID (a decimal number).

In the existing python API this takes two separate steps. First 'sam establish consumer' creates a consumer, then 'sam establish consumer process' attaches processes to the consumer. Nobody ever uses the mode with multiple consumers per project (and it probably no longer works), so we can collapse this into one step. Calling 'sam establish consumer' multiple times for the same project, user and application returns the same consumer ID each time, so the web interface can be implemented by simply calling both methods every time it is called.
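A minimal sketch of establishing a process and capturing its ID, checking that the reply really is a decimal number as stated above. establish_process is a hypothetical name, and only the required arguments plus username are sent.

```sh
# Establish a consumer process on a project and print its numeric ID.
# usage: establish_process PROJECT_URL APPNAME APPVERSION DELIVERY_LOCATION USERNAME
establish_process() {
    procid=$(curl --silent --show-error --fail \
        --data-urlencode "appname=$2" \
        --data-urlencode "appversion=$3" \
        --data-urlencode "deliverylocation=$4" \
        --data-urlencode "username=$5" \
        "$1/establishProcess") || return 1
    # Reject anything that is not a non-empty string of digits.
    case $procid in
        ''|*[!0-9]*) echo "unexpected process ID: $procid" >&2; return 1 ;;
    esac
    printf '%s\n' "$procid"
}

# Example (not run here):
#   PROCID=$(establish_process "$PROJECT_URL" myapp v1 "$(hostname)" "$USER") || exit 1
```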

Get next file

POST <project url>/processes/<processid>/getNextFile

Returns the location of the next file as a URI, followed by a newline and the file name.

If there is no file immediately available, then the server returns a 202 response. The body contains some informational text, a colon, and the suggested number of seconds to wait before querying again.

When there are no more files available the server returns a 204 response and no body.
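The three response codes above suggest a polling loop like the following sketch. get_next_file and retry_seconds are hypothetical helpers, the 10-second fallback is an arbitrary choice, and any status other than 200, 202 or 204 is treated as an error.

```sh
# Suggested wait from a 202 body ("informational text: SECONDS"); 10 if absent.
retry_seconds() {
    secs=$(printf '%s\n' "$1" | tail -n 1 |
        sed -n 's/.*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    printf '%s\n' "${secs:-10}"
}

# Ask for the next file, waiting as advised while the server answers 202.
# Prints "URI<newline>filename" and returns 0 when a file is delivered,
# 1 when there are no more files (204), and 2 on any other status.
# usage: get_next_file PROJECT_URL PROCESS_ID
get_next_file() {
    while :; do
        resp=$(curl --silent --show-error --data '' --write-out '\n%{http_code}' \
            "$1/processes/$2/getNextFile")
        code=$(printf '%s\n' "$resp" | tail -n 1)
        body=$(printf '%s\n' "$resp" | sed '$d')
        case $code in
            200) printf '%s\n' "$body"; return 0 ;;
            202) sleep "$(retry_seconds "$body")" ;;
            204) return 1 ;;
            *)   echo "getNextFile failed: HTTP $code" >&2; return 2 ;;
        esac
    done
}

# Example consumer loop (not run here):
#   while next=$(get_next_file "$PROJECT_URL" "$PROCID"); do
#       uri=$(printf '%s\n' "$next" | sed -n 1p)
#       fname=$(printf '%s\n' "$next" | sed -n 2p)
#       # ... copy and process "$uri", then release "$fname" ...
#   done
```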

Update file status

POST <project url>/processes/<processid>/updateFileStatus

Arguments:

filename The filename to update
status the status to set

Allowed status values:

transferred The file has been copied locally and the job no longer needs physical access to it.
consumed This is the same as releasing the file with the ok status
skipped This is the same as releasing the file with a not-ok status

Note: transferred is not currently implemented at the back-end, so this value will be accepted but have no effect. It's present here so scripts can implement it now and be ready when it finally becomes available.
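A hedged sketch that only sends the three status values listed above. update_file_status is a hypothetical name.

```sh
# Report a file's status for a consumer process.
# usage: update_file_status PROJECT_URL PROCESS_ID FILENAME STATUS
update_file_status() {
    # Refuse values this page does not list.
    case $4 in
        transferred|consumed|skipped) ;;
        *) echo "invalid file status: $4" >&2; return 2 ;;
    esac
    curl --silent --show-error --fail \
        --data-urlencode "filename=$3" \
        --data-urlencode "status=$4" \
        "$1/processes/$2/updateFileStatus"
}
```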

Release file

POST <project url>/processes/<processid>/releaseFile

Arguments:

filename The filename to release (the python code calls basename on it, so it accepts the full path supplied by getNextFile).
status The status of the file. 'ok' will mark it as properly processed; any other value is treated as not successfully processed.

Probably if you want to be all RESTful this ought to use DELETE somehow.
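A sketch of releasing a file after processing it. release_file is hypothetical, and process_one in the usage comment stands for whatever the job does with the file.

```sh
# Release a file: status "ok" marks it as processed; any other value marks
# it as not successfully processed. A full path is fine (basename is taken).
# usage: release_file PROJECT_URL PROCESS_ID FILENAME_OR_PATH STATUS
release_file() {
    curl --silent --show-error --fail \
        --data-urlencode "filename=$3" \
        --data-urlencode "status=$4" \
        "$1/processes/$2/releaseFile"
}

# Typical use (not run here; process_one is hypothetical):
#   if process_one "$uri"; then st=ok; else st=bad; fi
#   release_file "$PROJECT_URL" "$PROCID" "$fname" "$st"
```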

End consumer process

This one isn't really all that useful. Any process that gets all its files is automatically ended.

POST <project url>/processes/<processid>/endProcess

Set process status

This is used to mark the final status of a process to aid recovery, so it should only be called after everything is well and truly done: the results have been copied back to the end user, and so on.

PUT <project url>/processes/<processid>/status

or

PUT http://hostname:port/sam/<experiment>/api/projects/name/<projectname>/processes/<processid>/status

or

PUT http://hostname:port/sam/<experiment>/api/projects/name/<projectname>/process_description/<process description>/status

This does not require the project to still be running, so the second URL can be used safely instead of the one returned by startProject. The third form only works if the process description is unique within the project. It is intended for the case where the description is set to a batch job ID, or something similar, that is easier for a script to obtain than the process ID.

Arguments:

status Meaningful values are 'completed' or 'bad'
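A sketch of the third form, where the process is identified by its description. It assumes the status is accepted as a form-encoded body parameter and that the description is already URL-safe, since it is inserted into the path unencoded. SAMWEB_BASE and the function name are hypothetical.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# Mark the final status of a process identified by its description
# (for example a batch job ID); the project may already have ended.
# usage: set_process_status_by_description PROJECT DESCRIPTION STATUS
set_process_status_by_description() {
    # Only send the two values this page calls meaningful.
    case $3 in
        completed|bad) ;;
        *) echo "unexpected status: $3" >&2; return 2 ;;
    esac
    curl --silent --show-error --fail -X PUT \
        --data-urlencode "status=$3" \
        "$SAMWEB_BASE/projects/name/$1/process_description/$2/status"
}
```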

End project

POST <project url>/endProject

The python API has a force option. Without it the project is only stopped if all the files have been processed. In practice, especially in scripts, force is almost always needed, so we may as well assume it here.

Project information

List projects

GET http://hostname:port/sam/<experiment>/api/projects

The query arguments for filtering the results are:

name accepts wildcards
defname accepts wildcards
snapshot_id
user
group
started_after date or date&time
started_before date or date&time
ended_after date or date&time
ended_before date or date&time
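For example, a curl sketch that lists a user's projects started after a date, optionally filtered by a name pattern. SAMWEB_BASE and list_projects are hypothetical names.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# usage: list_projects USER STARTED_AFTER [NAME_PATTERN]
# The name filter is only sent when a third argument is given.
list_projects() {
    curl --silent --show-error -G \
        --data-urlencode "user=$1" \
        --data-urlencode "started_after=$2" \
        ${3:+--data-urlencode} ${3:+"name=$3"} \
        "$SAMWEB_BASE/projects"
}

# Example (not run here):
#   list_projects jdoe 2013-01-01 'reco_%'
```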

Dump project

GET <project url>/dumpProject

This only works on running projects.

Project Summary

GET http://hostname:port/sam/<experiment>/api/projects/name/<project_name>/summary

will get the project summary, showing process status, etc. Optional GET arguments:

  • format=json
  • process_limit=n

Project Recovery Dimensions

GET http://hostname:port/sam/<experiment>/api/projects/name/<project_name>/recovery_dimensions?useFile=1&useProcess=1

will get the dimensions for a recovery dataset for a project. The useFile and useProcess flags indicate whether we should consider:

  • useFile processes whose last file delivery status is "delivered" or "unknown" as failures (default value 1)
  • useProcess processes whose process status is not "completed" as failures (default value 1)

when computing the recovery dataset dimensions.

If all files were processed by a project, you get back an empty string for the recoveryDimensions.

The result of recoveryDimensions calls can be passed to defineDataset as the dimension string to define a recovery dataset.
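A sketch that fetches the recovery dimensions and, if there is anything to recover, creates a definition from them. It calls the Create Dataset Definition operation described earlier rather than a client-side defineDataset, and assumes the default plain response is just the dimension string. SAMWEB_BASE and make_recovery_definition are hypothetical.

```sh
# Hypothetical base URL: replace hostname:port and the experiment name.
SAMWEB_BASE=${SAMWEB_BASE:-http://hostname:port/sam/minerva/api}

# usage: make_recovery_definition PROJECT NEW_DEFNAME USER
# Returns 1 if nothing needs recovering, 2 if the dimensions query failed.
make_recovery_definition() {
    dims=$(curl --silent --show-error --fail -G \
        --data-urlencode useFile=1 --data-urlencode useProcess=1 \
        "$SAMWEB_BASE/projects/name/$1/recovery_dimensions") || return 2
    if [ -z "$dims" ]; then
        echo "project $1: all files processed, nothing to recover" >&2
        return 1
    fi
    curl --silent --show-error --fail \
        --data-urlencode "name=$2" \
        --data-urlencode "dims=$dims" \
        --data-urlencode "user=$3" \
        "$SAMWEB_BASE/definitions/create"
}
```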

Backwards compatibility methods

The following deprecated methods are retained for backwards compatibility.

Create Dataset Definition

POST  http://hostname:port/sam/<experiment>/api/createDefinition

Required arguments:

name Dataset name
dims Dimension string
user User name

Optional argument:

group Group -- Defaults to experiment name

Delete Dataset Definition

POST  http://hostname:port/sam/<experiment>/api/deleteDefinition

Required arguments:

name Dataset name

Describe Dataset Definition

GET  http://hostname:port/sam/<experiment>/api/describeDefinition?name=<dataset_name>

List Definitions

GET  http://hostname:port/sam/<experiment>/api/listDefinitions?group=<group_name>

Required arguments:

group Group name

Translate Constraints

GET  http://hostname:port/sam/<experiment>/api/translateConstraints?dims=<dimension_string>[&format=plain/html]

Locate file

GET http://hostname:port/sam/<experiment>/api/locateFile?file=<filename>

Get Metadata

GET  http://hostname:port/sam/<experiment>/api/getMetadata?name=<filename>

Set process status

This is used to mark the final status of a process to aid recovery, so it should only be called after everything is well and truly done: the results have been copied back to the end user, and so on.

POST <project url>/processes/<processid>/setStatus

or

POST http://hostname:port/sam/<experiment>/api/projects/<stationname>/<projectname>/processes/<processid>/setStatus

This does not require the project to still be running, so the second URL can be used safely instead of the one returned by startProject.

Arguments:

status Meaningful values are 'completed' or 'bad'

The python API call for this is sam.commitProcess.