Sam_web_client_Command_Reference

General Common Options

All commands have help options and base options that are the same. Some commands have specific options for just that command. Only command-specific options will be listed in the per-command information below.

Help options

-h, --help            show this help message and exit
--help-commands       list available commands

For example:

% samweb find-project --help

usage: samweb [base options] find-project [command options] <project name>

Return the URL for a running project

(and lists the options)

Or to find the available commands:

% samweb --help-commands
Available commands:

  Definition commands:
    count-definition-files
    create-definition
    delete-definition
    describe-definition
    list-definition-files
    list-definitions
    modify-definition
    take-snapshot

  Data file commands:
    add-file-location
    count-files
    declare-file
    file-lineage
    get-file-access-url
    get-metadata
    list-files
    locate-file
    modify-metadata
    remove-file-location
    retire-file
    validate-metadata

  Project commands:
    find-project
    get-next-file
    list-projects
    prestage-dataset
    project-recovery
    project-summary
    release-file
    run-project
    set-process-status
    start-process
    start-project
    stop-process
    stop-project

  Utility commands:
    file-checksum
    server-info

  Admin commands:
    add-application
    add-data-disk
    add-parameter
    add-user
    add-value
    describe-user
    list-applications
    list-data-disks
    list-parameters
    list-users
    list-values
    modify-user

Base Options

-e EXPERIMENT, --experiment=EXPERIMENT   use this experiment server. If not set, defaults to $SAM_EXPERIMENT.
--dev               use development server
-s, --secure        always use secure (SSL) mode
--cert=CERT         x509 certificate for authentication. If not specified, use $X509_USER_PROXY, $X509_USER_CERT/$X509_USER_KEY or standard grid proxy location
--key=KEY           x509 key for authentication (defaults to same as certificate)
-v, --verbose       Verbose mode
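
The base options go between samweb and the command name. As a minimal sketch, a script could assemble such a command line as shown here. The helper name samweb_argv and the experiment name "myexp" are hypothetical; only flags listed above are used, and the function builds the argument list without running anything.

```python
def samweb_argv(command, args=(), experiment=None, dev=False, secure=False, verbose=False):
    """Build an argument list of the form: samweb [base options] command [args].

    Hypothetical helper for illustration; it only assembles the list and
    does not run samweb.
    """
    argv = ["samweb"]
    # When no experiment is given, -e is left out and samweb falls back to
    # $SAM_EXPERIMENT, as described for the -e option.
    if experiment is not None:
        argv += ["-e", experiment]
    if dev:
        argv.append("--dev")
    if secure:
        argv.append("--secure")
    if verbose:
        argv.append("--verbose")
    argv.append(command)
    argv.extend(args)
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True) or a similar call.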

Definition commands

samweb [base options] count-definition-files [command options] <dataset definition>

Count the number of files in a dataset definition. This is a useful sanity check before launching a job, especially to get an idea of how large the job is before submitting it. If the count is zero, check for a mistyped definition name. If the count is much larger than expected, check that you haven't matched more than you intended, for example by adding an OR condition that matches everything.

Counting files is also done when users want to ensure that one job is run per file, such as when a matching output per file is desired. Users can run this command to count the files to be able to tell jobsub how many files there are.

Users also might want to count the number of files to estimate how long the worker jobs will take in case they are close to the limit of processing allowed.
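
The count can also be picked up by a script before submitting jobs. The following Python sketch assumes that count-definition-files prints the count as a single integer on standard output; that output format is an assumption, so check it against your client version. The function name and the injectable run parameter are hypothetical.

```python
import subprocess


def count_definition_files(defname, run=None):
    """Return the number of files in a dataset definition.

    Assumes samweb prints the count as a single integer on stdout
    (an assumption; verify with your client version).
    """
    if run is None:
        # Default runner: call the real samweb client and capture its output.
        def run(argv):
            return subprocess.run(
                argv, check=True, capture_output=True, text=True
            ).stdout
    output = run(["samweb", "count-definition-files", defname])
    count = int(output.strip())
    if count == 0:
        # A zero count usually points at a mistyped definition name.
        raise ValueError(f"definition {defname!r} matches no files; check for mistyping")
    return count
```

The returned number can then be used to decide how many jobs to request, for example one job per file.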

samweb [base options] create-definition [command options] <new definition name> <dimensions>

Create a new dataset definition. When doing experiment analysis, definitions are probably already defined, but users might want to do analysis with subsets of a defined dataset definition, and this command is used to create that dataset definition.

Most experiments have a small group to define the official definitions for the experiment, and this group uses this command to make those definitions.

Another possible use is to create a special dataset definition to figure out the noise in an experiment, or to update calibration, or other things that aren't in the official definitions for the experiment.

Users also can make datasets of things that aren't really data, like configuration files for simulation, or logfiles. SAM allows the creation of a dataset consisting of logfiles so analysis can be run over that dataset, say to look for a particular error message over the last five years.

create-definition options:

--user=USER         
--group=GROUP       
--description=DESCRIPTION
--help-dimensions   Return information on the available dimensions

samweb [base options] delete-definition [command options] <dataset definition>

Delete an existing dataset definition. This is usually done when the sanity check count comes back wrong, because something was mistyped. In this case, delete the mistyped name to avoid future confusion. Once a definition has been used, it can't be deleted via this command since it is attached to other things.

samweb [base options] describe-definition [command options] <dataset definition>

Describe an existing dataset definition. Show what is in the definition, often used to determine if this is the right dataset to use, or to make something similar, such as for a new release of software. Instead of using this command line version, people may want to use the experiment-specific definition editor web page described at:
https://cdcvs.fnal.gov/redmine/projects/sam/wiki/User_Guide_for_SAM#Definition-Editor

The command line version is useful for scripts, or if you know the name.

samweb [base options] list-definition-files [command options] <dataset definition>

List files in a dataset definition. This returns a list of files, which can be long. (You may want to count the number of files first.) This is good for a sanity check or if you want to perform an operation other than running a job, such as running a script to see where the files are (like samweb locate-file). Sometimes users want to see the list of files, such as for a recovery dataset, especially if a particular file fails several times. You might want to look at that file to see whether it is garbled, or whether it causes a failure every time and requires human intervention.

list-definition-files options:

--fileinfo          Return additional information for each file
--summary           Return a summary of the results instead of the full list

--fileinfo gives the file id, file size, and event count for each file.

samweb [base options] list-definitions [command options]

List existing dataset definitions. This gives the list of names of datasets that have been defined, not what is in a particular definition. See describe-definition or list-definition-files for details about a particular dataset definition.

list-definitions options:

--defname=DEFNAME   
--user=USER         
--group=GROUP       
--after=AFTER       
--before=BEFORE

samweb [base options] modify-definition [command options] <existing dataset definition>

Modify an existing dataset definition. Useful when you want to change the definition of a dataset. This is new in v1_7.

Currently only the name and description can be modified.

modify-definition options:
--defname=DEFNAME
--description=DESCRIPTION

samweb [base options] take-snapshot [command options] <dataset definition>

Take a snapshot of an existing dataset definition. By taking a snapshot and then using the snapshot id, users are assured that the list of files has not changed even if someone added new files to SAM. Starting a project also creates a snapshot, and the project uses that snapshot id, so files added later are not included.

take-snapshot options:

--group=GROUP

Data file commands

Data file commands cover the file catalog part of SAM. These commands add and remove file locations, count and list files, declare files with their metadata, modify that metadata, and retire files. They give SAM the information it needs to keep track of the files.

samweb [base options] add-file-location [command options] <file_name> <location>

Add a location for a file. This is a way to tell SAM about a copy of a file that exists somewhere. The location has to be a location SAM already knows about. See the admin command "add-data-disk" at: https://cdcvs.fnal.gov/redmine/projects/sam/wiki/Scratch_area?parent=Wiki#samweb-base-options-add-data-disk-command-options-ltmount-pointgt

The add-file-location command only works for directories where the hosts are known. Users can't just add files in random locations.

The SAM FTS subsystem calls this command for the user when it puts files away. If users place a file in a known location on their own, SAM won't know about it until the user executes this command.

samweb [base options] count-files [command options] <dimensions query>

Count files by dimensions query. This is used to determine how many files will be in the dataset definition before defining it. People usually build the query over time, and this command helps.
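
As a sketch of that workflow, the following Python function counts the files matching a query and creates the definition only if the query matches something. The function name, the run callable (which executes an argument list and returns standard output as text), and the integer-only output of count-files are all assumptions made for illustration. Dimension names depend on the experiment; see --help-dimensions.

```python
def define_if_nonempty(name, dimensions, run):
    """Count files matching a dimensions query, then create the definition
    only if the query matches at least one file.

    `run` is a caller-supplied function that executes an argument list and
    returns stdout as text. That count-files prints a bare integer is an
    assumption.
    """
    count = int(run(["samweb", "count-files", dimensions]).strip())
    if count > 0:
        # The same query string becomes the <dimensions> argument.
        run(["samweb", "create-definition", name, dimensions])
    return count
```

A runner built on subprocess.run with check=True, capture_output=True, and text=True would fit the run parameter.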

samweb [base options] declare-file [command options] <name of metadata file (json format)>

Declare a new file into the database. This lets users tell SAM a given file exists and what its metadata is. This must be done before add-file-location. There has to be some metadata in a file when it is declared.
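
The required order, declare first and then add a location, can be sketched in Python as follows. The metadata key "file_name", the function name, and the run callable are assumptions for illustration; the real metadata fields are defined by your experiment's SAM instance.

```python
import json
import os
import tempfile


def declare_and_locate(metadata, location, run):
    """Declare a file to SAM, then register one location for it.

    `metadata` is a dict that carries at least the file name; the key
    "file_name" is an assumption about the metadata schema.
    `run` executes an argument list (for example via subprocess.run).
    """
    # declare-file takes the name of a JSON metadata file, so write one.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(metadata, handle)
        path = handle.name
    try:
        # Declaration comes first; add-file-location needs a declared file.
        run(["samweb", "declare-file", path])
        run(["samweb", "add-file-location", metadata["file_name"], location])
    finally:
        # Clean up the temporary metadata file in every case.
        os.remove(path)
```

If declare-file fails and the runner raises an exception, add-file-location is never attempted.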

samweb [base options] file-lineage [command options] <parents|children|ancestors|descendants|rawancestors> <file name>

Get lineage for a file. Depending on the metadata, ancestors and descendants can be found. This command gives part of the family tree.

samweb [base options] get-file-access-url [command options] <file name>

Get URLs by which files can be accessed. Note that using this command does no data movement or prestaging and is not recommended for large scale data access. This is new in v1_7.

get-file-access-url options:
--schema=SCHEMA Access schema for file
--location=LOCATION Filter returned urls by location prefix

samweb [base options] get-metadata [command options] <file name>

Get metadata for a file. Metadata is data about data. The main purpose is to allow files to be queried. Metadata includes physical data such as file size or checksum. It also includes the physics metadata such as: run number, detector configuration, and simulation parameters. Raw and processed data files have a large amount of metadata that describe the files themselves, how they were generated, and other auxiliary information useful in understanding what is in a file or how it should be grouped with other files.

get-metadata options:
--locations Include locations in output (requires --json)
--json Return output in JSON format

JSON (JavaScript Object Notation) is a lightweight data-interchange format built on two structures:
  • A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
  • An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

For more information, see: http://json.org/
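
Because --locations requires --json, a script that wants locations should always request JSON output. A minimal Python sketch follows; the function names are hypothetical, and the assumption that the output is a single JSON object of name/value pairs should be checked against your client version.

```python
import json


def get_metadata_argv(file_name, locations=False):
    """Build a get-metadata command line that always asks for JSON output."""
    argv = ["samweb", "get-metadata", "--json"]
    if locations:
        # --locations is only valid together with --json.
        argv.append("--locations")
    argv.append(file_name)
    return argv


def parse_metadata(output):
    """Parse the JSON text printed by get-metadata into a dict.

    Assumes the output is one JSON object (name/value pairs).
    """
    metadata = json.loads(output)
    if not isinstance(metadata, dict):
        raise ValueError("expected a JSON object of name/value pairs")
    return metadata
```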

samweb [base options] list-files [command options] <dimensions query>

List files by dimensions query. This lists the files that match the dimensions query given to the command, and it is typically used while building a query: users examine the results and the file metadata, then decide whether to change the dimensions query. This can also be done via the Definition Editor mentioned at: https://cdcvs.fnal.gov/redmine/projects/sam/wiki/User_Guide_for_SAM#Definition-Editor

list-files options:

--parse-only        Return parser output for these dimensions instead of evaluating them
--fileinfo          Return additional information for each file
--summary           Return a summary of the results instead of the full list
--help-dimensions   Return information on the available dimensions

samweb [base options] locate-file [command options] <file name>

List file locations. This is how to find out where SAM thinks there are copies of the files. This is also a way to determine if SAM is awake, by giving it a bogus file name.

samweb [base options] modify-metadata [command options] <file name> <name of file containing metadata parameters to modify (json format)>

Modify metadata for an existing file.

samweb [base options] remove-file-location [command options] <file_name> <location>

Remove a location for a file. Users cannot just remove a copy of a file; they have to tell SAM it's not there anymore. The order is: first tell SAM via this command, then remove the file.
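
That ordering can be sketched in Python as follows. The function name, the local_path argument, and the run callable are hypothetical, and the sketch assumes the copy is a local file that the script is allowed to delete.

```python
import os


def remove_copy(file_name, location, local_path, run):
    """Tell SAM that a copy is gone, then delete the copy itself.

    `run` executes an argument list and is expected to raise on failure,
    so the file is only deleted after SAM has been told.
    """
    # Step 1: tell SAM the location no longer holds the file.
    run(["samweb", "remove-file-location", file_name, location])
    # Step 2: only now remove the actual copy.
    os.remove(local_path)
```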

samweb [base options] retire-file [command options] <file name> [file name] ...

Mark a file as retired. This is done when a file is discovered to be corrupted, or no one knows what it is, but it is causing errors in the software being used to analyze it.

samweb [base options] validate-metadata [command options] <name of metadata file (json format)>

Check file metadata for correctness. Correctness means that the metadata can be parsed and that its contents are valid; for example, if parents were declared, they must be valid. The command first parses the JSON data, then validates the metadata by checking whether certain items are present. If they are present, it validates that they are of the right type.

Project commands

All the project commands (start-project, start-process, get-next-file, etc.) are for advanced use only. Normal users won't want to use these specific commands, but may be interested in the all-in-one run-project command.

samweb [base options] find-project [command options] <project name>

Return the URL for a running project. This command converts a project name into an identifier to talk to the project. This is called to be able to do get-next-file, release-file, etc.

find-project options:

--station=STATION

samweb [base options] get-next-file [command options] (<process url> | <project url> <process id>)

Get the next file from a process. The command returns a URL that you can use to fetch the file. If there are no files left, it returns an empty string.

get-next-file options:

--timeout=TIMEOUT   Timeout in seconds waiting for file. -1 to disable it, 0 to return immediately if no file. The default is one hour.

samweb [base options] list-projects [command options]

List projects by various query parameters, such as showing projects you have run, or are running.

list-projects options:

--name=NAME         
--user=USER         
--group=GROUP       
--defname=DEFNAME   
--snapshot_id=SNAPSHOT_ID
--started_before=STARTED_BEFORE
--started_after=STARTED_AFTER
--ended_before=ENDED_BEFORE
--ended_after=ENDED_AFTER
--state=STATE
--station=STATION

samweb [base options] prestage-dataset [command options]

Prestage a dataset. This is new in v1_7.

prestage-dataset options:
--defname=DEFNAME
--snapshot_id=SNAPSHOT_ID
--max-files=MAX_FILES
--station=STATION
--parallel=PARALLEL   Number of parallel processes to run
--delivery-location=DELIVERY_LOCATION   Location to which the files should be delivered (defaults to the same as the node option)
--node=NODE         The current node name. The default is the local hostname, which is appropriate for most situations

samweb [base options] project-recovery [command options] <project name>

Display the dimensions for the recovery dataset for a project. This command generates a dataset definition of the files that failed to be processed in a given project. The options control whether a file counts as failed based on the status reported for the file or on the status of the overall process.

project-recovery options:

--useFileStatus=USEFILESTATUS   Use the status of the last file in a process
--useProcessStatus=USEPROCESSSTATUS   Use the process status

samweb [base options] project-summary [command options] <project name>

Display the summary information for a project such as what processes there were, what files got delivered, etc.

project-summary options:

--station=STATION

samweb [base options] release-file [command options] (<process url> | <project url> <process id>) <file name>

Release a file from a process. Once the process is done with a file, the file must be released before calling get-next-file again.

release-file options:

--status=STATUS

samweb [base options] run-project [command options] <command to run (%fileurl will be replaced by file url)>

Run a project. This is an all-in-one command that starts a project and associated processes, and stops the project when done. (This is new in v1_7.)

run-project options:
--defname=DEFNAME
--snapshot_id=SNAPSHOT_ID
--max-files=MAX_FILES
--station=STATION
--name=NAME         Project name
--schemas=SCHEMAS   Comma separated list of url schemas this process prefers to receive
--parallel=PARALLEL   Number of parallel processes to run
--delivery-location=DELIVERY_LOCATION   Location to which the files should be delivered (defaults to the same as the node option)
--node=NODE         The current node name. The default is the local hostname, which is appropriate for most situations
--quiet
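
As an illustration, a run-project command line could be assembled in Python as sketched here. The function name is hypothetical, and passing the command to run as one trailing argument containing the %fileurl placeholder is an assumption based on the usage line above. Only options listed above are used.

```python
def run_project_argv(command, defname=None, snapshot_id=None, max_files=None, parallel=None):
    """Build a run-project command line.

    `command` is the command to run per file; it should contain the
    %fileurl placeholder, which samweb replaces with the file url.
    Passing it as a single trailing argument is an assumption.
    """
    if "%fileurl" not in command:
        raise ValueError("command should contain the %fileurl placeholder")
    argv = ["samweb", "run-project"]
    if defname is not None:
        argv.append(f"--defname={defname}")
    if snapshot_id is not None:
        argv.append(f"--snapshot_id={snapshot_id}")
    if max_files is not None:
        argv.append(f"--max-files={max_files}")
    if parallel is not None:
        argv.append(f"--parallel={parallel}")
    argv.append(command)
    return argv
```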

samweb [base options] set-process-status [command options] (<process url> | <project name or url> [process id]) <status>

Set the process status.

set-process-status options:

--description=DESCRIPTION

samweb [base options] start-process [command options] <project name or url>

Start a consumer process within a project. This command is used to deliver files for a given project. It gets the consumer process id number, which is needed for get-next-file, release-file, set-process-status, etc.

start-process options:

--appfamily=APPFAMILY
--appname=APPNAME   
--appversion=APPVERSION
--url               Return the entire process url rather than just the process id.
--node=NODE         The current node name. The default is the local hostname, which is appropriate for most situations
--delivery-location=DELIVERY_LOCATION   Location to which the files should be delivered (defaults to the same as the node option)
--max-files=MAX_FILES   Limit the maximum number of files to give to the process
--description=DESCRIPTION   Text description of the process
--schemas=SCHEMAS   Comma separated list of url schemas this process prefers to receive

samweb [base options] start-project [command options] [project name]

Start a new project. The user gives this command a dataset definition to snapshot, and SAM puts the files into an area and delivers the files when asked. Alternatively, users can give SAM a snapshot_id, and it uses that instead of a dataset definition.

start-project options:

--defname=DEFNAME   
--snapshot_id=SNAPSHOT_ID
--group=GROUP       
--station=STATION

samweb [base options] stop-process [command options] (<process url> | <project url> <process id>)

End an existing process.
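
The process commands fit together as a loop: start a process, fetch files until get-next-file returns an empty string, release each file before asking for the next, then stop the process. The Python sketch below illustrates that sequence under stated assumptions: the function name, the run callable (executes an argument list and returns standard output as text), and the handle_file callback are hypothetical; the file name passed to release-file is assumed to be the last path component of the returned URL; and start-process options such as --appname are left out for brevity.

```python
import posixpath
from urllib.parse import urlparse


def consume_project(project_url, handle_file, run):
    """Fetch, handle, and release files until the project has none left.

    `run` executes an argument list and returns stdout as text;
    `handle_file` is the caller's per-file work. Both are supplied by
    the caller. Returns the number of files handled.
    """
    # --url makes start-process return the whole process url.
    process_url = run(["samweb", "start-process", "--url", project_url]).strip()
    handled = 0
    try:
        while True:
            file_url = run(["samweb", "get-next-file", process_url]).strip()
            if not file_url:
                break  # empty string: no files left
            handle_file(file_url)
            # Assumption: the file name is the last path component of the URL.
            file_name = posixpath.basename(urlparse(file_url).path)
            # Release the file before asking for the next one.
            run(["samweb", "release-file", process_url, file_name])
            handled += 1
    finally:
        # End the process even if handling a file raised an exception.
        run(["samweb", "stop-process", process_url])
    return handled
```

For most users the all-in-one run-project command covers the same sequence without any scripting.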

samweb [base options] stop-project [command options] <project name>

Stop a running project. This is usually done because everyone is done, but possibly because you do not want the remaining unfinished jobs to run.

stop-project options:

--station=STATION

Utility commands

samweb [base options] file-checksum [command options] <path to file> [<path to file> [...]]

Calculate a checksum for a local file on disk using the enstore algorithm (sometimes inaccurately described as a "CRC"). Compute a checksum for a file so you can put it into the metadata, or check it against the metadata previously created.

samweb [base options] server-info [command options]

Display information about the server, such as the samweb service, client version, etc.

Admin commands

The main purpose of administration commands is to ensure SAM knows about the information that users want tracked. If an administrator wants SAM to track a particular host, the administrator must tell SAM about the host. SAM automatically adds information where possible (with more automation being added as SAM evolves), but administrators need to use the following commands, especially when using an automated script, like when adding dozens of users.

The experiment-specific Definition Editor linked at: https://cdcvs.fnal.gov/redmine/projects/sam/wiki/User_Guide_for_SAM#Definition-Editor may be more useful to add a single user, or a small number of users.

samweb [base options] add-application [command options] <family> <name> <version>

Add a new application to the database.

samweb [base options] add-data-disk [command options] <mount point>

Add a new data disk.

samweb [base options] add-parameter [command options] <category.name> <data type>

Add a new parameter.

samweb [base options] add-user [command options] <username>

Add new user.

add-user options:

--first-name=FIRST_NAME
--last-name=LAST_NAME
--email=EMAIL       
--uid=UID           
--groups=GROUPS

samweb [base options] add-value [command options] <see --help-categories>

Add value to the database.

add-value options:

--help-categories   list the database categories that can be used

samweb [base options] describe-user [command options] <username>

List user information.

samweb [base options] list-applications [command options]

List defined applications.

list-applications options:

--family=FAMILY     
--name=NAME         
--version=VERSION

samweb [base options] list-data-disks [command options]

List defined data disks.

samweb [base options] list-parameters [command options] [category.name]

With no arguments, list the defined parameters. If a single argument is provided, list all the values for that parameter name.

samweb [base options] list-users [command options]

List registered users.

samweb [base options] list-values [command options] <see --help-categories>

List values from the database.

list-values options:

--help-categories   list the database categories that can be used

samweb [base options] modify-user [command options] <username>

Modify user.

modify-user options:

--email=EMAIL       
--groups=GROUPS     Set the user's groups to this comma separated list
--addgroups=ADDGROUPS   Add the comma separated list of groups to the user
--status=STATUS
--addgridsubject=ADDGRIDSUBJECT   A grid subject to add to the user
--removegridsubject=REMOVEGRIDSUBJECT   A grid subject to remove from the user