Configuration file

The configuration file follows INI syntax, with sections in square brackets, and values as 'key = value' pairs.

[main]

| Parameter | Description | Default value |
| --- | --- | --- |
| debug-mode | Performs fake file transfer for debugging purposes. Internal use only. | False |
| log-file | The path to the log file. The name given will have the date and .log appended. The pattern ${hostname} will be substituted with the hostname. | run_dir/twistd.log |
| filetypes | A space separated list of file types to consider. Each file type has a separate configuration section controlling its actions. | Empty |
| samweb-url | The URL of the samweb server (example: https://samweb.fnal.gov:8483/sam/nova/dev/api) | |
| x509-client-certificate | Path to an x509 certificate. | $X509_USER_CERT if defined, else /tmp/x509up_u<user id> |
| x509-client-key | Path to an x509 key file (must not require a password). | $X509_USER_KEY if defined, else the client certificate file |
| transfer-retries | The number of times to retry failed transfers. | Unlimited |
| transfer-retry-interval | The period in seconds before retrying a failed transfer. | 30 secs |
| transfer-retry-max-interval | Stop retries after this interval regardless of the "transfer-retries" setting. | 3600 |
| max-transfer-limit | Maximum concurrent transfers of all file types. | 100 (15 before 6.1.0) |
| transfer-limits | Maximum transfers to a specific destination. A space separated list of <system>:<limit> pairs, for example enstore:25 dcache:25. (Added in 6.1.0) | 25 |
| disable-web-admin-interface | Disables the web interface "retry" button. | False |
| enable-web-interface | If true, then enable the web monitoring interface. | False |
| external-web-url | Web interface URL. | http://local_fqdn:8888 |
| web-interface-port | The port to listen on for web requests. | 8888 |
| allowed-web-ip | Only allow connections from these IP addresses. Use * as a wildcard. | |
| http-server-syslog | Send web server log information to the syslog. | False |
| webdav-path-pattern | A pair of strings used to remap paths to a webdav URL. The first is a regex, the second the substitution text. (Added in v3_11) | |
| local-db | Path to a database file to store local state information. The pattern ${hostname} will be substituted with the hostname. (Added in v3_11) | |
| plugin-paths | Path to the directory containing plugin code. | |
| plugins | Space separated list of plugins to activate (plugin metadata extractors do not need to be listed here). | |
| experiment | Explicitly specify your experiment. | $SAM_EXPERIMENT, else $EXPERIMENT, else unknown |
| service-name | Used to make an FTS instance unique, especially if you have multiple instances running on the same node. | experiment-fts-machine_name |
| sam-web-registry-base-url | The URL pointing to the sam web registry, which is used to self register the FTS service. | http://samweb.fnal.gov:8480/sam_web_registry |
| graphite-base-url | OBSOLETE. The URL pointing to graphite, which is used for monitoring. To be used in conjunction with service-name. | |
| graphite-stats-server | [protocol:]<hostname>:<port> for the graphite server. protocol defaults to 'pickle' if not given. Set it to 'plaintext' to use the simple text protocol. (The protocol option was added in v6_0_0.) | |
| scanner-max-limit | Maximum uncompleted transfers before the scanner suspends. Includes queued, in progress, and waiting for tape transfers. | None |
| scanner-queue-limit | Maximum queued transfers before the scanner suspends. Includes queued and in progress transfers. | None |
| webdav-path-pattern | Specify the webdav path as a <regular-expression> <replacement> pair. | None |
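
As an illustration, a minimal [main] section might look like the sketch below. Only the parameter names come from the table above; the experiment name, file type names and log path are hypothetical, and the # comment lines assume the parser accepts full-line comments, as most INI parsers do.

```ini
# Hypothetical example; adjust names, paths and URLs for your own installation.
[main]
experiment = nova
# Each name listed here needs its own [filetype <type>] section.
filetypes = raw calib
samweb-url = https://samweb.fnal.gov:8483/sam/nova/dev/api
log-file = /var/log/fts/fts
# Cap total concurrency, then cap each destination system separately.
max-transfer-limit = 100
transfer-limits = enstore:25 dcache:25
# Retry settings, written out with their documented default values.
transfer-retry-interval = 30
transfer-retry-max-interval = 3600
# The web monitoring interface is disabled unless switched on.
enable-web-interface = True
web-interface-port = 8888
```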

[filetype <type>]

General:

| Parameter | Description | Default value |
| --- | --- | --- |
| move-completed | A directory to move files to when they have been successfully completed. The path may be a path template (see below). | None |
| move-failed | A directory to move files to when they have failed. Files with multiple destinations, some of which succeeded and at least one of which failed, are considered to have failed. The path may be a path template (see below). | None |
| erase-after-days | Delete files older than this that have been successfully processed (fractional days are allowed). | None |
| disable-wait-for-tape | Do not wait for a tape label. | |
| tape-wait-timeout-days | Consider the transfer failed if there is no tape label within this interval. | 4 |
| erase-verify-checksum | Erase the local file if the checksum is verified. | |
| erase-check-crc | Synonym for erase-verify-checksum. | |
| erase-immediately | Erase the local file immediately after transfer. | |
| erase-archived-on-tape-only | Normally only files that have been archived to tape will be deleted. If False, files that have a disk location are also considered safe to delete. (Added in v3_7) | True |
| erase-archived-merged-descendants | If True, the children of files considered for deletion will be checked for archived merged tar files. (Added in v3_7) | False |
| priority | Relative priority for this file type to be entered into the transfer queue. Higher values have greater priority. (Only affects the scanner phase; does not affect the priority of the physical transfers.) | 0 |
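
A sketch of how the general options could be combined for one file type follows. The type name "raw" and the directory are hypothetical, and the values are only examples chosen within what the table describes.

```ini
# Hypothetical file type "raw"; the directory is a placeholder.
[filetype raw]
move-failed = /data/fts/failed
# Fractional days are allowed: 0.5 means half a day.
erase-after-days = 0.5
# Documented default, written out explicitly.
tape-wait-timeout-days = 4
# Higher values enter the transfer queue ahead of lower ones.
priority = 10
```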

Directory scanning:

| Parameter | Description | Default value |
| --- | --- | --- |
| scan-dirs | A space separated list of directories to scan for new files. The entire directory tree is scanned recursively; symlinks will be followed, but there is no checking for loops. Inaccessible directories are skipped. | Empty |
| scan-interval | The interval, in seconds, between directory scans. The timer starts when the previous scan is completed. Setting this value to zero means the directory will be scanned once at startup, and not subsequently. | |
| scan-delay | The minimum age, in seconds, for a file before it will be picked up by the scanner. Files with a modification date younger than this are silently ignored (but may be picked up on a subsequent pass). | 0 |
| scan-file-patterns | A space separated list of shell glob expressions which the filename must match. | Empty |
| scan-exclude-file-patterns | A space separated list of shell glob expressions which are used to exclude files. If a file matches both scan-file-patterns and scan-exclude-file-patterns it is excluded. | Empty |
| skip-zero-length-files | Do not consider zero length files for transfer. | |
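
The scanning options might be used together as in the following sketch. The directories and glob patterns are hypothetical, and it is an assumption that skip-zero-length-files takes a True/False value like the other boolean flags.

```ini
# Hypothetical scanning setup for a file type called "raw".
[filetype raw]
scan-dirs = /data/raw /data/raw-overflow
# Start the next scan 60 seconds after the previous one completes.
scan-interval = 60
# Skip files modified within the last 120 seconds; a later pass may pick them up.
scan-delay = 120
scan-file-patterns = *.root *.raw
# A file matching both pattern lists is excluded.
scan-exclude-file-patterns = *.tmp.root
skip-zero-length-files = True
```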

Metadata:

| Parameter | Description | Default value |
| --- | --- | --- |
| extract-metadata | If true, try to create metadata for discovered files that are not already in the database. If false, no metadata will be extracted, so files that are not already catalogued cannot proceed. | true |
| metadata-extractor | The name of the metadata extractor to invoke on discovered files, for example json-file. | |
| extract-checksum | If false, a checksum will not be calculated from the file. If the storage system provides a checksum value then this will be added to the database instead. | true |
| checksum-algorithms | Comma separated list of checksum algorithms. Valid values are enstore, adler32, any hash provided by Python hashlib, and any CRC provided by crcmod (if installed). If the source is on dCache and the algorithm matches dCache's checksum then this will be used directly; otherwise the entire file must be read. (Added in 6.0.0) | enstore |
| extract-crc | Synonym for extract-checksum. | |
| group | Group setting for metadata if not already specified in the metadata. | |
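
A possible metadata configuration is sketched below. The extractor name json-file and the algorithm names are taken from the table; the group value is hypothetical.

```ini
# Hypothetical metadata settings for a file type called "raw".
[filetype raw]
extract-metadata = True
metadata-extractor = json-file
extract-checksum = True
# Comma separated, unlike the space separated lists used elsewhere.
checksum-algorithms = enstore,adler32
group = nova
```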

File transfers:

Transfers to more than one location are possible. To specify additional destinations, the options below can optionally be suffixed by -1, -2, etc. So transfer-to specifies the first location, transfer-to-1 the second, and so on.

| Parameter | Description | Default value |
| --- | --- | --- |
| transfer-to | The directory path to transfer these files to. This must be in SAM location format (node:/absolute/path - the node is not necessarily a physical hostname). The path may be a template - see below. | Empty |
| transfer-delay-time | Delay the start of transfers until a time boundary. This is used to group transfers together, for example to reduce the number of tape mounts. The time is in seconds, so for example a value of 3600 will delay transfers until the start of the next hour. | 0 |
| transfer-limit | Maximum concurrent transfers of this file type. Deprecated from 6.1.0; use transfer-limits instead. | 5 |

Transfer timeout:

| Parameter | Description | Default value |
| --- | --- | --- |
| transfer-timeout | Kill the transfer if it has not completed in this time. | 43200 (3600 prior to version 6.1.0) |
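
The suffix convention for multiple destinations can be sketched as follows. The node names and paths are hypothetical; only the parameter names and the node:/absolute/path form come from the text above.

```ini
# Hypothetical two-destination setup for a file type called "raw".
[filetype raw]
# First destination, in SAM location format (node:/absolute/path).
transfer-to = enstore:/pnfs/nova/raw
# Second destination uses the -1 suffix.
transfer-to-1 = dcache:/pnfs/nova/scratch/raw
# Hold transfers until the start of the next hour so they are grouped together.
transfer-delay-time = 3600
# Documented default from 6.1.0 onwards, written out explicitly.
transfer-timeout = 43200
```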

Merging:

| Parameter | Description | Default value |
| --- | --- | --- |
| do-merge | Boolean flag for whether to merge files or not. | False |
| merged-output-dir | Directory in which to write the merged file. | |
| min-merged-size-MB | Minimum archive file size to create. | None |
| max-merged-size-MB | Maximum archive file size to create. | None |
| max-merged-files | The maximum number of files to put in a single archive (providing it reaches the minimum size requirement). | No limit |
| max-merged-age-sec | The maximum time, in seconds, unmerged files will be left sitting in the merge queue. If this time is reached, they will be merged even if the max size requirement has not been reached. | 24 hours |
| compress-merged-file | Boolean flag to compress the resulting tarball. | False |
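
As a sketch, merging small files into larger archives could be configured like this. The type name "small", the output directory and the size and count values are hypothetical examples, not recommendations.

```ini
# Hypothetical merging setup for a file type called "small".
[filetype small]
do-merge = True
merged-output-dir = /data/fts/merged
min-merged-size-MB = 500
max-merged-size-MB = 2000
max-merged-files = 1000
# Files waiting in the merge queue for an hour are merged regardless.
max-merged-age-sec = 3600
compress-merged-file = False
```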

[plugin <plugin name>]

This section contains plugin specific configuration values.

Path templates

Path templates allow creating paths depending on the file metadata. Metadata components are specified by elements like '${parameter}'. 'year', 'month', and 'day' are parameters that use the file times in the metadata if present, or the file modification time if not, and so will always work. SAM metadata parameters are specified as '<category>.<parameter>'. Numeric values may be modified by '/' for division, '%' for the modulus, and by appending [n] in order to pad the value to the specified size. For example, if a file has the metadata parameters '{Online: {Runnumber: 123456, Stream: all}}' then the template

/path/to/${online.runnumber/100[8]}/${online.runnumber%100[2]}/${Online.stream}

will generate the path

/path/to/00001234/56/all

Setting the length field to [=n] will force a numeric value to exactly that width, so this example could instead be written

/path/to/${online.runnumber/100[8]}/${online.runnumber[=2]}/${Online.stream}

Finally, as a special rule primarily intended for conveniently splitting up run numbers, the length can be given as [n/m]. This will pad the length to a minimum of n characters, then split it into chunks of size m, with slashes between. So

/path/to/${online.runnumber[8/2]}/${Online.stream}

gives

/path/to/00/12/34/56/all

Most keys are the same as given in the metadata, but some have special forms. Runs use run_number, subrun_number and run_type; if multiple runs or subruns are present in the metadata, the first is used. Application information uses app_family, app_name and app_version. There are also some keys relating to the source path: srcpath refers to the full directory path of the file, while basepath and relpath refer to the scan directory and the path relative to the scan directory respectively. For example, if the scanner is looking at /input and the file is located in /input/some/subdir/, then ${srcpath} is /input/some/subdir, ${basepath} is /input and ${relpath} is some/subdir. basepath and relpath require v5_0_0 or newer.
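
To show where templates fit into a configuration, the sketch below reuses the examples from this section in a file type definition. The type name "raw", the destination node and the base directories are hypothetical; the expansions in the comments follow from the rules described above.

```ini
# Hypothetical file type using path templates.
[filetype raw]
scan-dirs = /input
# A file found in /input/some/subdir is moved to /completed/some/subdir.
move-completed = /completed/${relpath}
# With Online.Runnumber 123456 and Online.Stream "all", the path part expands
# to /pnfs/nova/raw/00001234/56/all.
transfer-to = enstore:/pnfs/nova/raw/${online.runnumber/100[8]}/${online.runnumber%100[2]}/${Online.stream}
```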