File Transfer Service

The File Transfer Service (FTS) is a server process that watches directories for the arrival of new files, and when it discovers them, tries to extract file metadata, add it to the SAM catalogue, and then transfer, archive, and delete the file according to its configuration.

Installation

The FTS is distributed either as a UPS product or as a set of Python binaries, and it requires a properly configured SAM web server for the experiment.

If data files need to be copied anywhere, the FTS also requires the sam_cp product to be installed and configured. If files are to be archived to tape, the pnfs file system must be mounted on the host and the encp product must be installed.

Installing Python Binary Wheel Distributions
  • Create a Python 2.7 virtual environment:
    virtualenv my_fts_env
  • Activate and upgrade your virtual environment:
    cd my_fts_env; source bin/activate; pip install --upgrade pip setuptools wheel
  • Download the tarball for your desired version into your virtual environment: https://cdcvs.fnal.gov/redmine/projects/filetransferservice/files
  • Extract the binaries:
    tar xzvf fts_<your downloaded version>.tar.gz
  • Finally, install all of the binaries into your virtual environment:
    pip install --no-index --find-links=./fts_dist -r ./fts_dist/requirements.txt

External dependencies

Version 3.5 and below require the openssl098e compatibility RPM installed on SLF6 systems.

Configuration

All parameters are set in the configuration file.

Starting and stopping it

To start the service

$ setup FileTransferService
$ start_fts <run dir> <config file>

The run dir must be a directory to which the process running the FTS has write access.

To stop the service

$ stop_fts <run dir>

Nova FTS Guide

Path Templates

The template contains placeholders of the form ${...}.

The content of a placeholder can be a simple field from the metadata, a category.param name from the metadata,
or one of the special values run_number, run_type, app_name, app_family, app_version, year, month, day.

Numeric values may be further qualified by the operators % (modulus) or / (division).
Finally, there can be a length field in square brackets at the end. If the length is prefixed by '=',
it is treated as an exact length; otherwise it is a minimum. If the length is followed by '/' and a value,
the result is split into chunks of that size, separated by / characters.

Examples:
If the run number is 123456

${run_number} gives 123456
${run_number[8]} gives 00123456
${run_number/100[6]} gives 001234
${run_number[2]} gives 123456
${run_number[=2]} gives 56
${run_number[8/2]} gives 00/12/34/56
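
The rules above can be illustrated with a short Python sketch. This is not the FTS implementation: the placeholder grammar, the function name expand, and the choices to pad with leading zeros and to keep the rightmost characters for an exact length are assumptions inferred only from the examples above.

```python
import re

# Assumed grammar, inferred from the examples in this section:
#   ${name}  ${name/N}  ${name%N}  followed optionally by [len], [=len] or [len/chunk]
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][\w.]*)"
    r"(?:(?P<op>[%/])(?P<operand>\d+))?"
    r"(?:\[(?P<exact>=)?(?P<length>\d+)(?:/(?P<chunk>\d+))?\])?\}"
)


def expand(template, values):
    """Expand ${...} placeholders in template using the values mapping."""

    def replace(match):
        value = values[match.group("name")]
        # Optional arithmetic on numeric values.
        if match.group("op") == "/":
            value = int(value) // int(match.group("operand"))
        elif match.group("op") == "%":
            value = int(value) % int(match.group("operand"))
        text = str(value)
        if match.group("length"):
            length = int(match.group("length"))
            # Minimum length: pad with leading zeros (assumption).
            text = text.zfill(length)
            if match.group("exact"):
                # Exact length: keep the rightmost characters (assumption).
                text = text[-length:]
        if match.group("chunk"):
            size = int(match.group("chunk"))
            # Split into chunks of the given size, joined by '/'.
            text = "/".join(text[i:i + size] for i in range(0, len(text), size))
        return text

    return _PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    for template in ("${run_number}", "${run_number[8]}", "${run_number/100[6]}",
                     "${run_number[2]}", "${run_number[=2]}", "${run_number[8/2]}"):
        print(template, "gives", expand(template, {"run_number": 123456}))
```

Run as a script, the sketch reproduces the six examples listed above for run number 123456.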

New Production Manager Documentation

This document is intended to ease new experiment production coordinators into their new role by introducing the Fermi File Transfer Service (FTS) and Sequential Access via Metadata CoPy (SAM_CP) utilities, which are central to experiment data management.

The following services must be configured for your experiment:

1. SAM for your experiment (by submitting a request to Scientific Data Management)

Once these services have been set up by the appropriate groups, it’s time to configure FTS.

FTS

The configuration file that FTS loads on startup controls the operation of the FTS instance. It is divided into sections, of which only the first, [main], is strictly required. This section sets required system options such as:

[main]
experiment = {your experiment name}
log-file = {path to the FTS log file destination}
filetypes = {space delimited list of filetype stanzas to be defined later in this file}
local-db = {where to save the local sqlite database that }
enable-web-interface = {True enables the FTS monitoring page}
web-interface-port = {Monitoring page port}
allowed-web-ip = {Use this to limit access to machines on the FNAL network}
transfer-retries = {Total number of retries before a transfer fails}
transfer-retry-interval = {Interval between retries}
scanner-queue-limit = {Maximum number of queued transfers before file discovery is halted}
scanner-max-limit = {Maximum number of uncompleted transfers before file discovery is halted}

The [main] block is followed by any number of filetype blocks, one for each file type declared in the [main] filetypes list. These stanzas are where the real functionality of FTS lies: any number of file types may be defined, each with its own rules, which greatly eases the burden of data management.

[filetype mc1]
scan-dirs = {path to the directory to be monitored for new files}
scan-interval = {time between discovery scans in seconds}
scan-delay = {minimum age of a file before it is picked up by file discovery}
scan-file-patterns = {regular expression designating the files discovered by the scanner}
scan-exclude-file-patterns = {regular expression for explicitly ignoring files}
extract-metadata = {True or False, depending on whether metadata should be extracted by a plugin provided by SDM}
transfer-to = {Template for the path files will be stored at. See Path Templates in the full documentation}
erase-after-days = {Time (in days) before files verified as on tape are deleted}
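
As a rough illustration of how the filetypes list in [main] relates to the [filetype ...] stanzas, the following Python sketch checks that every declared file type has a matching stanza. It assumes the file can be read by Python's standard configparser module, which this page does not state. The helper name missing_filetype_stanzas and all values in the example configuration are hypothetical.

```python
import configparser

# Hypothetical configuration: every value here is an illustrative placeholder.
EXAMPLE = """
[main]
experiment = myexperiment
log-file = /var/tmp/fts.log
filetypes = mc1 raw

[filetype mc1]
scan-dirs = /data/mc1
scan-interval = 60
transfer-to = /pnfs/myexperiment/mc/${run_number[8/2]}
"""


def missing_filetype_stanzas(text):
    """Return declared file types that have no [filetype <name>] stanza."""
    # interpolation=None keeps '%' and '${...}' in path templates literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    declared = parser.get("main", "filetypes").split()
    return [name for name in declared
            if not parser.has_section("filetype " + name)]


if __name__ == "__main__":
    print(missing_filetype_stanzas(EXAMPLE))  # prints ['raw']
```

Here raw is declared in filetypes but has no stanza, so it is reported as missing.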

Full documentation on the FTS including all configuration options may be found here:
https://cdcvs.fnal.gov/redmine/projects/filetransferservice/wiki

SAM_CP

SAM_CP translates the filesystem paths given as transfer start and end locations in the FTS configuration file into the formats required by different transfer software, and it is also responsible for initiating the copy. By default, SAM_CP invokes ifdh_cp to perform the copy via GridFTP; this may be overridden to use other transfer software and protocols. To do so, set IFDH_FORCE to your preferred protocol in the ‘envset’ directive (see the documentation for more details).

A SAM_CP configuration file begins with a [sam_cp] stanza that typically looks like this:

[sam_cp]
logfile=/var/tmp/sam_cp_$USER.log
debug=0

This is followed by a number of stanzas that define transformations on source and destination paths:

[data_src]
srcre: /pnfs/myexperiment/
srcrepl: gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/myexperiment/

This matches any source in the FTS config file that begins with /pnfs/myexperiment/ and replaces that prefix with the string given in srcrepl, which is suitable for GridFTP file transfers.

[data_dst]
dstre: (enstore|dcache):/pnfs/myexperiment/
dstrepl: gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/myexperiment/

Similarly, this stanza does the same for destinations, with one exception: it matches strings beginning with either enstore: or dcache:, allowing files to be transferred flexibly to either tape or disk.
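
The effect of these two stanzas can be sketched in Python. This only illustrates the prefix rewriting described above and is not SAM_CP's own code: the function name rewrite and the decision to match at the start of the path with re.match are assumptions, and the file names are made up.

```python
import re

# Patterns and replacements copied from the [data_src] and [data_dst] stanzas.
SRC_RULE = (r"/pnfs/myexperiment/",
            "gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/myexperiment/")
DST_RULE = (r"(enstore|dcache):/pnfs/myexperiment/",
            "gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/myexperiment/")


def rewrite(path, rule):
    """Replace a matching prefix of path; return path unchanged otherwise."""
    pattern, replacement = rule
    match = re.match(pattern, path)  # anchored at the start (assumption)
    if match is None:
        return path
    return replacement + path[match.end():]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the rewriting.
    print(rewrite("/pnfs/myexperiment/raw/file1.root", SRC_RULE))
    print(rewrite("enstore:/pnfs/myexperiment/raw/file1.root", DST_RULE))
    print(rewrite("dcache:/pnfs/myexperiment/raw/file1.root", DST_RULE))
```

All three calls produce the same gsiftp:// URL, which shows how a single destination stanza can serve both tape (enstore:) and disk (dcache:) destinations.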

Full documentation on SAM_CP explaining its full utility may be found here: https://cdcvs.fnal.gov/redmine/projects/sam-cp/wiki/Reference