ifdh v2_x config file

In version v2_0_x and later, ifdhc uses a config file to tell it how to handle I/O.

Finding the config files

ifdh looks first in $IFDHC_CONFIG_DIR/ifdh.cfg (in the ups package ifdh_config) and, failing that, in $IFDHC_DIR/ifdh.cfg.
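
The lookup order can be sketched in Python as follows. This is only an illustration of the two-step search described above, not the ifdh implementation; the function name find_ifdh_cfg is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_ifdh_cfg(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first ifdh.cfg found, checking the two directories in order."""
    for var in ("IFDHC_CONFIG_DIR", "IFDHC_DIR"):
        directory = env.get(var)
        if not directory:
            continue  # variable unset or empty: fall through to the next one
        candidate = Path(directory) / "ifdh.cfg"
        if candidate.is_file():
            return candidate
    return None  # neither location holds a config file


if __name__ == "__main__":
    print(find_ifdh_cfg())
```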

Config file format

The config file uses the .ini file format.

How the file is used

Besides the [general] and [experiment_vo] sections, the config file has basically three layers of items: prefixes, locations, and protocols.

Prefixes are strings matched against arguments to ifdh cp, ifdh ls, etc. to determine what
location they are associated with.

Each location has a list of supported protocols, and a string prefix to use for each such protocol.

Finally protocols have a list of commands and associated flags that say how we actually do work on that protocol.

So ifdh processes arguments by matching prefixes, associating locations, picking protocols, and building protocol-appropriate paths by putting prefixes back on. For example, /pnfs/nova/foo.txt would be matched against a [prefix /pnfs] entry, which says it should map to [location dcache_stken]. That location supports multiple protocols, so ifdh picks one (in this case, say, "gsiftp:") and finds that it should pull the /pnfs/ off the front and put gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/ on the front. It then looks in the [protocol gsiftp:] section to find the actual copy, ls, etc. command to use.
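
The walk-through above can be sketched in Python. The tables below are a hypothetical in-memory stand-in for a parsed config, the slashstrip value of 2 and the shortened cp_cmd are assumptions chosen to reproduce the example, and picking the first listed protocol is a simplification of whatever selection ifdh really performs.

```python
import re

# Hypothetical stand-in for a parsed ifdh.cfg (values follow the example above).
PREFIXES = [("/pnfs", {"location": "dcache_stken", "slashstrip": 2})]
LOCATIONS = {
    "dcache_stken": {
        "protocols": ["gsiftp:"],
        "prefix_gsiftp": "gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/",
    }
}
PROTOCOLS = {"gsiftp:": {"cp_cmd": "globus-url-copy %(src)s %(dst)s"}}  # shortened


def resolve(path):
    """Return (protocol, rewritten URL, command template) for a path."""
    for prefix, entry in PREFIXES:                 # 1. match a prefix
        if re.match(prefix, path):
            break
    else:
        raise LookupError(f"no prefix matches {path!r}")
    location = LOCATIONS[entry["location"]]        # 2. associate a location
    protocol = location["protocols"][0]            # 3. pick a protocol (assumed: first)
    # 4. strip the leading components and put the protocol's prefix on the front
    remainder = path.split("/", entry["slashstrip"])[-1]
    url = location["prefix_" + protocol.rstrip(":")] + remainder
    return protocol, url, PROTOCOLS[protocol]["cp_cmd"]


if __name__ == "__main__":
    print(resolve("/pnfs/nova/foo.txt"))
```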

Types of stanzas

The config file has only a few types of stanzas, with specific entries in them.

The [macros] section

Each entry in this section is a macro for a string that can be replaced elsewhere in the config file. For example, if you have

[macros]
SLOC=srm://some-really-long-hostname.big.domain.name:2939/path/to/whatever?SRF=

Then you could use %(SLOC)s throughout the config file as a shorthand for that really long URL.
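
As a rough illustration of the substitution, here is a Python sketch of macro expansion. It is an assumption about the behaviour, not the ifdh code: known macro names are replaced, and any other %(name)s placeholder (such as %(src)s in a command) is left alone for later. The key prefix_srm is hypothetical.

```python
import re


def expand_macros(text, macros):
    """Replace %(NAME)s with macros[NAME]; leave unknown names untouched."""
    def substitute(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r"%\((\w+)\)s", substitute, text)


MACROS = {
    "SLOC": "srm://some-really-long-hostname.big.domain.name:2939/path/to/whatever?SRF="
}

if __name__ == "__main__":
    # prefix_srm is a made-up entry used only to show the expansion.
    print(expand_macros("prefix_srm=%(SLOC)s/pnfs/", MACROS))
```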

The [experiment_vo] section

This section has a series of regular-expression rules and replacements that map your
experiment to a VO string for voms-proxy-init, etc. It has a few items:

  • numrules= sets how many match/repl pairs there will be
  • match1=, repl1= regular expression to match and replacement to use (in the style of s/x/y/ in perl)
  • match2=, repl2=... etc. up to numrules entries.
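
A minimal Python sketch of how such rules might be applied, assuming they are tried in order, the first matching rule wins, and the pattern may match anywhere in the experiment name. None of those three points is stated on this page, so treat them as assumptions. The rules are a subset of the ones in the example configuration further down.

```python
import re

# (match, repl) pairs taken from the example configuration on this page.
RULES = [
    ("(lbne|cdf|lsst|fermilab|dune|des)", "$1:/$1"),
    ("mars(.*)", "fermilab:/fermilab/mars/$1"),
    ("(.*)", "fermilab:/fermilab/$1"),
]


def experiment_to_vo(experiment, rules=RULES):
    """Apply the first rule whose pattern matches, in the style of s/x/y/."""
    for pattern, repl in rules:
        if re.search(pattern, experiment):
            # Translate perl-style $1 into Python's \g<1> group reference.
            py_repl = re.sub(r"\$(\d+)", r"\\g<\1>", repl)
            return re.sub(pattern, py_repl, experiment, count=1)
    return None  # no rule matched


if __name__ == "__main__":
    for name in ("dune", "marsmu2e", "nova"):
        print(name, "->", experiment_to_vo(name))
```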

The [general] section

This section lists items which will have sections later in the file:

  • conditionals= is a list of conditional tests that will be defined later in the config
  • protocols= gives a list of protocols which will be defined later in the config
  • prefixes= gives a list of prefixes which we'll be worrying about
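
Since every name listed here needs a matching stanza later in the file, a small consistency check can be useful. The Python sketch below uses the standard configparser module and assumes the lists are whitespace-separated, as in the example configuration on this page; missing_stanzas is a hypothetical helper, not part of ifdh, and the sample is deliberately incomplete.

```python
import configparser

SAMPLE = """
[general]
conditionals=
protocols=file: gsiftp:
prefixes=/pnfs/fnal.gov /

[prefix /pnfs/fnal.gov]
location=dcache_stken

[protocol file:]
need_proxy=0
"""


def missing_stanzas(cfg):
    """Return the stanza names that [general] promises but the file lacks."""
    wanted = []
    for key, kind in (("conditionals", "conditional"),
                      ("protocols", "protocol"),
                      ("prefixes", "prefix")):
        for name in cfg.get("general", key, fallback="").split():
            wanted.append(f"{kind} {name}")
    return [section for section in wanted if not cfg.has_section(section)]


def load(text):
    # interpolation=None keeps %(src)s-style placeholders as plain text;
    # only "=" is treated as a key/value delimiter.
    cfg = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    cfg.read_string(text)
    return cfg


if __name__ == "__main__":
    print(missing_stanzas(load(SAMPLE)))
```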

The [conditional xxx] sections

For each conditional xxx there is a [conditional xxx] section, which currently supports two tags:

  • test= the test to perform; at the moment "-x /some/path" is supported, which checks whether a given path is executable.
  • rename_proto= two protocol names; if the test (above) passes, the first one is renamed to the second one (generally replacing an existing entry). This lets us behave differently, say, on macOS, or on nodes that have the OSG version 3 tools rather than the OSG version 2 ones.
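
The effect of a conditional can be sketched in Python as below. This is an illustration under assumptions: protocols are held in a plain dict, and the protocol names in the demo (gsiftp_alt:) are made up for the example.

```python
import os
import sys


def condition_holds(test):
    """Evaluate a conditional test string; only '-x PATH' is handled here."""
    flag, path = test.split(None, 1)
    if flag != "-x":
        raise ValueError(f"unsupported test: {test!r}")
    return os.access(path, os.X_OK)


def apply_conditional(protocols, test, rename_proto):
    """Rename the first protocol to the second one when the test passes."""
    old, new = rename_proto.split()
    if condition_holds(test) and old in protocols:
        protocols[new] = protocols.pop(old)  # replaces any existing entry
    return protocols


if __name__ == "__main__":
    # Hypothetical protocol table; sys.executable is just a path known to be executable.
    table = {"gsiftp:": {"cp_cmd": "old-tool"}, "gsiftp_alt:": {"cp_cmd": "new-tool"}}
    print(apply_conditional(table, "-x " + sys.executable, "gsiftp_alt: gsiftp:"))
```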

The [prefix yyy] sections

For each prefix yyy in the prefixes= list of the [general] section, there should be a [prefix yyy] section in the config file. The prefix itself is a regular expression that is matched against the path, anchored at the front; if it matches the current path/URL, it is applied (the first match in the prefixes= list wins). The section contains:

  • location= the location name this prefix is associated with
  • slashstrip= how many slashes (and intervening text) to strip from the URL or path when using the location's prefixes for different protocols.
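
Because the first match wins, the order of the prefixes= list matters: a catch-all such as / has to come last. The Python sketch below illustrates that, along with one plausible reading of slashstrip (drop everything up to and including the Nth slash). That reading is an assumption, not something this page spells out.

```python
import re


def match_prefix(path, prefixes):
    """Return the first prefix regex that matches at the start of path."""
    for prefix in prefixes:
        if re.match(prefix, path):  # re.match only anchors at the front
            return prefix
    return None


def strip_slashes(path, count):
    """Drop everything up to and including the count-th slash (assumed meaning)."""
    return path.split("/", count)[-1] if count > 0 else path


if __name__ == "__main__":
    ordered = ["/pnfs/fnal.gov", "/pnfs/usatlas.bnl.gov", "/"]
    print(match_prefix("/pnfs/fnal.gov/usr/nova/foo.txt", ordered))
    print(match_prefix("/tmp/foo.txt", ordered))
    print(strip_slashes("/pnfs/nova/foo.txt", 2))
```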

The [location zzz] sections

For each location zzz referred to in a [prefix ...] stanza, we need a [location zzz] section, which provides:

  • protocols= list of protocols by which this location can be reached.
  • need_cpn_lock= flag (1 or 0) saying whether we need a CPN lock to use this location
  • prefix_proto= for each of the protocols in our protocols= list (for example, prefix_gsiftp= for gsiftp:), we need a prefix (which may be blank) to stitch on the front when reaching items in this location via that protocol
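
To show how the prefix_proto= key relates to a protocol name, here is a Python sketch that reads a location stanza with the standard configparser module. The stanza mirrors the example configuration on this page; the need_cpn_lock=0 value and the helper name protocol_prefix are assumptions for illustration.

```python
import configparser

LOCATION_CFG = """
[location dcache_stken]
protocols=gsiftp:
need_cpn_lock=0
prefix_gsiftp=gsiftp://fndca1.fnal.gov:2811/
"""


def protocol_prefix(cfg, location, protocol):
    """Return the prefix to stitch on for this location and protocol."""
    section = cfg[f"location {location}"]
    if protocol not in section["protocols"].split():
        raise ValueError(f"{location} is not reachable via {protocol}")
    # "gsiftp:" is looked up as prefix_gsiftp (trailing colon dropped).
    return section.get("prefix_" + protocol.rstrip(":"), fallback="")


def load_locations(text):
    cfg = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    cfg.read_string(text)
    return cfg


if __name__ == "__main__":
    print(protocol_prefix(load_locations(LOCATION_CFG), "dcache_stken", "gsiftp:"))
```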

The [protocol xxx:] sections

For each protocol xxx: in the protocols= list of the [general] section, there should be a [protocol xxx:] stanza with the following items set:

  • need_proxy= a flag saying whether we need a Grid/VOMS proxy to use this protocol
  • strip_file_prefix= a flag saying whether we should strip file://// off of local files referred to in these commands
  • extra_env= name of an environment variable to check for extra flags to pass to this stanza's copy command
  • cp_cmd= copy command for this protocol; use %(src)s and %(dst)s for the source and destination parameters, and %(extra)s for the extra flags from the environment variable
  • lss_cmd= command to run to get a directory listing including file sizes
  • lss_skip= number of lines of lss_cmd output to skip (headers, etc.)
  • lss_size_last= flag saying the size is the last field matched, not the first
  • lss_re1= regexp to try to match against lines of output from lss_cmd (lss_re2= and so on for further patterns)
  • ll_cmd= command to run for a long listing
  • mv_cmd= sets the rename command
  • chmod_cmd= sets the chmod command; it can have %(mode)s for the octal mode, or %(rwxbits)s for "rwx" flags
  • mkdir_cmd= sets the mkdir command
  • rm_cmd= sets the rm command
  • rmdir_cmd= sets the rmdir command
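
The placeholder filling for a command such as cp_cmd can be sketched in Python as follows. This is an assumed model of the behaviour rather than the ifdh code. It uses a regular expression instead of Python's % operator so that other percent signs in a command (for example in find -printf formats) pass through untouched.

```python
import os
import re


def build_command(template, params, extra_env=None, env=os.environ):
    """Fill %(name)s placeholders; %(extra)s comes from an environment variable."""
    values = dict(params)
    values["extra"] = env.get(extra_env, "") if extra_env else ""
    return re.sub(
        r"%\((\w+)\)s",
        lambda m: values.get(m.group(1), m.group(0)),  # unknown names stay as-is
        template,
    )


if __name__ == "__main__":
    cp_cmd = "dd bs=512k %(extra)s if=%(src)s of=%(dst)s"
    print(build_command(cp_cmd, {"src": "/tmp/a", "dst": "/tmp/b"},
                        extra_env="IFDH_DD_EXTRA",
                        env={"IFDH_DD_EXTRA": "conv=fsync"}))
```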

Note that a few of the commands (particularly the [protocol D0:] section) use bash-isms to strip prefixes from shell variables like ${variablename#prefix} or ${variablename%suffix}. See the bash(1) manpage for more details.
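
For readers less familiar with those expansions, the Python sketch below mimics two of them. The first function corresponds to ${dst#*://*/}, the form used in the gsiftp: mv_cmd in the example configuration on this page, which drops the shortest leading match of *://*/ (the scheme and host). The second handles only a literal suffix, which is a simplification of ${variablename%suffix}, since bash also allows glob patterns there.

```python
import re


def strip_scheme_and_host(url):
    """Mimic bash's ${url#*://*/}: drop the shortest prefix matching *://*/."""
    return re.sub(r"^.*?://.*?/", "", url, count=1, flags=re.DOTALL)


def strip_literal_suffix(value, suffix):
    """Mimic ${value%suffix} for a literal (non-glob) suffix."""
    if suffix and value.endswith(suffix):
        return value[: -len(suffix)]
    return value


if __name__ == "__main__":
    dst = "gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/nova/foo.txt"
    print("/" + strip_scheme_and_host(dst))
    print(strip_literal_suffix("foo.txt", ".txt"))
```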

A simple example configuration

Let's say we only have three places to worry about: Fermilab's dCache, BNL's dCache, and the local filesystem, and we only want to talk to the dCache sites over gridftp. You could set up a configuration that knows about 3 prefixes:

[general]
conditionals=
prefixes=/pnfs/fnal.gov /pnfs/usatlas.bnl.gov /
protocols=file: gsiftp:

[experiment_vo]
numrules=5
match1=(lbne|cdf|lsst|fermilab|dune|des)
repl1=$1:/$1
match2=(dzero)
repl2=$1:/$1/users
match3=mars(.*)
repl3=fermilab:/fermilab/mars/$1
match4=samdev
repl4=fermilab:/fermilab
match5=(.*)
repl5=fermilab:/fermilab/$1

[prefix /pnfs/fnal.gov]
location=dcache_stken
slashstrip=0

[prefix /pnfs/usatlas.bnl.gov]
location=dcache_bnl
slashstrip=0

[prefix /]
location=local_fs
slashstrip=0

It would then go on to define those locations:

[location dcache_stken]
protocols=gsiftp:
prefix_file=
prefix_gsiftp=gsiftp://fndca1.fnal.gov:2811/

[location dcache_bnl]
protocols=gsiftp:
prefix_file=
prefix_gsiftp=gsiftp://dcgftp.usatlas.bnl.gov:2811/

[location local_fs]
protocols=file:
prefix_file=

And finally we would say how we get things done when doing file: and gsiftp: operations:

[protocol gsiftp:]
need_proxy=1
extra_env=IFDH_GRIDFTP_EXTRA
extra_env2=IFDH_GSIFTP_EXTRA
strip_file_prefix=0
cp_cmd=globus-url-copy -rst-retries 1 -gridftp2 -nodcau -restart -stall-timeout 14400 %(extra)s %(src)s %(dst)s
cp_r_cmd=globus-url-copy -cd -rst-retries 1 -gridftp2 -nodcau -restart -stall-timeout 14400 -r %(extra)s %(src)s/ %(dst)s/
lss_cmd=uberftp -ls %(src)s
lss_skip=0
lss_size_last=0
lss_dir_last=0
lss_re1 = ([-dl])[-rwxs]{9}\s*[a-zA-Z0-9_]*\s*[a-zA-Z0-9]*\s*[a-zA-Z0-9_]*\s*([0-9]*)\s[A-Z].{11}\s*(_*)([^/]*)$
lss_re2 = ([-dl])[-rwxs]{9}\s*[a-zA-Z0-9_]*\s*[a-zA-Z0-9_]*\s*[a-zA-Z0-9_]*\s*([0-9]*)\s*[A-Z].{11}\s*(/.*/)([^/]*)$
ll_cmd=uberftp -ls %(src)s
mv_cmd=dst="%(dst)s";  uberftp -rename %(src)s /${dst\#*://*/}
chmod_cmd=uberftp -chmod %(mode)s  %(src)s
mkdir_cmd=uberftp -mkdir  %(src)s
rm_cmd=uberftp -rm  %(src)s
rmdir_cmd=uberftp -rmdir  %(src)s

[protocol file:]
strip_file_prefix=1
need_proxy=0
extra_env=IFDH_DD_EXTRA
extra_env_2=IFDH_CP_EXTRA
cp_cmd=dd bs=512k %(extra)s if=%(src)s of=%(dst)s
cp_r_cmd=cp -r %(extra)s %(src)s %(dst)s
lss_size_last=0
lss_dir_last=1
lss_skip=0
lss_cmd=find %(src)s -maxdepth 1 \( -type d -printf '%s %p/\n' -o -printf '%s %p\n' \) 
ll_cmd=find %(src)s -maxdepth 1 -ls
lss_re1 = \s*([0-9]+)\s(/.*/)(.*)(/)$
lss_re2 = \s*([0-9]+)\s(/.*/)(.*)()$
mv_cmd=mv %(src)s %(dst)s
chmod_cmd=chmod %(mode)s %(src)s
mkdir_cmd=mkdir %(src)s
rm_cmd=rm %(src)s
rmdir_cmd=rmdir %(src)s
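
To see how the lss_re1 and lss_re2 patterns of the file: stanza fit the output of its lss_cmd, here is a Python sketch that tries them in order on sample lines. The sample lines are made up, and the meaning given to each captured group (size, directory part, name, trailing-slash marker for directories) is an assumption based on lss_size_last=0 and lss_dir_last=1, not a documented contract.

```python
import re

# The two patterns from the [protocol file:] stanza above.
LSS_RE = [
    r"\s*([0-9]+)\s(/.*/)(.*)(/)$",  # lss_re1: entries printed with a trailing slash
    r"\s*([0-9]+)\s(/.*/)(.*)()$",   # lss_re2: everything else
]


def parse_listing_line(line):
    """Try each pattern in order and return the fields of the first match."""
    for pattern in LSS_RE:
        match = re.match(pattern, line)
        if match:
            size, directory, name, marker = match.groups()
            return {"size": int(size), "dir": directory,
                    "name": name, "is_dir": marker == "/"}
    return None  # line did not look like a listing entry


if __name__ == "__main__":
    for sample in ("4096 /tmp/data/sub/", "1234 /tmp/data/file.txt"):
        print(parse_listing_line(sample))
```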

The real configuration

The production configuration currently supports several protocols and numerous locations, with more to be added. The latest configuration is at source:ifdh.cfg