- Table of contents
- SAM schemas and urls
- Copying root files using xrdcp
- Streaming root files using xrootd
- Streaming vs. copying in batch jobs
- Xrootd and IFDH environment variables
SAM schemas and urls¶
The term sam schema refers to the different types of urls that can be used to specify the locations of files in a grid-accessible way. Grid urls for a specific file can be determined using the samweb command "samweb get-file-access-url". Use option "--schema" to request a specific schema. For example,
$ samweb get-file-access-url --schema=root PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/uboone/data/uboone/swizzled/tpc/prod_v06_26_01_20/swizzle_crt_merge_v2/swizzle/00/01/72/69/PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
$ samweb get-file-access-url --schema=gsiftp PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/uboone/data/uboone/swizzled/tpc/prod_v06_26_01_20/swizzle_crt_merge_v2/swizzle/00/01/72/69/PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
The two commonly used schemas supported by the MicroBooNE sam station are "root" (aka xrootd) and "gsiftp" (aka gridftp).
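The schema of a url returned by samweb is simply the part before "://". The following is a minimal sketch of how a script might inspect it. The function names url_schema and access_method are hypothetical helpers, not part of samweb or ifdh.

```shell
# Print the schema (the part before "://") of a grid url.
# Prints nothing and returns 1 if the argument has no "://" separator,
# e.g. a plain local path.
url_schema() {
    case "$1" in
        *://*) printf '%s\n' "${1%%://*}" ;;
        *)     return 1 ;;
    esac
}

# Map a url to the access method named in the text above.
access_method() {
    case "$(url_schema "$1")" in
        root)   echo "xrootd" ;;
        gsiftp) echo "gridftp" ;;
        *)      echo "unknown" ;;
    esac
}
```

For example, access_method applied to a url starting with "root://" prints "xrootd", and applied to a local path prints "unknown".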
Copying root files using xrdcp¶
The xrootd command "xrdcp" can be used to copy xrootd urls to or from local (posix) files. The xrdcp command has many options, but is usually just invoked like this:
xrdcp <source> <dest>
The source and destination can be either xrootd urls or local files. If either endpoint is an xrootd url, you will need to have a grid proxy to authenticate to the xrootd server.
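As an illustration, a script might wrap xrdcp so that an xrootd url is copied into a local directory under its original file name. This is only a sketch: local_name and fetch_xrootd are hypothetical helper names, and the code assumes that xrdcp is on the PATH and that a valid grid proxy already exists.

```shell
# Derive the local file name for a url: its last path component.
local_name() {
    printf '%s\n' "${1##*/}"
}

# Copy an xrootd url into a local directory (default: current directory).
# Assumes xrdcp is on PATH and a valid grid proxy already exists.
# The exit status is that of xrdcp.
fetch_xrootd() {
    src=$1
    dest_dir=${2:-.}
    xrdcp "$src" "$dest_dir/$(local_name "$src")"
}
```

A call such as fetch_xrootd "$url" /tmp would then leave a copy of the file in /tmp.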
Streaming root files using xrootd¶
Xrootd urls can be opened and accessed directly from root. This means that data will be transmitted over the network from the xrootd server directly to the streaming process.
An xrootd url can be opened for reading using either of the following methods.
TFile* f = TFile::Open(url)
TFile* f = new TXNetFile(url)
The first form is preferred, as it works for both xrootd urls and local files. The syntax "TFile* f = new TFile(url)" does not work with xrootd urls. You will need a grid proxy for authentication.
Streaming vs. copying in batch jobs¶
The MicroBooNE sam station returns xrootd urls by default. If you are reading root files in a batch job using the art framework, art's default behavior is to stream xrootd urls returned by sam. Streaming normally works fine, but there are cases where streaming is inadvisable. Copying may be preferred in the following cases.
- The input file is not a root file.
- The site has a slow or unstable internet connection.
- The job takes a long time to process each file. It is known that if a batch job processes multiple files, and the processing time exceeds 20 minutes per file, open attempts for later files will fail.
There are a couple of ways to force batch jobs to copy root files instead of streaming them. One way is to override the default schema and specify schema "gsiftp" (gridftp files are always copied). Another way is to set run time environment variable IFDH_COPY_XROOTD=1 using a jobsub_submit option. Both methods are described below.
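The cases listed above can be turned into a simple rule of thumb. The sketch below is illustrative only: copy_or_stream is a hypothetical helper, the per-file processing time is an estimate supplied by the user, and the "slow" flag is an assumed way of marking a site with a poor connection.

```shell
# Decide between "copy" and "stream" for one input file.
#   $1: file name or url
#   $2: expected processing time per file, in whole minutes (user estimate)
#   $3: "slow" if the site has a slow or unstable connection
#       (hypothetical flag, defaults to "ok")
copy_or_stream() {
    case "$1" in
        *.root) ;;                       # root files may be streamed
        *)      echo copy; return 0 ;;   # non-root input must be copied
    esac
    if [ "${3:-ok}" = "slow" ]; then
        echo copy
    elif [ "$2" -gt 20 ]; then           # more than 20 minutes per file
        echo copy
    else
        echo stream
    fi
}
```

With these assumptions, a root file processed in 5 minutes at a well-connected site is streamed, while the same file processed in 45 minutes is copied.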
Specifying the sam schema in sam projects and batch jobs¶
The MicroBooNE sam station is configured to return xrootd urls by default. You can override the default for a particular sam project or batch job. The schema is specified by each process (i.e. batch job) that joins a sam project. The schema is specified as an option of command "samweb start-process" or "ifdh establishProcess". You only need to know this if you are writing your own batch scripts. If you are submitting batch jobs using project.py, you can specify the schema using stage element "<schema>", like this:
<stage> <schema>root</schema> </stage>
<stage> <schema>gsiftp</schema> </stage>
Use schema "gsiftp" to force copying, or if the input file is not a root file.
Use environment to tell art to copy xrootd urls¶
You can tell art to copy xrootd urls, rather than streaming them, by setting run time environment variable IFDH_COPY_XROOTD=1. If submitting a batch job, you should add the following jobsub_submit option:
jobsub_submit ... -e IFDH_COPY_XROOTD=1
If submitting batch jobs using project.py, add this jobsub_submit option in xml stage element "<jobsub>":
<stage> <jobsub>-e IFDH_COPY_XROOTD=1</jobsub> </stage>
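When several run time environment variables need to be passed to a batch job, the "-e" options can be generated rather than typed by hand. The following sketch assumes the "-e NAME=VALUE" form shown above. The function name jobsub_env_opts is hypothetical.

```shell
# Print "-e NAME=VALUE" for every NAME=VALUE argument, separated by
# spaces, suitable for appending to a jobsub_submit command line.
jobsub_env_opts() {
    opts=""
    for pair in "$@"; do
        # Prepend a space only when opts already holds an option.
        opts="${opts:+$opts }-e $pair"
    done
    printf '%s\n' "$opts"
}
```

For example, jobsub_env_opts IFDH_COPY_XROOTD=1 IFDH_DEBUG=1 prints "-e IFDH_COPY_XROOTD=1 -e IFDH_DEBUG=1", which can be pasted into a jobsub_submit command or a "<jobsub>" element.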
Xrootd and IFDH environment variables¶
The environment variable IFDH_COPY_XROOTD is one of many environment variables that modify the behavior of ifdh (either inside the art framework, or when invoked from the command line). The ifdhc wiki includes an article listing all environment variables used by ifdh. Here are some environment variables that have been found to be useful in certain situations.
IFDH_COPY_XROOTD=1
IFDH_CP_UNLINK_ON_ERROR=1
IFDH_CP_MAXRETRIES=n
IFDH_DEBUG=1
Xrootd has its own set of environment variables that can affect its behavior. The following environment variables have been recommended by dCache experts for accessing files in dCache using xrootd. Setting these is believed to reduce the number of glitchy xrootd failures.
XRD_CONNECTIONRETRY=32
XRD_REQUESTTIMEOUT=3600
XRD_REDIRECTLIMIT=255
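In a hand-written batch script, these settings could be exported before any xrootd or ifdh access takes place. The sketch below groups them into two hypothetical functions, set_xrootd_env and set_ifdh_env. The retry count is left as an argument because it is given above only as "n", and the fallback value of 3 is an arbitrary placeholder, not a recommendation.

```shell
# Export the xrootd client settings listed above.
set_xrootd_env() {
    export XRD_CONNECTIONRETRY=32
    export XRD_REQUESTTIMEOUT=3600
    export XRD_REDIRECTLIMIT=255
}

# Export some of the ifdh settings listed earlier.
#   $1: value for IFDH_CP_MAXRETRIES (placeholder default: 3)
# Note that IFDH_COPY_XROOTD=1 makes art copy xrootd urls instead of
# streaming them, so call this only when copying is wanted.
set_ifdh_env() {
    export IFDH_COPY_XROOTD=1
    export IFDH_CP_UNLINK_ON_ERROR=1
    export IFDH_CP_MAXRETRIES="${1:-3}"
}
```

Calling set_xrootd_env near the top of a job script is enough for streaming jobs. Jobs that should copy their input would also call set_ifdh_env.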
A reference giving all xrootd client environment variables, as well as other client documentation, can be found on the xrootd project web site.