SAM schemas and urls

The term "sam schema" refers to the type of url used to specify the location of a file in a grid-accessible way. The grid url for a specific file can be determined using the samweb command "samweb get-file-access-url". Use the option "--schema" to request a specific schema. For example:

$ samweb get-file-access-url --schema=root PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/uboone/data/uboone/swizzled/tpc/prod_v06_26_01_20/swizzle_crt_merge_v2/swizzle/00/01/72/69/PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
$ samweb get-file-access-url --schema=gsiftp PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root
gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/uboone/data/uboone/swizzled/tpc/prod_v06_26_01_20/swizzle_crt_merge_v2/swizzle/00/01/72/69/PhysicsRun-2018_6_19_18_30_46-0017269-00167_20180723T111421_bnb_2.root

The two commonly used schemas supported by the MicroBooNE sam station are "root" (aka xrootd) and "gsiftp" (aka gridftp).
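
Scripts that handle both kinds of url sometimes need to know which schema a given url uses. The following is a minimal sketch of one way to do that in shell. The function name url_schema is hypothetical (it is not part of samweb), and treating a string with no "://" prefix as a local file is an assumption made for illustration.

```shell
# Print the schema (the part before "://") of a grid url.
# url_schema is a hypothetical helper name, not a samweb command.
url_schema() {
    case "$1" in
        *://*) printf '%s\n' "${1%%://*}" ;;  # strip everything from the first "://" onward
        *)     printf 'file\n' ;;             # assumption: no schema prefix means a local path
    esac
}
```

For example, "url_schema root://fndca1.fnal.gov:1094/pnfs/..." prints "root", and the same call on a gsiftp url prints "gsiftp".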

Copying root files using xrdcp

The xrootd client command "xrdcp" can be used to copy files between xrootd urls and local (posix) files. Command xrdcp has many options, but is usually just invoked like this:

xrdcp <source> <dest>

The source and destination can be either xrootd urls or local files. If either endpoint is an xrootd url, you will need to have a grid proxy to authenticate to the xrootd server.
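
The url lookup and the copy can be combined in a small shell function. This is only a sketch under stated assumptions: the function name fetch_sam_file is hypothetical, it assumes samweb and xrdcp are on the PATH and that a valid grid proxy already exists, and it uses only the first url if samweb prints more than one.

```shell
# fetch_sam_file <file name> [destination directory]
# Hypothetical helper: look up the xrootd url of a sam file and copy it
# to a local directory with xrdcp. Assumes a valid grid proxy.
fetch_sam_file() {
    file_name=$1
    dest_dir=${2:-.}
    # Ask sam for the xrootd url of the file.
    url=$(samweb get-file-access-url --schema=root "$file_name") || return 1
    if [ -z "$url" ]; then
        echo "no xrootd url found for $file_name" >&2
        return 1
    fi
    # In case several urls are printed, use the first line only.
    url=$(printf '%s\n' "$url" | head -n 1)
    xrdcp "$url" "$dest_dir/$file_name"
}
```

A call such as "fetch_sam_file myfile.root /tmp" would then leave a local copy in /tmp/myfile.root.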

Streaming root files using xrootd

Xrootd urls can be opened and accessed directly from root. This means that data will be transmitted over the network from the xrootd server directly to the streaming process.

An xrootd url can be opened for reading using either of the following methods.

TFile* f = TFile::Open(url)
TFile* f = new TXNetFile(url)

The first form is preferred, as it works for both xrootd urls and local files. The syntax "TFile* f = new TFile(url)" does not work with xrootd urls. In either case, you will need a grid proxy for authentication.

Streaming vs. copying in batch jobs

The MicroBooNE sam station returns xrootd urls by default. If you are reading root files in a batch job using the art framework, art's default behavior is to stream xrootd urls returned by sam. Streaming normally works fine, but there are cases where streaming is inadvisable. Copying may be preferred in the following cases.

  • The input file is not a root file.
  • The job runs at a site with a slow or unstable internet connection.
  • The job takes a long time to process each file. If a batch job processes multiple files and the processing time exceeds 20 minutes per file, attempts to open later files are known to fail.

There are a couple of ways to force batch jobs to copy root files instead of streaming them. One way is to override the default schema and specify schema "gsiftp" (gridftp files are always copied). Another way is to set the run time environment variable IFDH_COPY_XROOTD=1 (use jobsub_submit option "-e IFDH_COPY_XROOTD=1").

Specifying the sam schema in sam projects and batch jobs

The MicroBooNE sam station is configured to return xrootd urls by default. You can override the default for a particular sam project or batch job. The schema is specified by each process (i.e. batch job) that joins a sam project, as an option of the command "samweb start-process" or "ifdh establishProcess". You only need to know this if you are writing your own batch scripts. If you are submitting batch jobs using project.py, you can specify the schema using the stage element "<schema>", like this:

<stage>
  <schema>root</schema>
</stage>

or

<stage>
  <schema>gsiftp</schema>
</stage>

Use schema "gsiftp" to force copying, or if the input file is not a root file.
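
If you write your own batch scripts, the "not a root file" rule can be applied automatically. The sketch below picks a schema from the file name. The function name choose_schema is hypothetical, and deciding by the ".root" extension alone is an assumption. The other reasons for copying (slow sites, long processing times) still require a manual choice of "gsiftp".

```shell
# choose_schema <file name>
# Hypothetical helper: print "root" for files ending in .root (which can
# be streamed) and "gsiftp" for everything else (which must be copied).
choose_schema() {
    case "$1" in
        *.root) echo root ;;
        *)      echo gsiftp ;;
    esac
}
```

The result can be passed on to samweb, for example: samweb get-file-access-url --schema="$(choose_schema "$file")" "$file"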

Using an environment variable to tell art to copy xrootd urls

You can tell art to copy xrootd urls, rather than streaming them, by setting the run time environment variable IFDH_COPY_XROOTD=1. If submitting a batch job, add the following jobsub_submit option:

jobsub_submit ... -e IFDH_COPY_XROOTD=1

If submitting batch jobs using project.py, add this jobsub_submit option in the xml stage element <jobsub>:

<stage>
  <jobsub>-e IFDH_COPY_XROOTD=1</jobsub>
</stage>

Xrootd and IFDH environment variables

The environment variable IFDH_COPY_XROOTD is one of many environment variables that modify the behavior of ifdh (either inside the art framework, or when invoked from the command line).

The ifdhc wiki includes an article listing all environment variables used by ifdh. Here are some environment variables that have been found to be useful in certain situations.

IFDH_COPY_XROOTD=1
IFDH_CP_UNLINK_ON_ERROR=1
IFDH_CP_MAXRETRIES=n
IFDH_DEBUG=1
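
When several of these variables are needed in a batch job, each one has to be passed to jobsub_submit with its own "-e" option. The sketch below builds that option string from a list of NAME=VALUE assignments. The function name jobsub_env_opts is hypothetical, and the sketch assumes the values contain no spaces.

```shell
# jobsub_env_opts NAME=VALUE [NAME=VALUE ...]
# Hypothetical helper: print one "-e NAME=VALUE" option per argument,
# separated by spaces. Assumes values contain no whitespace.
jobsub_env_opts() {
    opts=""
    for assignment in "$@"; do
        opts="$opts -e $assignment"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${opts# }"
}
```

It would be used unquoted so that the shell splits the result into separate options, for example: jobsub_submit ... $(jobsub_env_opts IFDH_COPY_XROOTD=1 IFDH_DEBUG=1)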

Xrootd has its own set of environment variables that can affect its behavior. The following environment variables have been recommended by dCache experts for accessing files in dCache using xrootd. Setting these is believed to reduce the number of glitchy xrootd failures.

XRD_CONNECTIONRETRY=32
XRD_REQUESTTIMEOUT=3600
XRD_REDIRECTLIMIT=255
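
In a job wrapper script or an interactive session, these settings could be applied with a small function like the one below. This is a sketch only: the function name set_xrootd_defaults is hypothetical, and the choice to keep any value that is already set in the environment (rather than overwrite it) is an assumption.

```shell
# Hypothetical helper: export the xrootd client settings listed above,
# keeping any non-empty value already present in the environment.
set_xrootd_defaults() {
    : "${XRD_CONNECTIONRETRY:=32}"
    : "${XRD_REQUESTTIMEOUT:=3600}"
    : "${XRD_REDIRECTLIMIT:=255}"
    export XRD_CONNECTIONRETRY XRD_REQUESTTIMEOUT XRD_REDIRECTLIMIT
}
```

Call set_xrootd_defaults before running xrdcp or any program that streams xrootd urls, so that child processes inherit the values.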

A reference listing all xrootd client environment variables, as well as other client documentation, can be found in the xrootd project documentation.