Using DUNE's dCache Scratch, Persistent, and Tape-Backed Space at Fermilab

THIS PAGE HAS MOVED TO the DUNE wiki

Request access to the wiki by sending an email to .

dCache Space

For temporary storage, you can put files in scratch space.

     /pnfs/dune/scratch/users/<makeyourowndirectory>

There is no limit to how much data or how many files you can put in the scratch area. The total scratch pool (about 1 PB) is shared with all other Fermilab experiments. As of 2015, the lifetime of files in the scratch area is approximately one month. An automatic process cleans up old files, keeping the most recently used files and deleting those that have not been used recently. A histogram of the time since last use for files in scratch is kept updated at http://fndca.fnal.gov/dcache/lifetime//PublicScratchPools.jpg . The age of the files that will be deleted next can be read off of this plot. You can find the last-use time of a file with ls -lu. Updating the timestamp with the touch command is not sufficient to update the last-use time, and files that are merely repeatedly touched in this way will still be evicted.
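
For example, to check the last-use time of a file in your scratch directory (hypothetical path):

     ls -lu /pnfs/dune/scratch/users/<yourdirectory>/myfile.root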

For persistent storage, you can put files in

     /pnfs/dune/persistent/users/<makeyourowndirectory>

There is also a /pnfs/lbne/ area, and users on the dunegpvm machines have read-write access to it. It has scratch and persistent areas as well, and its persistent disk space is shared with /pnfs/dune. It is kept around because many files were written there before DUNE was established, and 35-ton is still writing data to this space.

There is also tape-backed storage:

     /pnfs/dune/tape_backed/users/<makeyourowndirectory>

In the tape_backed area, files can be accessed just as they are in the scratch or persistent areas, but the access time may be long if a file is on tape and must be copied to disk before it can be read. Furthermore, a file which has been copied to the tape_backed area (and not just mv'd there -- see the warning below; update: mv should be disabled from non-tape-backed storage to tape-backed storage) may not be written to tape immediately, as it must wait behind other tape requests. To see whether a file in the tape_backed area is on tape or on disk, here's an example:

   cat "/pnfs/dune/tape_backed/<directory>/.(get)(<file_name>)(locality)"
   for example, from within the directory containing the file:
   cat ".(get)(myfile.root)(locality)"

The meaning of the possible values of the file locality is given in the following table:

File locality         Meaning
ONLINE                the file is only on disk
ONLINE_AND_NEARLINE   the file is on disk and on tape
NEARLINE              the file is only on tape
UNAVAILABLE           the file is unavailable -- for example, it is not on tape and the pool where it is located is down
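
A minimal shell sketch (the directory and the *.root pattern are hypothetical) that prints the locality of each file in a tape-backed directory, using the same dot-command as in the example above:

     cd /pnfs/dune/tape_backed/users/<yourdirectory>
     for f in *.root; do
         echo "$f: $(cat ".(get)($f)(locality)")"
     done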

Files that are read off of tape go into the read-write pool in dCache and are cleaned out using an LRU algorithm (least recently read). File lifetimes can be seen at this link: http://fndca.fnal.gov/dcache/lifetime/readWritePools.jpg

In the future, small files will be automatically and transparently aggregated in order to optimize tape usage (tape markers between files make storing many small files on tape inefficient without aggregation); this is not yet the case as of October 2016, and DUNE must request it. The small-file aggregator does not know about your usage patterns, however -- you may want to access many similar small files all together. So it is a good idea to tar up many small files yourself in logical groupings in order to optimize access.
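
For example, a sketch (with hypothetical file names and destination) of bundling related small output files into one tar archive before writing it to tape-backed space:

     tar -czf my_outputs.tar.gz out_*.root
     ifdh cp my_outputs.tar.gz /pnfs/dune/tape_backed/users/<yourdirectory>/my_outputs.tar.gz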

Warning about mv'ing files from scratch to persistent areas: because the file is not rewritten, it retains its deletion policy. A file that started out in scratch and was mv'd to the persistent area may therefore still get removed. If you need to transfer a file from scratch to persistent, ifdh cp it and delete the scratch version, as in the example below. The same is true when mv'ing files between the non-tape-backed areas and the tape-backed areas: the file retention policies are set when the file is created, and are not changed by mv'ing it to a new directory location. Update -- mv is now disabled between dCache volumes with different retention policies; you get an "Operation not permitted" error message. Don't try making a hard link between areas with different retention policies -- I haven't tested it and it shouldn't work.
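
For example, a sketch (hypothetical paths) of the recommended way to transfer a file from scratch to persistent:

     ifdh cp /pnfs/dune/scratch/users/<yourdirectory>/myfile.root /pnfs/dune/persistent/users/<yourdirectory>/myfile.root
     rm /pnfs/dune/scratch/users/<yourdirectory>/myfile.root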

Persistent dCache Monitoring

DUNE's persistent dCache space is divided into lbne and dune areas which share a common
total allocation.

Usage for all groups: http://fndca3a.fnal.gov/cgi-bin/du_cgi.py

Usage by user for /pnfs/dune/persistent: http://fndca.fnal.gov/cgi-bin/space_usage_by_user_cgi.py?key=dune

Usage by user for /pnfs/lbne/persistent: http://fndca.fnal.gov/cgi-bin/space_usage_by_user_cgi.py?key=lbne

Pool group usage (webadmin): http://fndca3a.fnal.gov:2288/webadmin/poolgroups?0

Or just the top-level monitoring and documentation page

http://fndca3a.fnal.gov

The histories of these are maintained by FIFEMON. See the DUNE history at:

https://fifemon-pp.fnal.gov/dashboard/db/dcache-persistent-usage-by-vo?var-VO=dune

Tape Monitoring links:

Storage and I/O Rates are available here:

http://archive.fnal.gov/dune/

Small-file aggregation monitoring (experts):

http://www-stken.fnal.gov/cgi-bin/enstore_sfa_hud_cgi.py

Tape Mover Queues

http://fndca3a.fnal.gov:2288/queueInfo

dCache File Transfer Monitoring

https://fifemon.fnal.gov/monitor/dashboard/db/dcache-transfer-overview?from=now-6h&to=now

File lifetime monitoring in the dCache tape-backed pool

http://fndca.fnal.gov/dcache/lifetime/readWritePools.jpg

DUNE's FTS Server:

http://dunesamgpvm02.fnal.gov:8787/fts/status

Accessing dCache Space

The most reliable ways of accessing dCache space -- scratch, persistent, and tape-backed -- are the FIFE tool ifdh and xrootd. See the links below on best practices.

With SLF6, which uses NFSV4.1, you can access files in dCache interactively using regular POSIX commands -- ls, cp, rm, etc. all ought to work normally. Unfortunately, the reliability of this POSIX access is not perfect: scp of big files from distant network locations has been known to fail with input/output errors, and simply copying files from one dCache location to another using cp has been known to hang, requiring a restart of the dCache NFS server. The same has been seen with rsync.

If a job uses NFS direct access to dCache and takes some time between writing one record and the next, a write that comes too long after the previous one can fail with a bad file descriptor error.

A note about using rsync to copy things to tape-backed dCache: even when it succeeds, it can cause performance problems. rsync checks whether each file needs to be transferred before transferring it, and to do so it must open both the local and remote copies. If the destination file is on tape, it must be staged to disk before rsync can even check it. So even if rsync would skip over a set of files because they already exist on tape, it still stages them all in, one at a time, causing lots of tape mounts, seeks, rewinds, and dismounts.
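
One alternative pattern (a sketch with hypothetical paths; it assumes that a simple existence check, which does not read the file contents, does not force a tape stage) is to copy only files that are not already present at the destination:

     if [ ! -e /pnfs/dune/tape_backed/users/<yourdirectory>/myfile.root ]; then
         ifdh cp myfile.root /pnfs/dune/tape_backed/users/<yourdirectory>/myfile.root
     fi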

Data corruption has been observed when using NFSV4.1 to create files in dCache, say with cp from another source. Data corruption has been observed even when the cp command succeeds with a return code of 0 and the destination file has the right size. Experts recommend checking the checksums (see below) of files created in dCache using the NFSV4.1 interface. The dCache maintainers state that /pnfs is not a fully compliant POSIX filesystem.

For a detailed description of other file access protocols -- DCAP, GridFTP, and xrootd -- and best-practices information, see CD DocDB 5399

See also CD DocDB 5583 for discussions of best practices: Link to CD DocDB 5583

See this presentation for best practices and a description of what's under the hood, as to why some usage patterns will be more efficient and more reliable than others: https://indico.fnal.gov/getFile.py/access?contribId=30&resId=0&materialId=slides&confId=9737

For batch access, please use ifdh cp to copy files to and from dCache, or stream your input data using xrootd. IFDH and xrootd require a grid proxy, which is set up automatically in a job submitted with jobsub. If you need to run ifdh interactively, here's how to get your grid proxy: use your (unexpired) Kerberos ticket to obtain a CILogon certificate, and from there create a proxy file:

kx509
export EXPERIMENT=dune
export ROLE=Analysis
voms-proxy-init -rfc -noregen -voms dune:/dune/Role=$ROLE -valid 24:00

Kerberos tickets typically have a 26-hour validity time and are forwarded from your personal computer when you log in to an interactive server with ssh -K. The command klist will display your tickets and their expiration times. If your Kerberos ticket has expired, you may have to log in again to get a fresh one.
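
To check that both your Kerberos ticket and your grid proxy are valid before running ifdh interactively:

     klist
     voms-proxy-info -all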

More information is available in the data handling presentation at the May 2017 collaboration meeting: https://indico.fnal.gov/getFile.py/access?contribId=66&sessionId=9&resId=0&materialId=slides&confId=12345

Dmitry Litvinsev gave a very thorough presentation on dCache at the 2016 FIFE workshop: https://indico.fnal.gov/getFile.py/access?contribId=17&resId=1&materialId=slides&confId=12120

Checking ADLER32 checksums

Here's an e-mail from Dmitry Litvinsev on checking checksums using the one automatically computed for all files in dCache. If you would like to check for file corruption, you need to compare the checksum of the original file, computed before it got into dCache, with the one stored in dCache. Instructions from Dmitry:

Apparently there is xrdadler32 utility that comes from:

[root@fnisd1 ~]# rpm -q --whatprovides /usr/bin/xrdadler32
xrootd-client-4.2.3-1.osg32.el6.x86_64

So then the workflow must be:

1) calculate checksum on source :

xrdadler32 <source file>

2) perform "cp" 

3) extract destination CRC :

cat "/pnfs/path/.(get)(<file name>)(checksum)" 

parse it, as it looks like:
[root@stkensrv1n ~]# cat /pnfs/fs/usr/dune/tape_backed/mc_backup/dunepro/v06_02_00/mergeana/prodgenie_nue_dune10kt_1x2x6/12930790_45/".(get)(prodgenie_nue_dune10kt_1x2x6_45_20160812T003615_merged.root)(checksum)" 
ADLER32:0c31b5b4

4) compare the one calculated at (1) and the one extracted at (3). If they match - success, else remove destination and retry.
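
A sketch (hypothetical file name and destination, and assuming xrdadler32 prints the checksum as the first field of its output) that automates this comparison:

     SRC=myfile.root
     DEST=/pnfs/dune/tape_backed/users/<yourdirectory>
     # assumes xrdadler32 prints the checksum as the first field of its output
     SRC_SUM=$(xrdadler32 "$SRC" | awk '{print $1}')
     # the dCache dot-command returns e.g. ADLER32:0c31b5b4; strip the prefix
     DEST_SUM=$(cat "$DEST/.(get)($SRC)(checksum)" | sed 's/^ADLER32://')
     if [ "$SRC_SUM" = "$DEST_SUM" ]; then
         echo "checksums match"
     else
         echo "checksum mismatch -- remove the destination copy and retry"
     fi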

Common Gotchas

  • Failing to put the -D on an ifdh cp statement when the source or the destination is a directory. The error message will look something like this:

    <dunegpvm08.fnal.gov> ifdh cp /tmp/trj/foo.txt /tmp/trj/foocopy/
    dd: opening `/tmp/trj/foocopy/': Is a directory
    program: dd bs=512k  if=/tmp/trj/foo.txt of=/tmp/trj/foocopy/exited status 1
    delaying 50 ...
    ^C
    

    which doesn't look like an error, but the problem is a missing -D.
  • Attempting to copy a non-existent file with ifdh cp will not fail immediately, as cp does; instead it will keep retrying until your job runs out of time, until you run out of patience interactively, or until the source file gets created.

Setting IFDH_CP_MAXRETRIES=2 in your job submission environment (see the sketch after this list) can help debug a job that hangs because it failed to produce the desired output file and ifdh cp retried forever, causing the job to be held and the logfile to be lost.

  • Writing files directly to dCache using the NFS protocol will fail with a bad file descriptor if the records are written with long time delays between records. The problem here is that files become immutable in dCache after a finite time, even in directories that are not tape-backed. To create a new file in dCache, write it first on some local disk, such as BlueArc (/dune/data or /dune/data2), or $_CONDOR_SCRATCH_DIR, and then ifdh cp it to dCache.
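
A sketch of a batch-job-friendly pattern that addresses the last two gotchas (my_analysis and the output file name are hypothetical; $_CONDOR_SCRATCH_DIR is defined in jobs submitted with jobsub):

     export IFDH_CP_MAXRETRIES=2                                  # fail fast instead of retrying forever
     my_analysis --output "${_CONDOR_SCRATCH_DIR}/output.root"    # write the output on local job scratch first
     ifdh cp "${_CONDOR_SCRATCH_DIR}/output.root" /pnfs/dune/scratch/users/<yourdirectory>/output.root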

Older Out-Of-Date, Obsolete Instructions: Copying files to and from dCache

You will need to be a member of the LBNE Virtual Organization. Follow the
instructions for getting grid permissions to become a member:
Grid Setup Instructions

Instructions on how to copy from and write to dCache.  The example
below was run in February 2014 on lbnegpvm02, and the username qzli is used
to refer to specific files on BlueArc and the dCache user scratch area.

As of February 2014, /pnfs/lbne/scratch/users is 6 PB big and is shared
by all Intensity Frontier experiments.  It is not backed up to tape -- please
use SAM for that.  Once SL6 is available, we can use NFS directly instead of
ifdh to access this disk.

  How to copy files to and from dCache.  Files can be directly transferred to dCache
from batch jobs without going through BlueArc first.

 *** SET THE WORKING ENVIRONMENT ***

> export ROLE=Analysis
> export EXPERIMENT=lbne
> kx509
> voms-proxy-init -rfc -noregen -voms lbne:/lbne/Role=$ROLE -valid 24:00
> voms-proxy-info -all

> . /grid/fermiapp/products/common/etc/setups.sh
> setup ifdhc
> which ifdh
 (make sure your ifdhc is from v1_3_1 or higher)

*** COPY FROM dCache ***

> TESTFILE=/pnfs/lbne/mc/lbne/simulated/001/lbne_test_file.root
> ifdh cp "${TESTFILE}" "/lbne/data/users/qzli/TESTFILE.root" 
> ls -l /lbne/data/users/qzli/TESTFILE.root

 *** COPY TO dCache ***

> ifdh mkdir /pnfs/lbne/scratch/users/qzli
> ifdh cp /lbne/data/users/qzli/TESTFILE.root /pnfs/lbne/scratch/users/qzli/TESTFILE.root
> ls -l /pnfs/lbne/scratch/users/qzli/TESTFILE.root

Using xrootd to analyze files in dCache without copying them locally

More recent instructions (2016) are available at: https://indico.fnal.gov/getFile.py/access?contribId=17&resId=1&materialId=slides&confId=12120

Here's an old example browsing a rootfile in dCache. Note the different pnfs directory name (lbne -- make sure you use dune now).


> kx509
> voms-proxy-init -rfc -noregen -voms lbne:/lbne/Role=Analysis
> root root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/lbne/scratch/users/qzli/TESTFILE.root
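
The same browse with the current dune area would look something like this (the xrootd door and port are taken from the example above; the path under /pnfs/fnal.gov/usr/dune is an assumption that follows the same pattern):

> root root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/scratch/users/<yourdirectory>/TESTFILE.root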

Links to documentation:

http://www.fnal.gov/docs/products/enstore/PublicdCacheHowTo.html

http://trac.dcache.org/wiki/xrootd

https://srm.fnal.gov/twiki/bin/view/DcacheCorner/DcacheFAQ

See also LBNE DocDB 9657

dCache files can be accessed via POSIX interfaces on SL6, running NFSV4.1

Tips on Querying dCache Tags

NOvA's page on dCache access