Basics with SAM dCache and Tape

Tape and dCache

All recorded data and all official Monte Carlo (MC) production are stored on tape for long-term storage. However, tapes cannot be accessed directly. When you try to access a file, a robot is dispatched to retrieve the tape that file is on and copy the file from the tape onto a disk system called dCache. All directories that start with /pnfs are in dCache. Most of the files in dCache are in the tape-backed areas, which means that when you try to access a file, it may not be immediately available.

You can determine if a file is available in the cache with the following command:

$ cache_state.py /pnfs/dune/tape_backed/dunepro/protodune/np04/beam/output/detector/decoded-raw/05/48/34/76/np04_raw_run004604_0002_dl3_decoder_11787052_0_20180924T035135.root

NOT CACHED
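
To check several files at once, the same command can be wrapped in a small shell function. This is only a sketch: list_uncached and CACHE_STATE_CMD are names invented for this example, and the function relies only on the NOT CACHED string shown above, since the output for a cached file is not shown here.

```shell
# Command used to query the cache state. CACHE_STATE_CMD is a hypothetical
# override so the function can be tried without access to dCache.
CACHE_STATE_CMD="${CACHE_STATE_CMD:-cache_state.py}"

# Print every file given as an argument that is reported as "NOT CACHED".
# Assumption: any other output means the file does not need staging.
list_uncached() {
    local f
    for f in "$@"; do
        if "$CACHE_STATE_CMD" "$f" | grep -q "NOT CACHED"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_uncached with a list of /pnfs paths prints only the paths that still need to be retrieved from tape.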

There are some exceptions. The following areas are not backed by tape. They are also the areas that general users are able to write to.

/pnfs/dune/scratch
/pnfs/dune/persistent

The files in those areas are always immediately accessible. The scratch area is large but not unlimited in size; files do not stay there forever. Least recently used (read, not just "touched") files are deleted from scratch as space is needed for new files. The size of the scratch area is such that files have a lifetime of about 30 days. The persistent area keeps files permanently, but it is of limited size.

SAM

SAM is a database which knows about filenames, their locations, and some metadata associated with them.

You construct queries against that database to find files that match certain criteria (for example, run_number). Here is a simple but likely useful example:

$ samweb list-files "run_number 4604 and data_tier raw with limit 4" 
np04_raw_run004604_0003_dl6.root
np04_raw_run004604_0003_dl3.root
np04_raw_run004604_0001_dl10.root
np04_raw_run004604_0002_dl3.root
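
Because a query is just a string, it can be assembled in a script when you need to repeat it for several runs. The sketch below builds the same query as above from its parts; sam_query is a hypothetical helper name, not part of samweb.

```shell
# Build a SAM query string from a run number, a data tier, and a result limit.
# sam_query is an illustrative helper, not a samweb command.
sam_query() {
    local run="$1"
    local tier="$2"
    local limit="$3"
    printf 'run_number %s and data_tier %s with limit %s\n' "$run" "$tier" "$limit"
}
```

For example, samweb list-files "$(sam_query 4604 raw 4)" runs the query shown above.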

SAM also knows about the relationships between files, for example which files were the inputs used to produce others:

$ samweb file-lineage children np04_raw_run004604_0002_dl3.root
np04_raw_run004604_0002_dl3_reco_11787052_0_20180924T035135.root
np04_raw_run004604_0002_dl3_decoder_11787052_0_20180924T035135.root
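
If you only want one kind of child file, the list can be filtered by name. This sketch assumes the processing stage appears in the file name between underscores, as it does in the two files above; select_stage is a hypothetical helper.

```shell
# Read file names on standard input and keep those whose name contains
# the given stage tag between underscores (for example "_decoder_").
# Assumption: the stage is encoded in the file name, as in the listing above.
select_stage() {
    grep -- "_$1_"
}
```

For example, samweb file-lineage children np04_raw_run004604_0002_dl3.root | select_stage decoder would keep only the decoder output from the list above.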

SAM is a powerful tool and can do much more than is shown here. See the SAM documentation for details.

Working with files in SAM and dCache

If you want to use a file in SAM which is stored in dCache, you should first check that it is in the cache. If it is not, your command will likely hang for a very long time while the file is retrieved from tape.

$ cache_state.py np04_raw_run004604_0003_dl6_decoder_12052903_0_20181007T022804.root

NOT CACHED

SAM provides tools for ensuring that a file is cached before you try to work with it. Those tools work only on datasets, not on individual files, so you need to create a dataset definition for your file:

$ samweb create-definition ahimmel_protodune_pds_1 "file_name np04_raw_run004604_0003_dl6_decoder_12052903_0_20181007T022804.root" 
$ samweb list-definition-files ahimmel_protodune_pds_1
np04_raw_run004604_0003_dl6_decoder_12052903_0_20181007T022804.root

Then issue this command to prestage the file:

$ samweb prestage-dataset --defname=ahimmel_protodune_pds_1
Started project ahimmel_protodune_pds_1_prestage_20181114114652
Started consumer processs ID 2432199

It may run for a long time (minutes to hours, depending on how busy the tape system is).
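
If you prestage single files often, the two steps (creating a one-file definition and prestaging it) can be combined. This is a sketch using only the samweb subcommands shown above; prestage_file and SAMWEB_CMD are names invented for this example.

```shell
# Command used to talk to SAM. SAMWEB_CMD is a hypothetical override so the
# function can be tried without a SAM server.
SAMWEB_CMD="${SAMWEB_CMD:-samweb}"

# Create a dataset definition containing exactly one file, then prestage it.
#   $1: definition name (the example above prefixes it with a username)
#   $2: SAM file name
# The prestage step runs only if the definition was created successfully.
prestage_file() {
    local defname="$1"
    local file="$2"
    "$SAMWEB_CMD" create-definition "$defname" "file_name $file" &&
        "$SAMWEB_CMD" prestage-dataset --defname="$defname"
}
```

The call prestage_file ahimmel_protodune_pds_1 np04_raw_run004604_0003_dl6_decoder_12052903_0_20181007T022804.root repeats the commands shown above. Creating a definition whose name already exists is expected to fail, in which case nothing is prestaged.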

Once the command finishes, the file is in the cache and you can use it. If you are passing the file to ROOT or ART, the simplest way is to use samweb2xrootd to look up its xrootd URL:

$ lar -c myfhicl.fcl $(samweb2xrootd np04_raw_run004604_0003_dl6_decoder_12052903_0_20181007T022804.root)
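
To avoid a long hang when a file turns out not to be cached, the check and the job can be chained so the job starts only when the check passes. This is a sketch under the same assumption as before, that an uncached file is reported as NOT CACHED; run_if_cached and CACHE_STATE_CMD are hypothetical names.

```shell
# Command used to query the cache state (hypothetical override for testing).
CACHE_STATE_CMD="${CACHE_STATE_CMD:-cache_state.py}"

# Run a command only if the given file is not reported as "NOT CACHED".
#   $1: file to check; remaining arguments: the command to run.
# Returns 1 without running the command when the file is not cached.
run_if_cached() {
    local file="$1"
    shift
    if "$CACHE_STATE_CMD" "$file" | grep -q "NOT CACHED"; then
        echo "$file is not cached; prestage it first" >&2
        return 1
    fi
    "$@"
}
```

With FILE set to the file name, run_if_cached "$FILE" lar -c myfhicl.fcl "$(samweb2xrootd "$FILE")" runs the job only when the file is available.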