FIFE Data Handling » History » Version 7

Kenneth Herner, 05/09/2017 12:36 AM

FIFE Data Handling


This is a synopsis of the full data handling documents

Kinds of data access

Computing jobs need to access various kinds of data, which we will attempt to outline here.

  1. Executables/libraries -- jobs need to access the actual code which will execute
  2. Conditions data -- Calibration information, beam status information, etc. is generally kept in a database, and jobs need a way to access the data that will not overload the databases.
  3. Input Files -- should be transferred in a manner that doesn't pollute caches and be obtained from a SAM-like data handling system that provides data files in an order that can be retrieved efficiently
  4. Output files -- should be returned from the job, possibly to a location where they can be automatically registered in the data handling system
  5. Logging/Monitoring -- information about job status should be communicated back to a central location to assist with monitoring.

Storage resources

This is an executive summary of data resources, with some common characteristics

For illustration, we refer to a project named hypot

| Resource | Net capacity | Net data rate | File size | Access limits | Interfaces | Comments |
| Bluearc app | Few TB | 0.5 GB/sec | any | none for common files | NFS: /hypot/app, /grid/fermiapp/hypot | For executables, libraries, small common files |
| Bluearc data | 240 TB per vol | 0.5 GB/sec | 1 MB block | 5 files at once per project | NFS: /hypot/data, /grid/data/hypot; FTP | For unmanaged project and cache; use ifdh cp on grid |
| dCache | 3 PB | Multi GB/sec | 1 MB block | automatic, hundreds(?) | NFS (SLF6+), dccp, WebDAV, FTP, xroot, etc. | For managed files; non-scratch files backed to Enstore |
| Enstore | 10+ PB | Multi GB/sec | 2+ GB | access via dCache | dCache | |


Use ifdh cp or fetch to move data to and from local disk on worker nodes
  • <50 GB local disk per job
  • See Auxiliary File task force for advice on highly shared files
  • ifdh also works on OSG
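The bullets above can be sketched as a minimal worker-node job script; all paths, dataset names, and output filenames below are hypothetical, not real hypot data:

```shell
# Sketch of a grid job script using ifdh; paths are illustrative placeholders.
INPUT=/pnfs/hypot/data/run123/events.root   # hypothetical input file in dCache
OUTDIR=/pnfs/hypot/scratch/users/alice      # hypothetical scratch output area
LOCAL=events.root

# Copy the input to local worker-node disk (keep under the ~50 GB limit),
# run the job on the local copy, then copy results back out with ifdh.
if command -v ifdh >/dev/null 2>&1; then
    ifdh cp "$INPUT" "$LOCAL"
    # ... run the experiment executable on "$LOCAL" here ...
    ifdh cp hist.root "$OUTDIR/hist.root"
fi
```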
Use dCache for managed and high-throughput files
  • archival - /pnfs/hypot/data
  • scratch - /pnfs/hypot/data/scratch/users/...
  • directly available to SLF6.4+ clients, with NFS 4.1
Use Bluearc for temporary user analysis files (project disk)
  • /hypot/data


Do not directly write or read Bluearc /hypot/data from grid jobs
  • Limited disk heads per array, O(10s)
  • Limited bandwidth, O(1 GByte/sec)
  • Direct access by grid jobs at best slows everyone down drastically, producing alarms, idle grid slots and sad interactive users.
  • At worst this can crash the Bluearc servers.
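Instead of reading the Bluearc NFS mount directly from a grid job, stage the file with ifdh cp, which queues and throttles transfers to stay within the access limits above. The path in this sketch is hypothetical:

```shell
# Stage a Bluearc file to local worker-node disk via ifdh rather than
# reading the NFS mount directly; the source path is hypothetical.
SRC=/hypot/data/calib/constants.dat
if command -v ifdh >/dev/null 2>&1; then
    ifdh cp "$SRC" ./constants.dat   # queued copy; respects Bluearc access limits
fi
```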

Do not try to edit or rewrite dCache files; it won't work (file content cannot be modified once written)


Where possible, use web interfaces that can take advantage of Grid squid caches.

| Data type | Tool |
| Executables | CVMFS |
| Conditions | NuConDB |
| File metadata | samweb |
| Input | ifdh |
| Output | ifdh/FTS |
| Logging | ifdh/numsg |

Fermigrid Bluearc Unmount Task Force

There have been ongoing issues with Bluearc overloads
caused by accidental direct access to Bluearc file systems from Fermigrid jobs.
A short-term Fermigrid Bluearc Unmount Task Force (Sep/Oct 2014)
is preparing plans to eliminate these overloads.

Access Methods to dCache for Interactive use

There are several access methods for interactive use of dCache files: DCap (dccp), SRM, and GridFTP. Currently GridFTP is the preferred method and the default for the "ifdh cp" utility, which is the recommended tool for getting files in and out of dCache for experimenters.


GridFTP is the underlying file transfer mechanism used by SRM. Using it directly avoids some of the copy connection overhead imposed by SRM.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing Gridftp copies for Fermilab experiments, and gridftp is currently the default transfer mechanism for copies in and out of dcache.

ifdh cp /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt

One can also give full gsiftp: URIs to specify gridftp servers, for example:
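As an illustration of the gsiftp URI form only; the door host and port below are placeholders, not a real Fermilab gridftp door:

```shell
# Assemble a gsiftp URI from its components; all values are hypothetical.
DOOR=gridftp-door.example.gov   # placeholder gridftp door host
PORT=2811                       # conventional gridftp port
URI="gsiftp://$DOOR:$PORT/hypot/scratch/users/alice/test.txt"
echo "$URI"
```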


Note that our current dCache configuration hides the first four components of the /pnfs/<experiment-name>/... path when you use gridftp access (assuming the Grid proxy you are using is mapped in the usual fashion).

NFS v4.1

On an NFS v4.1-mounted filesystem you can do anything you normally do, except modify existing file content.

mount -v -t nfs4 -o minorversion=1 localhost:/pnfs /pnfs/fs

You can then use ordinary commands such as cp, rm, and so on.
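For example, ordinary POSIX tools work on the mount; the paths below are hypothetical, and the commands are guarded so they only run where the mount actually exists:

```shell
# Ordinary POSIX tools on an NFS 4.1 /pnfs mount; hypothetical paths.
# The one thing that does NOT work is rewriting an existing file in place.
PNFS_DIR=/pnfs/hypot/scratch/users/alice
if [ -d "$PNFS_DIR" ]; then
    ls "$PNFS_DIR"                          # listing works
    cp "$PNFS_DIR/test.txt" /tmp/test.txt   # reading works
    rm "$PNFS_DIR/old.txt"                  # removal works
fi
```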

For more information, please look at:


Web Distributed Authoring and Versioning (WebDAV) is an extension of the Hypertext Transfer Protocol (HTTP) that allows users to create and modify web content. Many operating systems provide built-in client support for WebDAV. To browse the namespace and download data, the user directs a web browser to the WebDAV door URL (this is read-only).

To access the data, the user needs to generate a grid certificate proxy like so:
$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Dmitry Litvintsev 257737
Enter GRID pass phrase for this identity:
Creating proxy .......................................... Done
Your proxy is valid until: Tue Feb 12 04:37:20 2013

Use the following curl commands to put/get data using the WebDAV door:
  1. example of put
    $ curl -L --capath /etc/grid-security/certificates \
      --cert /tmp/x509up_u8637 -T /etc/fstab
  2. example of get
    $ curl -L --capath /etc/grid-security/certificates \
      --cert /tmp/x509up_u8637 \
      -o curl1.txt

More information is available in the attached webdav.pdf document.


DCap provides POSIX-like open, create, read, write and lseek functions to the dCache storage. In addition there are some specific functions for setting debug level, getting error messages, and binding the library to a network interface. The dCap protocol requires specification of the dCache server host, port number, and domain, in addition to the inclusion of "/usr" ahead of the storage group designation in the PNFS path. Its structure is shown here:


See for usage information.


The dccp command provides a cp-like functionality on the PNFS file system and has the following syntax:

% dccp [ options ] source_file [ destination_file ]

The options and command usage are described in the dCache documentation. Note that on systems where PNFS is mounted via NFS 4.1, dccp will not work properly; in that case, just use cp or ifdh cp.
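A minimal dccp copy out of dCache might look like the following; the pnfs path is hypothetical, and the command is guarded so it only runs where dccp is installed:

```shell
# Copy a file out of dCache via the DCap protocol; the path is hypothetical.
SRC=/pnfs/hypot/data/run123/events.root
if command -v dccp >/dev/null 2>&1; then
    dccp "$SRC" /tmp/events.root
fi
```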


SRM is middleware for managing storage resources on a grid. The SRM implementation within dCache manages the dCache/Enstore system. It provides functions for file staging and pinning, transfer protocol negotiation, and transfer URL resolution.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing SRM copies for Fermilab experiments. SRM is not currently the default
protocol for ifdh cp, so you need to specify it with a --force option to use it:

ifdh cp --force=srm /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt

You can also give a full SRM protocol URI, used for the remote file specification, which requires the SRM server host, port number, and domain. For the domain, the inclusion of "/usr" ahead of the storage group designation in the PNFS path is also required. Its structure is shown here:

srm://<serverHost>:<portNumber>/service/path?SFN=/<root of fileSystem>/<storage_group>[/usr]/<filePath>
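Filling in the template above with placeholder values (server, port, service path, and file path are all hypothetical, chosen only to show the shape of the URI):

```shell
# Build an SRM URI from the template components; every value is hypothetical.
SERVER=srm-door.example.gov
PORT=8443
SFN=/pnfs/fnal.gov/usr/hypot/scratch/users/alice/test.txt
URI="srm://$SERVER:$PORT/srm/managerv2?SFN=$SFN"
echo "$URI"
```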



For details, please see: