Katherine Lato, 05/27/2014 12:12 PM

h1. FIFE Data Handling

h2. Overview

This is a synopsis of the full data handling documents:

* "(2014) FIFE Data Architecture":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5180
* "(2011) Intensity Frontier Computing Model and GRID Tools":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=4082

h2. Kinds of data access

p. Computing jobs need to access several kinds of data, which are outlined here.

# Executables/libraries -- jobs need access to the code that will actually run.
# Conditions data -- calibration information, beam status information, etc. are generally kept in a database, and jobs need a way to access this data without overloading the database.
# Input files -- should be transferred in a manner that does not pollute caches, and should be obtained from a SAM-like data handling system that delivers files in an order in which they can be retrieved efficiently.
# Output files -- should be returned from the job, possibly to a location where they can be registered automatically in the data handling system.
# Logging/monitoring -- information about job status should be sent back to a central location to assist with monitoring.

h2. Storage resources

This is an executive summary of the data storage resources and some of their common characteristics.

| Resource | Net capacity | Net data rate | File size | Access limits | Interfaces | Comments |
| Bluearc app | Few TB | 0.5 GB/sec | any | none for common files | NFS @/<proj>/app@, @/grid/fermiapp/<proj>@ | For executables, libraries, small common files |
| Bluearc data | 240 TB per volume | 0.5 GB/sec | 1 MB block | < 10 files at once | NFS @/*/data@, @/grid/data/*@, FTP | For unmanaged project and cache files; use ifdh cp on the grid |
| dCache | 3 PB | Multi GB/sec | 1 MB block | automatic, hundreds ? | NFS (SLF6+), dccp, WebDAV, FTP, xroot, etc. | For managed files; non-scratch files are backed to Enstore |
| Enstore | 10+ PB | Multi GB/sec | 2+ GB | access via dCache | dCache | |

h2. Interfaces

Where possible, use web-based interfaces that can take advantage of GRID squid caches and the like.

| Data Type     | Tool           |
| Executables   | CVMFS          |
| Conditions    | NuConDB        |
| File metadata | samweb         |
| Input         | ifdh           |
| Output        | ifdh/FTS       |
| Logging       | ifdh/numsg     |

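The Input and Output rows of the table can be sketched as a small job wrapper. This is only an illustration under stated assumptions: the function name @run_job@, the @IFDH_CMD@ override, the scratch file names @input.dat@ and @output.dat@, and the argument layout are all invented for this sketch. The only ifdh call it relies on is @ifdh cp@.

```shell
# Hypothetical override so the wrapper can be dry-run, e.g. IFDH_CMD="echo ifdh".
IFDH_CMD="${IFDH_CMD:-ifdh}"

# usage: run_job <input_path> <output_dir> <command> [args...]
# The command is called with two extra arguments:
# the local input file and the local output file.
run_job() {
    local input="$1" outdir="$2" work
    shift 2
    work="$(mktemp -d)" || return 1
    # Input: copy the file to local scratch space.
    $IFDH_CMD cp "$input" "$work/input.dat" || return 1
    # Run the user's program on the local copy.
    "$@" "$work/input.dat" "$work/output.dat" || return 1
    # Output: return the result from the job.
    $IFDH_CMD cp "$work/output.dat" "$outdir/output.dat" || return 1
    rm -rf "$work"
}
```

For example, @run_job /pnfs/nova/scratch/users/mengel/test.txt /pnfs/nova/scratch/users/mengel my_analysis@ would stage the file in, run the placeholder program @my_analysis@ on it, and copy the result back.
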
h2. Access Methods to dCache for Interactive use

There are several access methods for interactive use of dCache files, including GridFTP, NFS v4.1, WebDAV, DCap, dccp, and SRM. Currently GridFTP is the preferred method and the default for our @ifdh cp@ utility, which is the recommended tool for experimenters to get files in and out of dCache.

h3. gridftp

GridFTP is the underlying file transfer mechanism used by SRM. Using it directly avoids some of the copy connection overhead imposed by SRM.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing GridFTP copies for Fermilab experiments, and GridFTP is currently the default transfer mechanism for copies in and out of dCache.

@ifdh cp /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@

One can also give full gsiftp: URIs to specify particular GridFTP servers, for example:

   @gsiftp://fndca1.fnal.gov/scratch@
   @gsiftp://fg-besman1.fnal.gov/grid/data@

Note that our current dCache configuration hides the first 4 components of the @/pnfs/fnal.gov/usr/<experiment-name>/...@ path when you use GridFTP access (assuming the grid proxy you are using is mapped in the usual fashion).

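The path hiding described above can be illustrated with a small helper that turns a full PNFS path into a gsiftp: URI by dropping the first 4 components. This is a minimal sketch: the function name @pnfs_to_gsiftp@ is hypothetical, the default host is simply the one from the example above, and it assumes the usual proxy mapping.

```shell
# usage: pnfs_to_gsiftp <full_pnfs_path> [gridftp_host]
# Drops /pnfs/fnal.gov/usr/<experiment-name>/ and prefixes the server URI.
pnfs_to_gsiftp() {
    local path="$1" host="${2:-fndca1.fnal.gov}" rest
    case "$path" in
        /pnfs/fnal.gov/usr/*/*) ;;
        *)
            echo "not a full /pnfs/fnal.gov/usr/<experiment-name>/... path: $path" >&2
            return 1
            ;;
    esac
    # Fields when split on "/": 1="", 2=pnfs, 3=fnal.gov, 4=usr, 5=<experiment-name>.
    rest="$(printf '%s\n' "$path" | cut -d/ -f6-)"
    printf 'gsiftp://%s/%s\n' "$host" "$rest"
}
```

With these assumptions, @pnfs_to_gsiftp /pnfs/fnal.gov/usr/nova/scratch/users/mengel/test.txt@ prints @gsiftp://fndca1.fnal.gov/scratch/users/mengel/test.txt@.
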
h3. nfs v4.1

On an NFS v4.1 mounted file system you can do anything you normally would, except modify the content of existing files.

@mount -v -t nfs4 -o minorversion=1 localhost:/pnfs /pnfs/fs@

You can then use commands such as cp and rm.

For more information, please look at:
https://srm.fnal.gov/twiki/bin/view/DcacheCorner/DcacheFAQ

h3. WebDAV

Web Distributed Authoring and Versioning (WebDAV) is an extension of the Hypertext Transfer Protocol (HTTP) that allows users to create and modify web content. Many operating systems provide built-in client support for WebDAV. To browse the namespace and download data, point a web browser at https://fndca4a.fnal.gov:2880. (This is read-only.)

To access the data, first generate a grid certificate proxy:

<pre>
$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Dmitry Litvintsev 257737
Enter GRID pass phrase for this identity:
Creating proxy .......................................... Done
Your proxy is valid until: Tue Feb 12 04:37:20 2013
</pre>

Use the following curl commands to put or get data through the WebDAV door:

<pre>
# example of put
$ curl -L --capath /etc/grid-security/certificates \
    --cert /tmp/x509up_u8637 -T /etc/fstab \
    https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt

# example of get
$ curl -L --capath /etc/grid-security/certificates \
    --cert /tmp/x509up_u8637 \
    https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt \
    -o curl1.txt
% Total % Received
</pre>

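The put and get commands above can be wrapped so that the proxy file does not have to be typed by hand. This is a sketch rather than a supported tool: the function names and the @CURL_CMD@ override are hypothetical, and it assumes the proxy is in the location named by @X509_USER_PROXY@ or, failing that, in the conventional @/tmp/x509up_u<uid>@ file.

```shell
# Hypothetical override so the helpers can be dry-run, e.g. CURL_CMD="echo curl".
CURL_CMD="${CURL_CMD:-curl}"
# WebDAV door taken from the text above.
WEBDAV_DOOR="https://fndca4a.fnal.gov:2880"

# Print the grid proxy file: $X509_USER_PROXY if set, else /tmp/x509up_u<uid>.
proxy_file() {
    printf '%s\n' "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# usage: webdav_put <local_file> <remote_path>
webdav_put() {
    $CURL_CMD -L --capath /etc/grid-security/certificates \
        --cert "$(proxy_file)" -T "$1" "$WEBDAV_DOOR/${2#/}"
}

# usage: webdav_get <remote_path> <local_file>
webdav_get() {
    $CURL_CMD -L --capath /etc/grid-security/certificates \
        --cert "$(proxy_file)" -o "$2" "$WEBDAV_DOOR/${1#/}"
}
```

For example, @webdav_get fermigrid/volatile/fermilab/litvinse/curl.txt curl1.txt@ issues the same request as the get example above.
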
More information is available at:
http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=5050;filename=webdav.pdf;version=2

h3. DCap

DCap provides POSIX-like open, create, read, write, and lseek functions for dCache storage. In addition, there are some specific functions for setting the debug level, getting error messages, and binding the library to a network interface. The DCap protocol requires the dCache server host, port number, and domain to be specified, and "/usr" to be included ahead of the storage group designation in the PNFS path. Its structure is shown here:

@dcap://<serverHost>:<port>/</pnfs>/<storage_group>/usr/<filePath>@

See http://www-dcache.desy.de/manuals/libdcap.html for usage information.

h3. dccp

The dccp command provides cp-like functionality on the PNFS file system and has the following syntax:

@dccp [ options ] source_file [ destination_file ]@

The options and command usage are described at http://www-dcache.desy.de/manuals/dccp.html.

h3. srmcp

SRM is middleware for managing storage resources on a grid. The SRM implementation within dCache manages the dCache/Enstore system. It provides functions for file staging and pinning, transfer protocol negotiation, and transfer URL resolution.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing SRM copies for Fermilab experiments. SRM is not currently the default protocol for @ifdh cp@, so you need to select it with the @--force@ option:

@ifdh cp --force=srm /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@

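The protocol selection shown above can be wrapped in a small helper that adds @--force@ only when a protocol is requested. A minimal sketch: the function name @ifdh_copy@ and the @IFDH_CMD@ override are hypothetical, and @srm@ is the only protocol value taken from this page. Any other value would need to be checked against the ifdh documentation.

```shell
# Hypothetical override so the helper can be dry-run, e.g. IFDH_CMD="echo ifdh".
IFDH_CMD="${IFDH_CMD:-ifdh}"

# usage: ifdh_copy <source> <destination> [protocol]
# Without a protocol, ifdh cp uses its default transfer mechanism.
ifdh_copy() {
    local src="$1" dst="$2" proto="${3:-}"
    if [ -n "$proto" ]; then
        $IFDH_CMD cp "--force=$proto" "$src" "$dst"
    else
        $IFDH_CMD cp "$src" "$dst"
    fi
}
```

For example, @ifdh_copy /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt srm@ runs the same command as the line above.
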
You can also give a full SRM protocol URI for the remote file specification, which requires the SRM server host, port number, and domain. For the fnal.gov domain, "/usr" must also be included ahead of the storage group designation in the PNFS path. The structure is shown here:

@srm://<serverHost>:<portNumber>/service/path?SFN=/<root of fileSystem>/<storage_group>[/usr]/<filePath>@

The first two examples are for the fnal.gov domain, the third for cern.ch:

   @srm://fndca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/nova/scratch@
   @srm://cdfdca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/cdfen/filesets/<filePath>@
   @srm://wacdr002d.cern.ch:9000/castor/cern.ch/user/<filePath>@

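The fnal.gov examples above follow a fixed pattern, which the following sketch reproduces. The function name @srm_uri@ is hypothetical, and the default host, port, and @/srm/managerv2@ service path are simply copied from the first example. Other sites, such as the cern.ch example, use a different layout that this helper does not cover.

```shell
# usage: srm_uri <full_pnfs_path> [host] [port]
# Builds an SRM URI in the style of the fnal.gov examples.
srm_uri() {
    local path="$1" host="${2:-fndca1.fnal.gov}" port="${3:-8443}"
    printf 'srm://%s:%s/srm/managerv2?SFN=%s\n' "$host" "$port" "$path"
}
```

For example, @srm_uri /pnfs/fnal.gov/usr/nova/scratch@ prints the first URI in the list above.
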
For details, please see:
http://www.fnal.gov/docs/products/enstore/enstore_may04/usingdcache.html#8346