FIFE Data Handling » History » Version 10

Kenneth Herner, 11/09/2020 02:33 PM

h1. FIFE Data Handling

h2. Overview

This is a synopsis of the full data handling documents:

* "(2014) FIFE Data Architecture":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5180
* "(2011) Intensity Frontier Computing Model and GRID Tools":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=4082

h2. Kinds of data access

p. Computing jobs need to access several kinds of data, outlined here.

# Executables/libraries -- jobs need access to the code they will execute.
# Conditions data -- calibration information, beam status information, etc. is generally kept in a database, and jobs need a way to access it that will not overload the database.
# Input files -- should be transferred in a manner that doesn't pollute caches, and should be obtained from a SAM-like data handling system that delivers files in an order in which they can be retrieved efficiently.
# Output files -- should be returned from the job, possibly to a location where they can be automatically registered in the data handling system.
# Logging/Monitoring -- information about job status should be communicated back to a central location to assist with monitoring.

h2. Storage resources

This is an executive summary of the data storage resources, with some common characteristics.

For illustration, we refer to a project named *<code>hypot</code>*.

| RESOURCE | Net capacity | Net data rate | File size | Access limits | Interfaces | Comments |
| Bluearc app | Few TB | 0.5 GB/sec | any | none for common files | NFS /hypot/app, /grid/fermiapp/hypot | For executables, libraries, small common files |
| Bluearc data | 240 TB per vol | 0.5 GB/sec | 1 MB block | 5 files at once per project | NFS /hypot/data, /grid/data/hypot, FTP | For unmanaged project files and cache |
| dCache | 3 PB | Multi GB/sec | 1 MB block | automatic, hundreds ? | NFS (SLF6+), dccp, WebDAV, FTP, xroot etc. | For managed files, non-scratch files backed to Enstore |
| Enstore | 10+ PB | Multi GB/sec | 2+ GB | access via dCache | dCache | |

h3. DO

Use ifdh cp or fetch to move data to and from local disk on worker nodes.

* Remember that files copied in (as opposed to streamed with xrootd) count against your local disk usage in the batch job.
* See the Auxiliary File task force for advice on highly shared files.
* ifdh also works on OSG.

Use dCache for managed and high-throughput files.

* archival - /pnfs/hypot/data
* scratch  - /pnfs/hypot/data/scratch/users/...
* directly available to SLF6.4+ clients, with NFS 4.1

Use Bluearc for temporary user analysis files (project disk).

* /hypot/data

h3. DO NOT

Write or read Bluearc disks such as /hypot/data or /hypot/app from grid jobs; they are not accessible there.

Try to edit or rewrite dCache files in place; it won't work.
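
The DO and DO NOT rules above amount to a copy-in, process locally, copy-out pattern. The following is a minimal sketch of that pattern, not an official FIFE script: the function name @stage_and_run@ and the overridable @IFDH@ variable are hypothetical, and it assumes the processing command takes a local input path and a local output path as its last two arguments.

```shell
#!/bin/sh
# Sketch: stage a file to local disk, process it there, copy the result back.
# IFDH can be overridden (e.g. IFDH="echo ifdh") to dry-run without ifdh installed.
IFDH="${IFDH:-ifdh}"

# stage_and_run INPUT OUTPUT_DIR COMMAND [ARGS...]
# COMMAND is called as: COMMAND ARGS... <local input> <local output>
stage_and_run() {
    [ $# -ge 3 ] || { echo "usage: stage_and_run INPUT OUTPUT_DIR COMMAND..." >&2; return 2; }
    input=$1; outdir=$2; shift 2

    # Work in a private scratch directory; copied-in files count against local disk.
    work=$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX") || return 1
    local_in="$work/$(basename "$input")"
    local_out="$work/out_$(basename "$input")"

    # Copy in, run the command on the local copy, copy the new file out.
    # The output is written as a new file; existing dCache files are never edited.
    $IFDH cp "$input" "$local_in" &&
    "$@" "$local_in" "$local_out" &&
    $IFDH cp "$local_out" "$outdir/$(basename "$local_out")"
    status=$?

    rm -rf "$work"
    return $status
}
```

For example, @stage_and_run /pnfs/hypot/data/scratch/users/me/in.dat /pnfs/hypot/data/scratch/users/me/out my_analysis@, where the user directory and @my_analysis@ are placeholders.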

h2. Interfaces

Where possible, use web interfaces, which can take advantage of GRID squid caches and similar infrastructure.

| Data Type     | Tool           |
| Executables   | CVMFS          |
| Conditions    | NuConDB        |
| File metadata | samweb         |
| Input         | ifdh           |
| Output        | ifdh/FTS       |
| Logging       | ifdh/numsg     |

h2. Access Methods to dCache for Interactive use

There are several access methods for interactive use of dCache files, including gridftp, NFS v4.1, WebDAV, DCap, dccp, and SRM. Currently gridftp is the preferred method and the default for our "ifdh cp" utility, which is the recommended tool for experimenters to get files in and out of dCache.

h3. gridftp

Gridftp is the underlying file transfer mechanism used by SRM. Using it directly avoids some of the copy connection overhead imposed by SRM.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing gridftp copies for Fermilab experiments, and gridftp is currently the default transfer mechanism for copies in and out of dCache.

@ifdh cp /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@

One can also give full gsiftp: URIs to specify gridftp servers, for example:

   @gsiftp://fndca1.fnal.gov/hypot/scratch@

Note that our current dCache configuration hides the first 4 components of the /pnfs/fnal.gov/usr/<experiment-name>/... path when you use gridftp access (assuming the Grid proxy you are using is mapped in the usual fashion).
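
As a rough illustration of that path mapping, the sketch below turns a /pnfs path into a gsiftp: URI. It follows the example URI above, in which the experiment name stays in the path; if your proxy mapping also hides the experiment name, one more component has to be dropped. The function name @pnfs_to_gsiftp@ is hypothetical, and treating /pnfs/<experiment-name> as a short form of /pnfs/fnal.gov/usr/<experiment-name> is an assumption.

```shell
#!/bin/sh
# pnfs_to_gsiftp PATH [HOST]
# Print a gsiftp: URI for a /pnfs path. HOST defaults to the server used in
# the example above. Assumption: the /pnfs/fnal.gov/usr prefix is hidden and
# the experiment name is kept.
pnfs_to_gsiftp() {
    path=$1
    host=${2:-fndca1.fnal.gov}
    case $path in
        /pnfs/fnal.gov/usr/*) rel=${path#/pnfs/fnal.gov/usr/} ;;
        /pnfs/*)              rel=${path#/pnfs/} ;;
        *) echo "not a /pnfs path: $path" >&2; return 1 ;;
    esac
    printf 'gsiftp://%s/%s\n' "$host" "$rel"
}
```

The printed URI can then be passed to ifdh cp in place of the plain /pnfs path.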

h3. nfs v4.1

On an NFS v4.1 mounted filesystem you can do anything you normally do, except modify file content.

@mount  -v -t nfs4 -o minorversion=1 localhost:/pnfs /pnfs/fs@

You can then use commands like cp, rm, and so on.

For more information, please look at:
https://srm.fnal.gov/twiki/bin/view/DcacheCorner/DcacheFAQ

h3. WebDAV

Web Distributed Authoring and Versioning (WebDAV) is an extension of the Hypertext Transfer Protocol (HTTP) that allows users to create and modify web content. Many operating systems provide built-in client support for WebDAV. To browse the namespace and download data, direct a web browser to https://fndca4a.fnal.gov:2880. (This is read-only.)

To access the data, generate a grid certificate proxy like so:
$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Dmitry Litvintsev 257737
Enter GRID pass phrase for this identity:
Creating proxy .......................................... Done
Your proxy is valid until: Tue Feb 12 04:37:20 2013

Use the following curl commands to put/get data using the WebDAV door:
# example of put
$ curl -L --capath /etc/grid-security/certificates \
--cert /tmp/x509up_u8637 -T /etc/fstab \
https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt

# example of get
$ curl -L --capath /etc/grid-security/certificates \
--cert /tmp/x509up_u8637 \
https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt \
-o curl1.txt
% Total % Received
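
The put and get commands above can be wrapped in small helper functions so that the proxy file and door URL are not typed by hand each time. This is only a sketch: the function and variable names are hypothetical, the door URL and CA directory are taken from the examples above, and the proxy location assumes the usual convention of @X509_USER_PROXY@ if set, otherwise /tmp/x509up_u<uid>. The @--fail@ option is an addition here, so that HTTP errors produce a non-zero exit status.

```shell
#!/bin/sh
# Defaults taken from the examples above; override via the environment.
WEBDAV_DOOR="${WEBDAV_DOOR:-https://fndca4a.fnal.gov:2880}"
CA_DIR="${CA_DIR:-/etc/grid-security/certificates}"

# proxy_file: print the grid proxy path (assumed convention:
# $X509_USER_PROXY if set, otherwise /tmp/x509up_u<uid>).
proxy_file() {
    printf '%s\n' "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# webdav_url REMOTE_PATH: join door and path with exactly one slash.
webdav_url() {
    printf '%s/%s\n' "${WEBDAV_DOOR%/}" "${1#/}"
}

# webdav_put LOCAL_FILE REMOTE_PATH: upload a local file through the door.
webdav_put() {
    curl -L --fail --capath "$CA_DIR" --cert "$(proxy_file)" \
         -T "$1" "$(webdav_url "$2")"
}

# webdav_get REMOTE_PATH LOCAL_FILE: download a file through the door.
webdav_get() {
    curl -L --fail --capath "$CA_DIR" --cert "$(proxy_file)" \
         -o "$2" "$(webdav_url "$1")"
}
```

With these defined, the put example above becomes @webdav_put /etc/fstab fermigrid/volatile/fermilab/litvinse/curl.txt@.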

More information is available at:
http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=5050;filename=webdav.pdf;version=2

h3. DCap

DCap provides POSIX-like open, create, read, write and lseek functions to the dCache storage. In addition, there are some specific functions for setting the debug level, getting error messages, and binding the library to a network interface. The dCap protocol requires specification of the dCache server host, port number, and domain, in addition to the inclusion of "/usr" ahead of the storage group designation in the PNFS path. Its structure is shown here:

@dcap://<serverHost>:<port>/pnfs/<domain>/usr/<storage_group>/<filePath>@
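
As a small illustration, the helper below assembles a dCap URL from its parts, following the description above: server host, port, domain, and "/usr" ahead of the storage group. The function name @dcap_url@ is hypothetical, and no default host or port is assumed because this page does not give one.

```shell
#!/bin/sh
# dcap_url HOST PORT DOMAIN STORAGE_GROUP FILE_PATH
# Print a dCap URL with /usr placed ahead of the storage group.
dcap_url() {
    [ $# -eq 5 ] || {
        echo "usage: dcap_url HOST PORT DOMAIN STORAGE_GROUP FILE_PATH" >&2
        return 2
    }
    # Strip a leading slash from FILE_PATH so the URL has no double slash.
    printf 'dcap://%s:%s/pnfs/%s/usr/%s/%s\n' "$1" "$2" "$3" "$4" "${5#/}"
}
```

All five arguments must be supplied by the caller; ask your dCache administrators for the door host and port.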

See http://www-dcache.desy.de/manuals/libdcap.html for usage information.

h3. dccp

The dccp command provides cp-like functionality on the PNFS file system and has the following syntax:

% dccp [ options ] source_file [ destination_file ]

The options and command usage are described at http://www-dcache.desy.de/manuals/dccp.html. Note that on systems where PNFS is mounted via NFS 4.1, dccp will not work properly. In that case, just use cp or ifdh cp.
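
A script that must run on both kinds of systems can check how /pnfs is mounted before choosing a copy command. The sketch below is one possible check, not a supported tool: the function names are hypothetical, and it assumes a Linux-style mount table (/proc/mounts) in which an NFS 4.1 mount appears with type nfs4 and either @minorversion=1@ or @vers=4.1@ among its options.

```shell
#!/bin/sh
# pnfs_is_nfs41 [MOUNTS_FILE]
# Succeed if MOUNTS_FILE (default /proc/mounts) lists /pnfs, or a directory
# below it, as an NFS 4.1 mount.
pnfs_is_nfs41() {
    awk '
        ($2 == "/pnfs" || index($2, "/pnfs/") == 1) && $3 == "nfs4" &&
        ($4 ~ /(^|,)minorversion=1(,|$)/ || $4 ~ /(^|,)vers=4\.1(,|$)/) { found = 1 }
        END { exit !found }
    ' "${1:-/proc/mounts}"
}

# pnfs_copy_tool [MOUNTS_FILE]
# Print "cp" where /pnfs is mounted via NFS 4.1 (dccp will not work properly
# there), otherwise print "dccp".
pnfs_copy_tool() {
    if pnfs_is_nfs41 "$@"; then
        echo cp
    else
        echo dccp
    fi
}
```

A caller could then run @$(pnfs_copy_tool) source_file destination_file@; using ifdh cp everywhere avoids the check entirely.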

h3. srmcp

SRM is middleware for managing storage resources on a grid. The SRM implementation within dCache manages the dCache/Enstore system. It provides functions for file staging and pinning, transfer protocol negotiation, and transfer URL resolution.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing SRM copies for Fermilab experiments. SRM is not currently the default protocol for ifdh cp, so you need to specify it with a --force option to use it:

@ifdh cp --force=srm /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@

You can also give a full SRM protocol URI for the remote file specification, which requires the SRM server host, port number, and domain. For the fnal.gov domain, the inclusion of "/usr" ahead of the storage group designation in the PNFS path is also required. Its structure is shown here:

@srm://<serverHost>:<portNumber>/service/path?SFN=/<root of fileSystem>/<domain>[/usr]/<storage_group>/<filePath>@

The first two examples are for the fnal.gov domain, the third for cern.ch:

   @srm://fndca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/nova/scratch@
   @srm://cdfdca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/cdfen/filesets/<filePath>@
   @srm://wacdr002d.cern.ch:9000/castor/cern.ch/user/<filePath>@
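
For the fnal.gov domain, the first example above suggests a simple mapping from a /pnfs path to a full SRM URI. The sketch below encodes that mapping; the function name @pnfs_to_srm@ is hypothetical, the default endpoint and the /srm/managerv2 service path are copied from the fnal.gov examples, and treating /pnfs/<experiment> as a short form of /pnfs/fnal.gov/usr/<experiment> is an assumption.

```shell
#!/bin/sh
# pnfs_to_srm PATH [HOST:PORT]
# Print an SRM URI for a /pnfs path in the fnal.gov domain.
pnfs_to_srm() {
    path=$1
    endpoint=${2:-fndca1.fnal.gov:8443}
    case $path in
        # Already in the long form: use as-is.
        /pnfs/fnal.gov/usr/*) sfn=$path ;;
        # Short form: insert fnal.gov/usr ahead of the storage group.
        /pnfs/*)              sfn=/pnfs/fnal.gov/usr/${path#/pnfs/} ;;
        *) echo "not a /pnfs path: $path" >&2; return 1 ;;
    esac
    printf 'srm://%s/srm/managerv2?SFN=%s\n' "$endpoint" "$sfn"
}
```

For instance, @pnfs_to_srm /pnfs/nova/scratch@ prints the first example URI shown above.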

For details, please see:
http://www.fnal.gov/docs/products/enstore/enstore_may04/usingdcache.html#8346