FIFE Data Handling » History » Version 4

Arthur Kreymer, 10/03/2014 11:48 AM
Dismount to Unmount

h1. FIFE Data Handling

h2. Overview

This is a synopsis of the full data handling documents:

* "(2014) FIFE Data Architecture":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5180
* "(2011) Intensity Frontier Computing Model and GRID Tools":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=4082

h2. Kinds of data access

p. Computing jobs need to access several kinds of data, outlined here.

# Executables/libraries -- jobs need access to the code they will execute.
# Conditions data -- calibration information, beam status information, etc. are generally kept in a database, and jobs need a way to access them that will not overload the database.
# Input files -- should be transferred in a manner that does not pollute caches, and should be obtained from a SAM-like data handling system that delivers files in an order in which they can be retrieved efficiently.
# Output files -- should be returned from the job, possibly to a location where they can be automatically registered in the data handling system.
# Logging/Monitoring -- information about job status should be communicated back to a central location to assist with monitoring.

h2. Storage resources

This is an executive summary of the data storage resources and their common characteristics.

| RESOURCE | Net capacity | Net data rate | File size | Access limits | Interfaces | Comments |
| Bluearc app | Few TB | 0.5 GB/sec | any | none for common files | NFS /<proj>/app, /grid/fermiapp/<proj> | For executables, libraries, small common files |
| Bluearc data | 240 TB per vol | 0.5 GB/sec | 1 MB block | 5 files at once per project | NFS /*/data, /grid/data/*, FTP | For unmanaged project and cache; use ifdh cp on grid |
| DCache | 3 PB | Multi GB/sec | 1 MB block | automatic, hundreds ? | NFS (SLF6+), dccp, webdav, FTP, xroot etc. | For managed files; non-scratch files backed to Enstore |
| Enstore | 10+ PB | Multi GB/sec | 2+ GB | access via DCache | DCache | |

h2. Interfaces

Where possible, use web interfaces, which can take advantage of GRID squid caches and similar infrastructure.

| Data Type     | Tool           |
| Executables   | CVMFS          |
| Conditions    | NuConDB        |
| File metadata | samweb         |
| Input         | ifdh           |
| Output        | ifdh/FTS       |
| Logging       | ifdh/numsg     |

h2. Fermigrid Bluearc Unmount Task Force

There have been ongoing issues with Bluearc overloads
due to accidental direct access to Bluearc file systems from Fermigrid jobs.
There is a short-term (Sep/Oct 2014) [[FermiGridBlue|Fermigrid Bluearc Unmount Task Force]]
preparing plans for eliminating these overloads.

h2. Access Methods to dCache for Interactive use

There are several access methods for interactive use of dCache files. These include DCap, dccp, srm, and gridftp. Currently gridftp is the preferred method, and it is the default for our "ifdh cp" utility, which is the recommended tool for experimenters getting files in and out of dCache.

h3. gridftp

Gridftp is the underlying file transfer mechanism used by SRM. Using it directly avoids some of the copy connection overhead imposed by SRM.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing gridftp copies for Fermilab experiments, and gridftp is currently the default transfer mechanism for copies in and out of dCache.

@ifdh cp /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@

One can also give full gsiftp: URIs to specify gridftp servers, for example:

   @gsiftp://fndca1.fnal.gov/scratch@
   @gsiftp://fg-besman1.fnal.gov/grid/data@

Note that our current dCache configuration hides the first 4 components of the /pnfs/fnal.gov/usr/<experiment-name>/... path when you do gridftp access (assuming the Grid proxy you are using is mapped in the usual fashion).
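As an illustration of the path hiding described above, here is a minimal Python sketch. It assumes that the four hidden components are @/pnfs@, @fnal.gov@, @usr@ and the experiment name, and that the proxy is mapped in the usual fashion; the function name @gsiftp_url@ and the default server are hypothetical choices for this example, not part of ifdh or dCache.

```python
from pathlib import PurePosixPath

# Assumption: the hidden components are pnfs, fnal.gov, usr and <experiment-name>.
HIDDEN_COMPONENTS = 4


def gsiftp_url(pnfs_path: str, server: str = "fndca1.fnal.gov") -> str:
    """Map a full /pnfs/fnal.gov/usr/<experiment>/... path to a gsiftp: URI."""
    parts = PurePosixPath(pnfs_path).parts  # ('/', 'pnfs', 'fnal.gov', 'usr', ...)
    if not parts or parts[0] != "/":
        raise ValueError("expected an absolute path")
    if len(parts) < 1 + HIDDEN_COMPONENTS:
        raise ValueError("path is shorter than the hidden prefix")
    # Skip the root marker plus the hidden components; keep the rest.
    visible = parts[1 + HIDDEN_COMPONENTS:]
    return "gsiftp://" + server + "/" + "/".join(visible)


if __name__ == "__main__":
    print(gsiftp_url("/pnfs/fnal.gov/usr/nova/scratch/users/mengel/test.txt"))
```

Under these assumptions, @/pnfs/fnal.gov/usr/nova/scratch@ maps to the first gsiftp: URI listed above. A differently mapped proxy may see a different root, so treat the result as a starting point and check it against the server.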

h3. nfs v4.1

On an NFS v4.1 mounted file system you can do anything you normally do, except modify file content.

@mount  -v -t nfs4 -o minorversion=1 localhost:/pnfs /pnfs/fs@

You can then use commands like cp, rm, and so on.

For more information, please look at:
https://srm.fnal.gov/twiki/bin/view/DcacheCorner/DcacheFAQ

h3. Webdav

Web Distributed Authoring and Versioning (WebDAV) is an extension of the Hypertext Transfer Protocol (HTTP) that allows users to create and modify web content. Many operating systems provide built-in client support for WebDAV. To browse the namespace and download data, the user directs a web browser to https://fndca4a.fnal.gov:2880. (This is read only.)

To access the data, the user needs to generate a grid certificate proxy like so:
$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Dmitry Litvintsev 257737
Enter GRID pass phrase for this identity:
Creating proxy .......................................... Done
Your proxy is valid until: Tue Feb 12 04:37:20 2013

Use the following curl commands to put/get data using the WebDAV door:
# example of put
$ curl -L --capath /etc/grid-security/certificates \
--cert /tmp/x509up_u8637 -T /etc/fstab \
https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt

# example of get
$ curl -L --capath /etc/grid-security/certificates \
--cert /tmp/x509up_u8637 \
https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt \
-o curl1.txt
% Total % Received
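The curl invocations above can also be assembled programmatically. The following Python sketch only builds the argument lists and does not contact any server. The helper names are hypothetical, and the proxy location logic assumes the usual convention that the proxy is found through @X509_USER_PROXY@ or, failing that, at @/tmp/x509up_u<uid>@.

```python
import os
import shlex
from typing import List, Optional

CA_PATH = "/etc/grid-security/certificates"  # same CA directory as in the curl examples


def default_proxy_path() -> str:
    """Assumed convention: X509_USER_PROXY if set, else /tmp/x509up_u<uid>."""
    return os.environ.get("X509_USER_PROXY") or f"/tmp/x509up_u{os.getuid()}"


def _base_args(proxy: Optional[str]) -> List[str]:
    # Options shared by the put and get examples.
    return ["curl", "-L", "--capath", CA_PATH, "--cert", proxy or default_proxy_path()]


def curl_put_args(local_file: str, url: str, proxy: Optional[str] = None) -> List[str]:
    """Arguments for uploading local_file to url (curl -T)."""
    return _base_args(proxy) + ["-T", local_file, url]


def curl_get_args(url: str, local_file: str, proxy: Optional[str] = None) -> List[str]:
    """Arguments for downloading url into local_file (curl -o)."""
    return _base_args(proxy) + [url, "-o", local_file]


if __name__ == "__main__":
    url = "https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt"
    print(shlex.join(curl_put_args("/etc/fstab", url, proxy="/tmp/x509up_u8637")))
    print(shlex.join(curl_get_args(url, "curl1.txt", proxy="/tmp/x509up_u8637")))
```

To actually run a transfer, the list can be passed to @subprocess.run(args, check=True)@. Passing a list rather than a shell string avoids quoting problems with unusual file names.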

More information is available at:
http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=5050;filename=webdav.pdf;version=2

h3. DCap

DCap provides POSIX-like open, create, read, write and lseek functions to the dCache storage. In addition, there are some specific functions for setting the debug level, getting error messages, and binding the library to a network interface. The dCap protocol requires specification of the dCache server host, port number, and domain, in addition to the inclusion of "/usr" ahead of the storage group designation in the PNFS path. Its structure is shown here:

@dcap://<serverHost>:<port>/</pnfs>/<storage_group>/usr/<filePath>@

See http://www-dcache.desy.de/manuals/libdcap.html for usage information.

h3. dccp

The dccp command provides cp-like functionality on the PNFS file system and has the following syntax:

% dccp [ options ] source_file [ destination_file ]

The options and command usage are described at http://www-dcache.desy.de/manuals/dccp.html.

h3. srmcp

SRM is middleware for managing storage resources on a grid. The SRM implementation within dCache manages the dCache/Enstore system. It provides functions for file staging and pinning, transfer protocol negotiation and transfer URL resolution.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing SRM copies for Fermilab experiments. SRM is not currently the default protocol for ifdh cp, so you need to select it with the @--force@ option:

@ifdh cp --force=srm /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@
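For scripts that wrap this command, a small helper can keep the default and forced forms consistent. This is a sketch only: @ifdh_cp_args@ is a hypothetical name, it merely builds the argument list shown above, and it does not check that the value given to @--force@ is one that ifdh accepts (only @srm@ is taken from this page).

```python
from typing import List, Optional


def ifdh_cp_args(src: str, dst: str, force: Optional[str] = None) -> List[str]:
    """Build the argument list for 'ifdh cp', optionally forcing a protocol."""
    args = ["ifdh", "cp"]
    if force:
        # e.g. force="srm" gives --force=srm, as in the example above
        args.append(f"--force={force}")
    args += [src, dst]
    return args


if __name__ == "__main__":
    src = "/pnfs/nova/scratch/users/mengel/test.txt"
    print(" ".join(ifdh_cp_args(src, "/tmp/localfile.txt")))
    print(" ".join(ifdh_cp_args(src, "/tmp/localfile.txt", force="srm")))
```

The resulting list can be handed to @subprocess.run(args, check=True)@ on a node where the ifdhc product is set up.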

You can also give a full SRM protocol URI for the remote file specification, which requires the SRM server host, port number, and domain. For the fnal.gov domain, the inclusion of "/usr" ahead of the storage group designation in the PNFS path is also required. Its structure is shown here:

@srm://<serverHost>:<portNumber>/service/path?SFN=/<root of fileSystem>/<storage_group>[/usr]/<filePath>@

The first two examples are for the fnal.gov domain, the third for cern.ch:

   @srm://fndca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/nova/scratch@
   @srm://cdfdca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/cdfen/filesets/<filePath>@
   @srm://wacdr002d.cern.ch:9000/castor/cern.ch/user/<filePath>@
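The fnal.gov form of these URIs can be composed from its parts. The sketch below is modeled on the first two examples only. The function names are hypothetical, and the default port @8443@ and service path @/srm/managerv2@ are simply copied from those examples, so other sites (such as the cern.ch example) will need different values.

```python
def fnal_sfn(storage_group: str, file_path: str) -> str:
    """PNFS path for the fnal.gov domain, with /usr ahead of the storage group."""
    return f"/pnfs/fnal.gov/usr/{storage_group}/{file_path.lstrip('/')}"


def srm_url(host: str, sfn: str, port: int = 8443,
            service_path: str = "/srm/managerv2") -> str:
    """Compose srm://<host>:<port><service_path>?SFN=<sfn>."""
    return f"srm://{host}:{port}{service_path}?SFN={sfn}"


if __name__ == "__main__":
    print(srm_url("fndca1.fnal.gov", fnal_sfn("nova", "scratch")))
```

With these inputs the sketch reproduces the first example URI above.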

For details, please see:
http://www.fnal.gov/docs/products/enstore/enstore_may04/usingdcache.html#8346