FIFE Data Handling » History » Version 10
Kenneth Herner, 11/09/2020 02:33 PM
h1. FIFE Data Handling

h2. Overview

This is a synopsis of the full data handling documents:
* "(2014) FIFE Data Architecture":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5180
* "(2011) Intensity Frontier Computing Model and GRID Tools":http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=4082

h2. Kinds of data access

p. Computing jobs need to access several kinds of data, which are outlined here.

# Executables/libraries -- jobs need access to the code they will execute.
# Conditions data -- calibration information, beam status information, etc. are generally kept in a database, and jobs need a way to access them that will not overload the database.
# Input files -- should be transferred in a manner that doesn't pollute caches, and should be obtained from a SAM-like data handling system that provides data files in an order that can be retrieved efficiently.
# Output files -- should be returned from the job, possibly to a location where they can be automatically registered in the data handling system.
# Logging/monitoring -- information about job status should be communicated back to a central location to assist with monitoring.

h2. Storage resources

This is an executive summary of the data storage resources and their common characteristics.

For illustration, we refer to a project named *<code>hypot</code>*.

| RESOURCE | Net capacity | Net data rate | File size | Access limits | Interfaces | Comments |
| Bluearc app | Few TB | 0.5 GB/sec | any | none for common files | NFS /hypot/app, /grid/fermiapp/hypot | For executables, libraries, small common files |
| Bluearc data | 240 TB per vol | 0.5 GB/sec | 1 MB block | 5 files at once per project | NFS /hypot/data, /grid/data/hypot, FTP | For unmanaged project files and cache |
| dCache | 3 PB | Multi GB/sec | 1 MB block | automatic, hundreds ? | NFS (SLF6+), dccp, webdav, FTP, xroot etc. | For managed files, non-scratch files backed to Enstore |
| Enstore | 10+ PB | Multi GB/sec | 2+ GB | access via dCache | dCache | |

h3. DO

Use ifdh cp or fetch to move data to and from local disk on worker nodes
* Remember that files copied in (as opposed to streamed with xrootd) count against your local disk usage in the batch job.
* See the Auxiliary File task force for advice on highly shared files.
* ifdh also works on OSG.
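
The copy-in, process, copy-out pattern described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the file names and the @someuser@ directory are hypothetical, the use of @_CONDOR_SCRATCH_DIR@ as the local work area is an assumption about the batch environment, and the @cp@ step stands in for the real processing. With @DRY_RUN=1@ (the default here) the commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch of stage-in / process / stage-out for a batch job.
# Hypothetical names: the input/output paths and "someuser" are illustrations only.
# DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_and_process() {
    local input="$1" output_dir="$2"
    # Assumption: the batch system provides a local scratch area in this variable.
    local workdir="${_CONDOR_SCRATCH_DIR:-$PWD}"
    local local_in="$workdir/$(basename "$input")"
    local local_out="$workdir/out_$(basename "$input")"

    # Copy the input to local disk (this counts against local disk usage).
    run ifdh cp "$input" "$local_in" || return 1
    # Placeholder for the real processing step.
    run cp "$local_in" "$local_out" || return 1
    # Copy the result back to storage.
    run ifdh cp "$local_out" "$output_dir/$(basename "$local_out")" || return 1
    # Free local disk space.
    run rm -f "$local_in" "$local_out"
}

stage_and_process /pnfs/hypot/data/scratch/users/someuser/in.dat \
                  /pnfs/hypot/data/scratch/users/someuser/results
```

Setting @DRY_RUN=0@ would run the commands for real, which requires ifdh to be set up in the job environment.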

Use dCache for managed and high-throughput files
* archival - /pnfs/hypot/data
* scratch - /pnfs/hypot/data/scratch/users/...
* directly available to SLF6.4+ clients, with NFS 4.1

Use Bluearc for temporary user analysis files (project disk)
* /hypot/data

h3. DO NOT

Write or read Bluearc disks such as /hypot/data or /hypot/app from grid jobs; they are not accessible.

Try to edit or rewrite dCache files in place; it won't work.

h2. Interfaces

Where possible, use web interfaces, which can take advantage of grid squid caches and similar infrastructure.

| Data Type | Tool |
| Executables | CVMFS |
| Conditions | NuConDB |
| File metadata | samweb |
| Input | ifdh |
| Output | ifdh/FTS |
| Logging | ifdh/numsg |

h2. Access Methods to dCache for Interactive use

There are several access methods for interactive use of dCache files. These include DCap, dccp, SRM, and gridftp. Currently gridftp is the preferred method and the default for our "ifdh cp" utility, which is the recommended tool for experimenters to get files in and out of dCache.

h3. gridftp

Gridftp is the underlying file transfer mechanism used by SRM. Using it directly avoids some of the connection overhead imposed by SRM.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing gridftp copies for Fermilab experiments, and gridftp is currently the default transfer mechanism for copies in and out of dCache.

ifdh cp /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt

One can also give full gsiftp: URIs to specify gridftp servers, for example:

@gsiftp://fndca1.fnal.gov/hypot/scratch@

Note that our current dCache configuration hides the first 4 components of the /pnfs/fnal.gov/usr/<experiment-name>/... path when you use gridftp access (assuming the Grid proxy you are using is mapped in the usual fashion).

h3. nfs v4.1

On an NFS v4.1 mounted filesystem you can do anything you normally do except modify file content.

@mount -v -t nfs4 -o minorversion=1 localhost:/pnfs /pnfs/fs@

You can then use commands like cp, rm, and so on.
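
Because file content cannot be modified in place, "editing" a file amounts to writing a new copy and swapping it in. The sketch below illustrates that idea with plain cp, rm, and mv; the function name @replace_file@ is hypothetical, and the demonstration runs in an ordinary temporary directory rather than on a real /pnfs mount. It assumes that rename is permitted on the mounted filesystem, which the text above suggests but does not state explicitly.

```shell
#!/bin/bash
# Sketch: replace a file whose content cannot be modified in place.
# replace_file is a hypothetical helper, not part of any dCache tooling.
replace_file() {
    local new_local="$1" target="$2"
    local tmp="${target}.new.$$"

    # Write the new content as a brand-new file next to the target.
    cp "$new_local" "$tmp" || return 1
    # Remove the old file; clean up the temporary copy if that fails.
    rm -f "$target" || { rm -f "$tmp"; return 1; }
    # Rename the new file into place (a namespace operation, not a content edit).
    mv "$tmp" "$target"
}

# Demonstration in an ordinary temporary directory.
demo_dir="$(mktemp -d)"
echo "old" > "$demo_dir/file.txt"
echo "new" > "$demo_dir/edited.txt"
replace_file "$demo_dir/edited.txt" "$demo_dir/file.txt"
cat "$demo_dir/file.txt"
rm -rf "$demo_dir"
```

Note that the target is briefly absent between the rm and the mv, so this is not suitable for files that other jobs may be reading at the same time.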

For more information, please look at:
https://srm.fnal.gov/twiki/bin/view/DcacheCorner/DcacheFAQ

h3. Webdav

Web Distributed Authoring and Versioning (WebDAV) is an extension of the Hypertext Transfer Protocol (HTTP) that allows users to create and modify web content. Many operating systems provide built-in client support for WebDAV. To browse the namespace and download data, direct a web browser to https://fndca4a.fnal.gov:2880. (This is read only.)

To access the data, generate a grid certificate proxy like so:
$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Dmitry Litvintsev 257737
Enter GRID pass phrase for this identity:
Creating proxy .......................................... Done
Your proxy is valid until: Tue Feb 12 04:37:20 2013

Use the following curl commands to put/get data through the WebDAV door:
# example of put
$ curl -L --capath /etc/grid-security/certificates \
--cert /tmp/x509up_u8637 -T /etc/fstab \
https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt

# example of get
$ curl -L --capath /etc/grid-security/certificates \
--cert /tmp/x509up_u8637 \
https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt \
-o curl1.txt
% Total % Received
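
The proxy file name in the examples above (/tmp/x509up_u8637) is specific to one user. A minimal sketch for finding your own proxy follows, assuming the usual grid convention that the @X509_USER_PROXY@ environment variable takes precedence and that the default location is /tmp/x509up_u followed by the numeric user id. The helper names @proxy_path@ and @webdav_get_cmd@ are hypothetical, and the second one only prints the curl command line rather than contacting the server.

```shell
#!/bin/bash
# Sketch: locate the grid proxy and assemble a WebDAV "get" command line.
# proxy_path and webdav_get_cmd are hypothetical helper names.

# Print the proxy location: X509_USER_PROXY if set, else /tmp/x509up_u<uid>.
proxy_path() {
    echo "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# Print (do not run) a curl command that downloads $1 to local file $2.
webdav_get_cmd() {
    local url="$1" out="$2"
    echo "curl -L --capath /etc/grid-security/certificates --cert $(proxy_path) $url -o $out"
}

webdav_get_cmd https://fndca4a.fnal.gov:2880/fermigrid/volatile/fermilab/litvinse/curl.txt curl1.txt
```

The printed line can be inspected and then pasted into a shell once a valid proxy exists.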

More information is available at:
http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=5050;filename=webdav.pdf;version=2

h3. DCap

DCap provides POSIX-like open, create, read, write and lseek functions for dCache storage. In addition, there are some specific functions for setting the debug level, getting error messages, and binding the library to a network interface. The DCap protocol requires specification of the dCache server host, port number, and domain, in addition to the inclusion of "/usr" ahead of the storage group designation in the PNFS path. Its structure is shown here:

dcap://<serverHost>:<port>/</pnfs>/<storage_group>/usr/<filePath>

See http://www-dcache.desy.de/manuals/libdcap.html for usage information.

h3. dccp

The dccp command provides cp-like functionality on the PNFS file system and has the following syntax:

% dccp [ options ] source_file [ destination_file ]

The options and command usage are described at http://www-dcache.desy.de/manuals/dccp.html. Note that on systems where PNFS is mounted via NFS 4.1, dccp will not work properly. In that case, just use cp or ifdh cp.
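
One way to act on the note above is to check how /pnfs is mounted before choosing a copy command. The sketch below is an assumption-laden illustration: the helper name @pnfs_copy_tool@ is hypothetical, it assumes a Linux-style mounts table (as in /proc/mounts) where NFS 4.x mounts are reported with filesystem type nfs4, and it only prints the suggested tool.

```shell
#!/bin/bash
# Sketch: suggest "cp" when /pnfs is mounted as nfs4, otherwise "dccp".
# pnfs_copy_tool is a hypothetical helper; the optional argument is a
# mounts table in /proc/mounts format (device mountpoint fstype options ...).
pnfs_copy_tool() {
    local mounts_file="${1:-/proc/mounts}"
    # awk exits 0 only if some mount point under /pnfs has type nfs4.
    if [ -r "$mounts_file" ] && \
       awk '$2 ~ /^\/pnfs/ && $3 == "nfs4" { found = 1 } END { exit !found }' "$mounts_file"; then
        echo "cp"
    else
        echo "dccp"
    fi
}

pnfs_copy_tool
```

Since ifdh cp is usable in either case, this check matters mainly for scripts that call dccp directly.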

h3. srmcp

SRM is middleware for managing storage resources on a grid. The SRM implementation within dCache manages the dCache/Enstore system. It provides functions for file staging and pinning, transfer protocol negotiation, and transfer URL resolution.

The ifdh utility, in the ifdhc ups product, is the recommended tool for doing SRM copies for Fermilab experiments. SRM is not currently the default protocol for ifdh cp, so you need to specify it with a --force option to use it:

@ifdh cp --force=srm /pnfs/nova/scratch/users/mengel/test.txt /tmp/localfile.txt@

You can also give a full SRM protocol URI for the remote file specification, which requires the SRM server host, port number, and domain. For the fnal.gov domain, the inclusion of "/usr" ahead of the storage group designation in the PNFS path is also required. Its structure is shown here:

@srm://<serverHost>:<portNumber>/service/path?SFN=/<root of fileSystem>/<storage_group>[/usr]/<filePath>@

The first two examples are for the fnal.gov domain, the third for cern.ch:

@srm://fndca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/nova/scratch@
@srm://cdfdca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/cdfen/filesets/<filePath>@
@srm://wacdr002d.cern.ch:9000/castor/cern.ch/user/<filePath>@
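
To make the fnal.gov URI structure concrete, here is a small sketch that builds an SRM URI from a short /pnfs path. The helper name @pnfs_to_srm@ is hypothetical. The default server and the /srm/managerv2 endpoint are taken from the first example above, and the mapping of /pnfs/<experiment>/... to /pnfs/fnal.gov/usr/<experiment>/... is inferred from the examples in this page rather than from a formal specification.

```shell
#!/bin/bash
# Sketch: build an fnal.gov-style SRM URI from a /pnfs path.
# pnfs_to_srm is a hypothetical helper; defaults come from the examples above.
pnfs_to_srm() {
    local path="$1"
    local server="${2:-fndca1.fnal.gov:8443}"

    case "$path" in
        /pnfs/fnal.gov/usr/*)
            ;;  # already in the long form, use as-is
        /pnfs/*)
            # insert the domain and "usr" components after /pnfs
            path="/pnfs/fnal.gov/usr/${path#/pnfs/}"
            ;;
        *)
            echo "not a /pnfs path: $path" >&2
            return 1
            ;;
    esac

    echo "srm://${server}/srm/managerv2?SFN=${path}"
}

pnfs_to_srm /pnfs/nova/scratch
```

For the sample call this reproduces the first example URI listed above. A different server can be passed as the second argument.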

For details, please see:
http://www.fnal.gov/docs/products/enstore/enstore_may04/usingdcache.html#8346