SAM web cookbook » History » Version 5

Adam Aurisano, 08/26/2015 11:19 AM

h1. SAM Web Cookbook (Nova Edition)

Right now the sam_web_client is set up by default when you run setup_nova. The experiment name ("nova") is also set for you.

If you don't have a certificate, get one (based on your Kerberos credentials):
<pre>
kx509
</pre>

You could automate this by putting it in your login file (e.g. .bash_profile or .bashrc).
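
One way to do that without breaking your login on machines where kx509 is missing is to wrap the call in a small function. This is only a sketch: the function name ensure_cert is made up for illustration, and it assumes kx509 is run without arguments, as in the example above.

```shell
# Hypothetical helper for ~/.bash_profile or ~/.bashrc (the name is illustrative).
ensure_cert() {
    # Do nothing except warn if kx509 is not available in this shell.
    if ! command -v kx509 >/dev/null 2>&1; then
        echo "kx509 not found; skipping certificate refresh" >&2
        return 1
    fi
    # Same call as in the example above.
    kx509
}
```

Calling ensure_cert at the end of the login file would then refresh the certificate on every login where kx509 is available.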

{{TOC}}

h1. Glossary and Conventions

The following shorthands are used:

Anything in angle brackets denotes a placeholder, e.g. <run number> means you should substitute the run number you are interested in.

Anything with a dollar sign in front of it denotes a shell variable, e.g. $BASE_QUERY.

* BASE_QUERY is the data tier and detector. It is assumed to be set like:

<pre>
export BASE_QUERY="data_tier raw AND online.detector fardet"
</pre>

To get help from samweb, type:
<pre>
samweb --help-commands
</pre>

h1. Beginner Recipes (boiling water)

To save some typing, set:

<pre>
export BASE_QUERY="data_tier raw AND online.detector fardet"
</pre>

Queries are NOT case SeNsItIvE.

To save typing, some parts of earlier queries are denoted as $QUERY_XXX.

Anywhere you see "list-files" you can replace it with "count-files" to return just a count instead of the actual file names.
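
As a small illustration of how the shell variable and the two commands fit together, the sketch below only prints the command lines; it does not contact SAM. The function name show_commands is made up for this example.

```shell
# Base selection, set as described above.
export BASE_QUERY="data_tier raw AND online.detector fardet"

# Hypothetical helper: print the list-files and count-files forms of one query.
show_commands() {
    for cmd in list-files count-files; do
        printf 'samweb %s "%s"\n' "$cmd" "$1"
    done
}

show_commands "$BASE_QUERY"
```

The two printed lines differ only in the command name, which is all that changes when you want a count rather than the file names.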

h2. List Files from a data tier and detector

*samweb list-files "data_tier <tier> AND online.detector <det>"*

Example:
<pre>
samweb list-files "data_tier raw AND online.detector fardet"
</pre>

This returned 459,532 files at the time of writing.

<pre>
samweb list-files "data_tier raw AND online.detector ndos"
</pre>
This returned 91,360 files at the time of writing.

From here on we use $BASE_QUERY for this.

h2. List Files from a Run

*samweb list-files "$BASE_QUERY and online.runnumber <runnumber>"*

Or:

*samweb list-files "$BASE_QUERY and run_number <runnumber>"*

The first form is DAQ-specific; the second is more general.

Example:
<pre>
samweb list-files "$BASE_QUERY and run_number 13114"
</pre>

h2. List Files from a Time Period

Files whose start time falls between two times:

*samweb list-files "$BASE_QUERY and start_time > '<start time>' and start_time < '<end time>'"*

Example:
<pre>
samweb list-files "$BASE_QUERY and start_time > '2014-01-30T23:29:00' and start_time < '2014-01-31T00:30:00'"
</pre>

h2. List Files from a specific trigger stream

To select only a given stream:

*samweb list-files "$BASE_QUERY and run_number <run_no> and data_stream <stream>"*

or, for DAQ files only:

*samweb list-files "$BASE_QUERY and run_number <run_no> and Online.Stream <stream>"*

Stream is a number. Streams are fully configurable, but in early 2014 they generally looked like:

* 0 = NuMI
* 1 = Booster Beam
* 2 = Min Bias

<pre>
samweb list-files "$BASE_QUERY and run_number 13114 and data_stream 0"
</pre>

h2. List Files from DAQ Partition

To select only a specific DAQ partition:

*samweb list-files "$BASE_QUERY and Online.Partition <partno>"*

<pre>
samweb list-files "$BASE_QUERY and Online.Partition 1"
</pre>

h2. List Metadata associated with a file

File names do not have paths, just base names (all file names in SAM are unique).

*samweb get-metadata <filename>*

<pre>
samweb get-metadata fardet_r00013114_s20_t00.raw
</pre>

You get output like:
<pre>
                    File Name: fardet_r00013114_s20_t00.raw
                      File Id: 4877797
                    File Type: importedDetector
                  File Format: raw
                    File Size: 6908296
                          Crc: 74650857 (adler 32 crc type)
               Content Status: good
                        Group: nova
                    Data Tier: raw
                  Application: online datalogger 33
                  Event Count: 110
                  First Event: 171026
                   Last Event: 179507
                   Start Time: 2014-02-14T01:34:14
                     End Time: 2014-02-14T01:37:43
                  Data Stream: 0
             Online.ConfigIDX: 0
          Online.DataLoggerID: 1
     Online.DataLoggerVersion: 33
              Online.Detector: fardet
            Online.DetectorID: 2
             Online.Partition: 1
          Online.RunControlID: 0
     Online.RunControlVersion: 0
            Online.RunEndTime: 1392341863
             Online.RunNumber: 13114
               Online.RunSize: 1727074
          Online.RunStartTime: 1392337488
               Online.RunType: 0
                Online.Stream: 0
         Online.SubRunEndTime: 1392341863
       Online.SubRunStartTime: 1392341654
                Online.Subrun: 20
           Online.TotalEvents: 110
         Online.TriggerCtrlID: 0
        Online.TriggerListIDX: 0
Online.TriggerPrescaleListIDX: 0
        Online.TriggerVersion: 0
 Online.ValidTriggerTypesHigh: 0
Online.ValidTriggerTypesHigh2: 0
  Online.ValidTriggerTypesLow: 0
                         Runs: 13114.0020 (online)
               File Partition: 20
</pre>
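
The Online.*Time values in this output are Unix epoch seconds. In this particular record, Online.SubRunStartTime converts to the same instant as Start Time when read as UTC. The sketch below pulls one field out of saved get-metadata text and converts it. The function name get_field is made up for this example, the sample text is copied from the output above, and the "date -d @..." form assumes GNU date.

```shell
# Hypothetical helper: print the value of one field from `samweb get-metadata` text.
# Reads the metadata on stdin; $1 is the exact field name.
get_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop the right-alignment padding
            prefix = key ": "
            if (index(line, prefix) == 1) {   # the field name must match exactly
                print substr(line, length(prefix) + 1)
                exit
            }
        }'
}

# A few lines copied from the example output above.
sample='                   Start Time: 2014-02-14T01:34:14
       Online.SubRunStartTime: 1392341654
                Online.Subrun: 20'

epoch=$(printf '%s\n' "$sample" | get_field "Online.SubRunStartTime")

# Convert epoch seconds to an ISO-style UTC timestamp (GNU date syntax).
iso=$(date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%S)
echo "$epoch -> $iso"
```

In real use the sample text would be replaced by the output of samweb get-metadata piped into get_field.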

h2. List files with some other parameter or parameters

*samweb list-files "$BASE_QUERY and Parameter.name_1 <value> and Parameter.name_2 <value>"*

<pre>
samweb list-files "$BASE_QUERY and Online.TotalEvents > 123 and Online.DataLoggerVersion = 33"
</pre>

h2. Get File locations

*samweb locate-file <filename>*

<pre>
samweb locate-file ndos_r00015701_s07_cosmic.raw
</pre>
The response is a list of locations:
<pre>
novadata:/nova/data/rawdata/NDOS/000157/15701/cosmic
enstore:/pnfs/nova/rawdata/NDOS/runs/000157/15701(1548@vpe048)
</pre>

* Locations starting with "novadata" are on the BlueArc central disk.
* Locations starting with "enstore" are dCache/Enstore locations (disk cache, tape backed).
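
If you want to act on these prefixes in a script, a case statement is enough. The sketch below is illustrative: the function name storage_kind and the labels it prints are made up, and the two sample locations are the ones from the response above.

```shell
# Hypothetical helper: label one line of `samweb locate-file` output by its prefix.
storage_kind() {
    case "$1" in
        novadata:*) echo "bluearc" ;;
        enstore:*)  echo "dcache/enstore" ;;
        *)          echo "unknown" ;;
    esac
}

# The two locations from the example response above.
locations='novadata:/nova/data/rawdata/NDOS/000157/15701/cosmic
enstore:/pnfs/nova/rawdata/NDOS/runs/000157/15701(1548@vpe048)'

# Print each location with its label in front.
printf '%s\n' "$locations" | while IFS= read -r loc; do
    printf '%-15s %s\n' "$(storage_kind "$loc")" "$loc"
done
```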

h2. Get Ancestors or Descendants of a File

*samweb file-lineage <children/descendants> <filename>*

Children are files derived directly from the input file.
<pre>
samweb file-lineage children fardet_r00013096_s14_t00.raw
fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
</pre>

*samweb file-lineage <parents/ancestors> <filename>*

<pre>
samweb file-lineage parents fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
fardet_r00013096_s14_t00.raw
</pre>

h1. Intermediate Recipes (Poaching eggs)

h2. Get a list of all currently defined fields

Go to:
"Current Nova Experiment Dimensions":http://samweb.fnal.gov:8480/sam/nova/api/files/list/dimensions

h2. Get a list of Non-DAQ data files (e.g. Laser Scans) matching a search

*samweb list-files "data_tier laser_scan AND laser_scan.block_number = 23 AND laser_scan.layer_number > 4"*

h2. Listing Files with children matching a selection

List raw files that have been processed through a different stage:

*samweb list-files "$BASE_QUERY and isparentof: (data_tier <stage> AND Parameter.name_1 <value>)"*

<pre>
samweb list-files "$BASE_QUERY and isparentof: ( data_tier artdaq AND daq2rawdigit.base_release 'S14-01-20' )"
</pre>

h2. Listing Files that match a filename pattern

This matches parts of the file name:

<pre>
samweb list-files "file_name like fardet%DDenergy%"
</pre>

h2. Listing Files with parents matching a selection

With BASE_QUERY2 set to "data_tier artdaq AND online.detector fardet":

<pre>
samweb list-files "$BASE_QUERY2 and ischildof: ( data_tier raw AND Online.Subrun < 20)"
</pre>

h2. Listing Files with no physical locations

*samweb list-files "$BASE_QUERY AND availability: virtual"*

<pre>
samweb list-files "$BASE_QUERY AND availability: virtual"
</pre>

h2. Listing Files with physical locations

*samweb list-files "$BASE_QUERY AND availability: physical"*

<pre>
samweb list-files "$BASE_QUERY AND availability: physical"
</pre>

h2. Retrieving Files with a physical location

You can retrieve files either individually or with a query (multiple files).

h3. Retrieve a single file

*ifdh_fetch <filename>*

<pre>
ifdh_fetch fardet_r00012006_s61_t02.raw
</pre>

Note: you must have a valid certificate (i.e. run kx509).

h3. Retrieve a group of files

*ifdh_fetch `ifdh translateConstraints <dimensions string>`*

<pre>
ifdh_fetch `ifdh translateConstraints "data_tier raw AND online.detector fardet and run_number 12006.51"`
</pre>

Note: here ifdh looks up the files matching the query, and the resulting names are passed to ifdh_fetch.

h2. Verifying that your file was transferred correctly

Check the checksum against the tape copy (no JSON parser installed):

<pre>
# From the database
samweb get-metadata fardet_r00012006_s35_t02.raw | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2
3828307205
# From the file on disk
samweb file-checksum fardet_r00012006_s35_t02.raw | cut -d '"' -f 4
3828307205
</pre>

If you have a JSON parser available, use it to parse the output instead of "cut":

<pre>
samweb get-metadata fardet_r00012006_s35_t02.raw --json | jq '.crc.crc_value'
"3828307205"
</pre>
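
To make the comparison itself scriptable rather than done by eye, the two values can be captured in variables and compared. The sketch below uses stand-in text instead of calling samweb: the function name checksums_match is made up, and the sample "Crc" line follows the format of the metadata output shown earlier on this page.

```shell
# Hypothetical helper: compare two checksum strings and report the result.
checksums_match() {
    if [ "$1" = "$2" ]; then
        echo "OK: checksums match ($1)"
    else
        echo "MISMATCH: database=$1 disk=$2" >&2
        return 1
    fi
}

# Stand-in for the "Crc" line printed by `samweb get-metadata`.
crc_line='                          Crc: 3828307205 (adler 32 crc type)'

# Same cut pipeline as in the recipe above, applied to the stand-in line.
db_crc=$(printf '%s\n' "$crc_line" | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2)

# Stand-in for the value extracted from `samweb file-checksum`.
disk_crc="3828307205"

checksums_match "$db_crc" "$disk_crc"
```

In real use the two stand-ins would be replaced by the two samweb pipelines from the recipe above; the non-zero return value on a mismatch lets a calling script stop early.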

h2. Finding projects run off of a dataset

If you need to determine who ran projects off a dataset, you can use:

<pre>
samweb list-projects --defname=<defname>
</pre>

This lists all projects run off the dataset. Most project names start with the username of the person who created them, so usually no further work is necessary. If the creator of a listed project is not obvious, you can use:

<pre>
samweb project-summary <project name> | less
</pre>

The first few lines of the output will tell you who the project creator was.
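
For the common case where the project name does start with the username, the name can be split in the shell. This is a sketch that relies on that naming convention only, so treat the result as a guess and fall back to project-summary when in doubt. The function name project_user is made up; the project name is the one used in the recovery recipe later on this page.

```shell
# Hypothetical helper: guess the creator from a project name of the form
# <username>-<rest>. This is a naming convention, not a guarantee.
project_user() {
    # Strip everything from the first "-" onwards.
    printf '%s\n' "${1%%-*}"
}

project_user "gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037"
```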

h1. Advanced Recipes (Hollandaise sauce)

h2. Pre-stage a dataset

For large datasets that are definitely on tape (i.e. haven't been used for a long time), you will want to pre-stage the data to the dCache area before submitting jobs. The procedure is described under "Pre-staging Data from Tape" below.

h2. Recovering a project

To get a query for the files that a project did not consume, run:

<pre>
samweb project-recovery -e nova --useFileStatus=0 --useProcessStatus=0 gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037
</pre>
which yields:

(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))

Then create a new definition from that query, making sure you have a valid certificate first:

<pre>
kx509
samweb create-definition <new_definition_name> "(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))"
</pre>

h2. Sampling/Prescaling a dataset

SAM provides a mechanism for deterministically sampling a dataset. To do this:

* Define the dataset
* Define a new dataset with a stride and offset

<pre>
samweb create-definition my_dataset "<selection criteria>"
samweb create-definition my_dataset_oneTenth_version1 "defname: my_dataset with stride 10 offset 0"
samweb create-definition my_dataset_oneTenth_version2 "defname: my_dataset with stride 10 offset 1"
...
</pre>

Each of these will create a dataset that is 1/10 the size of the original. The offset parameter specifies where to start counting from (offset 0 starts from the first element in the list, offset 1 from the second).
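
The stride and offset selection described above can be imitated on a plain list to see which entries each sample would pick. This only illustrates the counting rule as stated here; it is not SAM's implementation and says nothing about how SAM orders the files. The function name sample_lines and the file names are made up.

```shell
# Illustrative only: keep every <stride>-th line, starting at <offset> (0-based).
# Usage: sample_lines <stride> <offset> < list_of_files
sample_lines() {
    awk -v stride="$1" -v offset="$2" '(NR - 1) % stride == offset'
}

# Thirty made-up file names, file_01 ... file_30.
files=$(seq -f 'file_%02g' 1 30)

# stride 10 offset 0 keeps the 1st, 11th and 21st entries.
printf '%s\n' "$files" | sample_lines 10 0

# stride 10 offset 1 keeps the 2nd, 12th and 22nd entries.
printf '%s\n' "$files" | sample_lines 10 1
```

With stride 10, offsets 0 through 9 give ten samples that do not overlap and together cover the whole list.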

h2. Adding additional constraints to a dataset

If a dataset already exists that contains all the files you want, but you want to add a constraint (for instance, you only want a specific run range), you can define a new dataset on top of it:

<pre>
samweb create-definition my_dataset_runrange "defname: my_dataset and run_number >= <startRun> and run_number <= <endRun>"
</pre>

h2. Constructing a Good Runs List

h2. Pre-staging Data from Tape

To pre-stage data from tape:

* Start a screen (or tmux) session
* Do a "run-project" with your dataset name
* Detach from the screen session
* Come back in a few hours or days...

<pre>
screen
samweb run-project --defname=fardet_onehertz_raw_Oct2014-May2015-10percent
</pre>

h1. Combining it all (Eggs Benedict)