SAM web cookbook » History » Version 14

Anna Heggestuen, 02/11/2019 09:14 PM

h1. SAM Web Cookbook (NOvA Edition)

The sam_web_client is currently set up by default when you run setup_nova. The experiment name ("nova") is also set for you.

If you don't have a certificate, get one (based on your Kerberos credentials):
<pre>
kx509
</pre>

You can automate this by putting it in your login file (e.g. .bash_profile or .bashrc).

{{TOC}}

h1. Glossary and Conventions

The following conventions are used:

Anything in angle brackets denotes a placeholder, e.g. <run number> means "insert the run number you are interested in".

Anything with a dollar sign in front of it denotes a shell variable, e.g. $BASE_QUERY.

* BASE_QUERY is the data tier and detector. It is assumed to be set like:

<pre>
export BASE_QUERY="data_tier raw AND online.detector fardet"
</pre>

To get help from samweb, type:
<pre>
samweb --help-commands
</pre>

h1. Beginner Recipes (boiling water)

To save some typing, set BASE_QUERY first:
<pre>
export BASE_QUERY="data_tier raw AND online.detector fardet"
</pre>

Queries are NOT case SeNsItIvE.

Also to save typing, some parts of earlier queries are denoted as $QUERY_XXX.

Anywhere you see "list-files" you can replace it with "count-files" to return just a count instead of the actual file names.

h2. List Files from a data tier and detector

*samweb list-files "data_tier <tier> AND online.detector <det>"*

Example:
<pre>
samweb list-files "data_tier raw AND online.detector fardet"
</pre>

This returned 459,532 files at the time of writing.

<pre>
samweb list-files "data_tier raw AND online.detector ndos"
</pre>
This returned 91,360 files at the time of writing.

From here on we use $BASE_QUERY for this.

h2. List Files from a Run

*samweb list-files "$BASE_QUERY and online.runnumber <runnumber>"*

Or:

*samweb list-files "$BASE_QUERY and run_number <runnumber>"*

The first form is DAQ-specific; the second is more general.

Example:
<pre>
samweb list-files "$BASE_QUERY and run_number 13114"
</pre>

h2. List Files from a Time Period

Files whose start time falls between two times:

*samweb list-files "$BASE_QUERY and start_time > '<start time>' and start_time < '<end time>'"*

Example:
<pre>
samweb list-files "$BASE_QUERY and start_time > '2014-01-30T23:29:00' and start_time < '2014-01-31T00:30:00'"
</pre>
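
The time-window query can be wrapped in a small shell helper. The following is a minimal sketch, not part of samweb: the function name time_window_query is hypothetical, it only builds the dimension string, and the samweb call is skipped when the client is not available in the current shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of samweb): build a dimension string that
# selects files whose start_time lies strictly between two timestamps.
BASE_QUERY="data_tier raw AND online.detector fardet"

time_window_query() {
    # $1 = window start, $2 = window end, e.g. 2014-01-30T23:29:00
    printf "%s and start_time > '%s' and start_time < '%s'" "$BASE_QUERY" "$1" "$2"
}

query=$(time_window_query 2014-01-30T23:29:00 2014-01-31T00:30:00)
echo "$query"

# Only contact SAM if the client is actually set up in this shell.
if command -v samweb >/dev/null 2>&1; then
    samweb count-files "$query"
fi
```

Using count-files first is a cheap way to check the size of the window before listing every file name.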

h2. List Files from a specific trigger stream

To select only a given stream:

*samweb list-files "$BASE_QUERY and run_number <run_no> and data_stream <stream>"*

or

*samweb list-files "$BASE_QUERY and run_number <run_no> and Online.Stream <stream>"*

The second form works for DAQ files only.

The stream is a number. Streams are fully configurable, but in early 2014 they generally looked like:

* 0 = NuMI
* 1 = Booster Beam
* 2 = Min Bias

As of Feb 2019, the streams are:
* There is no stream name for files with a global trigger only -- everything from that run is written into this file.
* 0 = NuMI trigger.
* 1 = Booster trigger.
* 2 = cosmic trigger.
* 4 = calibration mode.

Example:
<pre>
samweb list-files "$BASE_QUERY and run_number 13114 and data_stream 0"
</pre>
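
To compare streams within one run, the same query can be repeated per stream with count-files. This is a sketch under assumptions: stream_query is a hypothetical helper name, the stream numbers 0, 1 and 2 are taken from the lists above, and the loop only prints the commands when samweb is not available.

```shell
#!/bin/sh
BASE_QUERY="data_tier raw AND online.detector fardet"
run=13114   # run number taken from the example above

# Hypothetical helper: dimension string for one run and one stream.
stream_query() {
    # $1 = run number, $2 = stream number
    printf '%s and run_number %s and data_stream %s' "$BASE_QUERY" "$1" "$2"
}

for stream in 0 1 2; do
    query=$(stream_query "$run" "$stream")
    if command -v samweb >/dev/null 2>&1; then
        printf 'stream %s: ' "$stream"
        samweb count-files "$query"
    else
        # No client available: show what would be run instead.
        echo "would run: samweb count-files \"$query\""
    fi
done
```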

h2. List Files from DAQ Partition

To select only a specific DAQ partition:

*samweb list-files "$BASE_QUERY and Online.Partition <partno>"*

<pre>
samweb list-files "$BASE_QUERY and Online.Partition 1"
</pre>

h2. List Metadata associated with a file

File names are given without paths, just base names (all file names in SAM are unique).

*samweb get-metadata <filename>*

<pre>
samweb get-metadata fardet_r00013114_s20_t00.raw
</pre>

You get output like:
<pre>
                    File Name: fardet_r00013114_s20_t00.raw
                      File Id: 4877797
                    File Type: importedDetector
                  File Format: raw
                    File Size: 6908296
                          Crc: 74650857 (adler 32 crc type)
               Content Status: good
                        Group: nova
                    Data Tier: raw
                  Application: online datalogger 33
                  Event Count: 110
                  First Event: 171026
                   Last Event: 179507
                   Start Time: 2014-02-14T01:34:14
                     End Time: 2014-02-14T01:37:43
                  Data Stream: 0
             Online.ConfigIDX: 0
          Online.DataLoggerID: 1
     Online.DataLoggerVersion: 33
              Online.Detector: fardet
            Online.DetectorID: 2
             Online.Partition: 1
          Online.RunControlID: 0
     Online.RunControlVersion: 0
            Online.RunEndTime: 1392341863
             Online.RunNumber: 13114
               Online.RunSize: 1727074
          Online.RunStartTime: 1392337488
               Online.RunType: 0
                Online.Stream: 0
         Online.SubRunEndTime: 1392341863
       Online.SubRunStartTime: 1392341654
                Online.Subrun: 20
           Online.TotalEvents: 110
         Online.TriggerCtrlID: 0
        Online.TriggerListIDX: 0
Online.TriggerPrescaleListIDX: 0
        Online.TriggerVersion: 0
 Online.ValidTriggerTypesHigh: 0
Online.ValidTriggerTypesHigh2: 0
  Online.ValidTriggerTypesLow: 0
                         Runs: 13114.0020 (online)
               File Partition: 20
</pre>
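
When only one field is needed, the text output can be filtered in the shell. The following is a minimal sketch, assuming the "name: value" layout shown above; metadata_field is a hypothetical helper, not a samweb command, and two lines of the sample output stand in for a real call.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from the plain-text
# output of "samweb get-metadata", read on stdin.
metadata_field() {
    # $1 = exact field name, e.g. "Online.Subrun"
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop the alignment padding
            sep = index(line, ": ")           # first "name: value" separator
            if (sep > 0 && substr(line, 1, sep - 1) == key) {
                print substr(line, sep + 2)
                exit
            }
        }'
}

# Two lines copied from the sample output above stand in for a real call.
sample='                    Data Tier: raw
                Online.Subrun: 20'
printf '%s\n' "$sample" | metadata_field "Online.Subrun"
```

With the client set up, the same helper would be fed from samweb get-metadata <filename>. Matching the whole field name keeps Online.ValidTriggerTypesHigh from also matching Online.ValidTriggerTypesHigh2.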

h2. List files with some other parameter or parameters

*samweb list-files "$BASE_QUERY and Parameter.name_1 <value> and Parameter.name_2 <value>"*

<pre>
samweb list-files "$BASE_QUERY and Online.TotalEvents > 123 and Online.DataLoggerVersion = 33"
</pre>

h2. Get File locations

*samweb locate-file <filename>*

<pre>
samweb locate-file ndos_r00015701_s07_cosmic.raw
</pre>
The response will be a list of locations:
<pre>
novadata:/nova/data/rawdata/NDOS/000157/15701/cosmic
enstore:/pnfs/nova/rawdata/NDOS/runs/000157/15701(1548@vpe048)
</pre>

* Locations starting with "novadata" are on the BlueArc central disk.
* Locations starting with "enstore" are dCache/Enstore locations (disk cache, tape backed).
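
A location line can be split into its storage system and directory with plain shell parameter expansion. This is a sketch that assumes the "system:path" layout of the two sample lines above; split_location is a hypothetical helper, and the parenthesised suffix on the enstore line is simply dropped because its meaning is not described here.

```shell
#!/bin/sh
# Hypothetical helper: split one "samweb locate-file" line into the
# storage system and the directory path.
split_location() {
    loc=$1
    system=${loc%%:*}        # text before the first colon
    rest=${loc#*:}           # text after the first colon
    path=${rest%%"("*}       # drop a trailing "(...)" suffix if present
    printf '%s %s\n' "$system" "$path"
}

# The two sample locations from the response above.
split_location "novadata:/nova/data/rawdata/NDOS/000157/15701/cosmic"
split_location "enstore:/pnfs/nova/rawdata/NDOS/runs/000157/15701(1548@vpe048)"
```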

h2. Get Ancestors and Descendants of a File

*samweb file-lineage <children/descendants> <filename>*

Children are files derived directly from the input file.
<pre>
samweb file-lineage children fardet_r00013096_s14_t00.raw
fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
</pre>

*samweb file-lineage <parents/ancestors> <filename>*

Parents are the files from which the input file was directly derived.
<pre>
samweb file-lineage parents fardet_r00013096_s14_t00_numi_S14-01-20_v1_data.daq.root
fardet_r00013096_s14_t00.raw
</pre>

h1. Intermediate Recipes (Poaching eggs)

h2. Get a list of all currently defined fields

Go to:
"Current Nova Experiment Dimensions":http://samweb.fnal.gov:8480/sam/nova/api/files/list/dimensions

h2. Get a list of Non-DAQ data files (e.g. Laser Scans) matching a search

samweb list-files "data_tier laser_scan AND laser_scan.block_number = 23 AND laser_scan.layer_number > 4"

h2. Listing Files with children matching a selection

List raw files that have been processed through another stage:

*samweb list-files "$BASE_QUERY and isparentof: (data_tier <stage> AND Parameter.name_1 <value>)"*

<pre>
samweb list-files "$BASE_QUERY and isparentof: ( data_tier artdaq AND daq2rawdigit.base_release 'S14-01-20' )"
</pre>

h2. Listing Files that match a filename pattern

This matches parts of the file name; "%" acts as a wildcard.

<pre>
samweb list-files "file_name like fardet%DDenergy%"
</pre>

h2. Listing Files with parents matching a selection

With BASE_QUERY2="data_tier artdaq AND online.detector fardet":

<pre>
samweb list-files "$BASE_QUERY2 and ischildof: ( data_tier raw AND Online.Subrun < 20 )"
</pre>

h2. Listing Files with no physical locations

*samweb list-files "$BASE_QUERY AND availability: virtual"*

<pre>
samweb list-files "$BASE_QUERY AND availability: virtual"
</pre>

h2. Listing Files with physical locations

*samweb list-files "$BASE_QUERY AND availability: physical"*

<pre>
samweb list-files "$BASE_QUERY AND availability: physical"
</pre>

h2. Retrieving Files with a physical location

You can retrieve files either individually or with a query pattern (multiple files).

h3. Retrieve a single file

*ifdh_fetch <filename>*

<pre>
ifdh_fetch fardet_r00012006_s61_t02.raw
</pre>

Note: you must have a valid certificate (i.e. run kx509 first).

h3. Retrieve a group of files

*ifdh_fetch `ifdh translateConstraints <dimensions string>`*

<pre>
ifdh_fetch `ifdh translateConstraints "data_tier raw AND online.detector fardet and run_number 12006.51"`
</pre>

Note: here ifdh looks up the files matching the query, and the resulting names are passed to ifdh_fetch.

h2. Verifying that your file was transferred correctly

Check the checksum of the file on disk against the one recorded for the tape copy (this version assumes no JSON parser is installed):

<pre>
# From Database
samweb get-metadata fardet_r00012006_s35_t02.raw | grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2
3828307205
# From file on disk
samweb file-checksum fardet_r00012006_s35_t02.raw | cut -d '"' -f 4
3828307205
</pre>

If you have a JSON parser available, use it to parse the output instead of using "cut":

<pre>
samweb get-metadata fardet_r00012006_s35_t02.raw --json | jq '.crc.crc_value'
"3828307205"
</pre>
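
The two lookups can be combined into one comparison. The following is a sketch only: crc_from_metadata and checksums_match are hypothetical helper names, the pipelines are the ones shown above, and nothing is run unless samweb is available in the current shell.

```shell
#!/bin/sh
# Extract the checksum value from "samweb get-metadata" text on stdin,
# using the same grep/cut pipeline as above.
crc_from_metadata() {
    grep "Crc" | cut -d ':' -f 2 | cut -d ' ' -f 2
}

# Succeed (exit status 0) only when both values are non-empty and equal.
checksums_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

file=fardet_r00012006_s35_t02.raw
if command -v samweb >/dev/null 2>&1; then
    db_crc=$(samweb get-metadata "$file" | crc_from_metadata)
    disk_crc=$(samweb file-checksum "$file" | cut -d '"' -f 4)
    if checksums_match "$db_crc" "$disk_crc"; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
    fi
fi
```

Requiring a non-empty value keeps two failed lookups from being reported as a match.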

h2. Finding projects run off of a dataset

If you need to determine who ran projects off a dataset, you can use:

<pre>
samweb list-projects --defname=<defname>
</pre>

This lists all projects run off a dataset.  Most project names start with the username of the person who created them, so generally, no further work is necessary.  If a project is listed whose creator is not obvious, you can use:

<pre>
samweb project-summary <project name> | less
</pre>

The first few lines of the output will tell you who the project creator was.

h1. Advanced Recipes (Hollandaise sauce)

h2. Recovering a whole project

<pre>
samweb project-recovery -e nova --useFileStatus=0 --useProcessStatus=0 gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037
</pre>
which yields the dimension string:

(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))

Use that string to create a new definition:
<pre>
kx509
samweb create-definition <new_definition_name> "(snapshot_id 15312 minus (project_name gsdavies-RecoFDGENIE_S14-03-25.sh-20140418_1037 and consumed_status consumed))"
</pre>

h2. Re-running only failed ("skipped") files from an existing project

<pre>
samweb create-definition <project>_recovery "project_name <project> and consumed_status skipped"
</pre>

h2. Sampling/Prescaling a dataset

SAM provides a mechanism for deterministically sampling a dataset.  To do this:

* Define the dataset
* Define a new dataset with a stride and offset

<pre>
samweb create-definition my_dataset "<selection criteria>"
samweb create-definition my_dataset_oneTenth_version1 "defname: my_dataset with stride 10 offset 0"
samweb create-definition my_dataset_oneTenth_version2 "defname: my_dataset with stride 10 offset 1"
...
</pre>

Each of these will create a dataset that is 1/10 the size of the original.  The offset parameter specifies where to start counting from (so offset 0 starts from the first element in the list, offset 1 starts from the second).
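
The effect of stride and offset can be pictured on a plain list. The sketch below emulates the behaviour as described above (offset 0 keeps the first element) with awk; it is an illustration only, not SAM's implementation, and sample_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Illustration only: emulate "with stride N offset M" on a plain list,
# following the description above. This is not SAM's implementation.
sample_lines() {
    # $1 = stride, $2 = offset; reads items from stdin, one per line
    awk -v stride="$1" -v offset="$2" '(NR - 1) % stride == offset'
}

# 25 placeholder "files" numbered 0..24
seq 0 24 | sample_lines 10 0   # keeps 0, 10, 20
seq 0 24 | sample_lines 10 1   # keeps 1, 11, 21
```

Different offsets with the same stride pick disjoint subsets, which is why version1 and version2 above do not share files.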

h2. Adding additional constraints to a dataset

If a dataset already exists which contains all the files you want, but you want to add an additional constraint (for instance, you only want a specific run range) you can:

<pre>
samweb create-definition new_dataset "defname: old_dataset and run_number >= startRun and run_number <= endRun"
</pre>
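
The run-range constraint can be assembled by a small helper that also rejects an inverted range. This is a minimal sketch: run_range_dims is a hypothetical helper, the run numbers 13000 and 13114 are placeholders, and the script only prints the create-definition command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: build the dimension string that restricts an existing
# definition to an inclusive run range.
run_range_dims() {
    # $1 = existing definition name, $2 = first run, $3 = last run
    if [ "$2" -gt "$3" ]; then
        echo "first run $2 is greater than last run $3" >&2
        return 1
    fi
    printf 'defname: %s and run_number >= %s and run_number <= %s' "$1" "$2" "$3"
}

# Placeholder run numbers; print the command instead of creating anything.
dims=$(run_range_dims old_dataset 13000 13114) || exit 1
echo "samweb create-definition new_dataset \"$dims\""
```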

h2. Constructing a Good Runs List

h2. Pre-staging Data from Tape

To prestage data from tape:

* Start a screen (or tmux) session
* Run "prestage-dataset" with your dataset name
* Detach from the screen session
* Come back in a few hours or days...
* The command supports running parallel processes with @--parallel@. Recommended best practice is to do 1 dataset at a time with 4 parallel threads.

<pre>
screen
samweb prestage-dataset --defname=fardet_onehertz_raw_Oct2014-May2015-10percent --parallel 4
</pre>

For more information on this process see the "How To Configure Production Jobs":https://cdcvs.fnal.gov/redmine/projects/nova-production/wiki/How_to_Configure_Production_Jobs page on the production wiki.

h1. Combining it all (Eggs Benedict)