DM - Expert Documentation » History » Version 48

Lu Ren, 08/21/2018 11:48 PM

{{>toc}}

h1. DM - Expert Documentation

h2. Documentation!

... is finally initiated by David Caratelli :)
https://www.overleaf.com/3384459hxbyns#/9541217/

For super-duper experts who are interested, the PUBS base framework documentation is on "DocDB 5400":http://microboone-docdb.fnal.gov:8080/cgi-bin/ShowDocument?docid=5400

Keep it updated! Also attach the latest version to this Wiki.

h2. Starting the PUBS daemon

*%{color:blue}The daemons should be restarted every Monday, Wednesday, Friday, and Sunday.%*

Details are on this page. [[Starting the PUBS online daemon]]

h2. Moving all projects to a single online machine

Details are on this page. [[Running all PUBS projects on single server]]

h2. Building up the PUBS online testbed

Details are on this page. [[Building up the PUBS online testbed]]

h2. Mapping project name to names on GUI.

How do I find the project name (database table name) given the name of a specific box on the monitoring GUI? [[Project GUI Map]]

h2. Querying DB for errors. [[DB Query]]

h2. Project Debugging Home-Page (list of projects and debug info for each one). [[Project Debug]]

h2. Changing the Database Configuration for Online PUBS [[Online PUBS Database Reconfig]]

h2. Correcting Errors in PUBS

* Errors in *Metadata Generation* from incomplete files [[Correcting Failed Metadata Generation]]

> This is a problem with incomplete files that have less than one event.

* Failed *Near1 Binary Transfers* [[Correcting Failed Near1 Binary Transfer]]

> If files transferred from EVB to Near1 fail to transfer to the FTS dropbox, errors will appear in the Near1 Binary Transfer box on PUBS.

* Errors in *Registering File Metadata* and crontab entries for Kerberos tickets and grid proxies [[Correcting Failed Metadata Registration]]

> This applies when there are SSL problems registering file metadata into the SAM database, or when crontab entries are missing.

h2. Expired Certificate on Near1

https://cdcvs.fnal.gov/redmine/projects/uboonecode/wiki/CSR

h2. Running out of Disk Space on ubdaq-prod-evb ?

Useful info: there are ~33 TB of disk space in /data/ on the evb machine. PUBS will try to clear data in /data/uboonedaq/TestRuns/ until the disk usage reaches 40%.

If this is the case, there are several things one should do:
0) Identify who is using up the disk space. Options:
--> a) /data/uboonedaq/rawdata/  -> this is where data from "official" runs goes. Files here are seen (and should eventually be removed) by PUBS.
--> b) /data/uboonedaq/TestRuns/ -> this is disk space DAQ people use to test things. It is not seen by PUBS and has to be cleared by hand.
--> c) /data/uboonedaq/lukhanin/ -> test space for Gennadiy. It also has to be cleared manually in order to free up space.
--> d) /data/OTHER/              -> data used by someone else.
If most of the space is not being used by /data/uboonedaq/rawdata/, we need to free space manually. If it is urgent to free up space (i.e. data-taking should not be interrupted and the disk will fill up rather soon), you are authorized to clear /data/uboonedaq/TestRuns/. Contact anyone else who is using a considerable amount of space and ask them to quickly remove the contents of their /data/ folder.
If /data/uboonedaq/rawdata/ is using up a significant amount of space, the problem is probably PUBS' fault.
1) Identify the cause of the problem. Why is disk space not being freed? Possible causes:
--> a) clear_binary_evb is having issues.
--> b) clear_binary_evb does not find any new files to clear. This indicates a possible problem with one of the projects that clear_binary_evb depends on. One possible cause is poor network speed when draining data out of the evb machine.

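To see which of the directories listed under step 0 is taking up the space, something like the following could be used. This is only a sketch: the helper name usage_by_dir is made up for illustration and is not part of PUBS. It simply wraps the standard du and sort commands.

```shell
# Hypothetical helper (not part of PUBS): print the disk usage in KiB of
# each directory given as an argument, largest first.
# Arguments that are not existing directories are skipped.
usage_by_dir() {
    for dir in "$@"; do
        [ -d "$dir" ] && du -sk "$dir" 2>/dev/null
    done | sort -rn
}
```

On the evb machine this could be called as usage_by_dir /data/uboonedaq/rawdata /data/uboonedaq/TestRuns /data/uboonedaq/lukhanin, adding any other directories under /data/ as needed. du walks the whole directory tree, so on a disk of this size it may take a while to finish.
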
Questions? "Ask Kirby":mailto:kirby@fnal.gov

h2. Running out of Disk Space on /datalocal/ @ near1 ?

If the disk usage on /datalocal/ is above 95%, immediately stop the "mv_binary_evb" project. Notify the PUBS team that you have done this, then start addressing the disk-space issue.

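The 95% check can be done by reading the df output by eye, or with a small helper along these lines. This is a sketch under assumptions: the function names disk_usage_percent and usage_above are invented for this example, and the awk parsing assumes the mount point contains no spaces. How to stop the project itself is not covered here.

```shell
# Hypothetical helper: print the usage of the filesystem holding "$1" as a
# bare integer percentage. df -P selects the POSIX output format, in which
# each filesystem is on one line and the fifth column is the capacity.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed (exit status 0) if the usage of the
# filesystem holding "$1" is strictly above the limit "$2" (in percent).
usage_above() {
    [ "$(disk_usage_percent "$1")" -gt "$2" ]
}

# Example check with the 95% limit quoted above:
# usage_above /datalocal 95 && echo "/datalocal is above 95%: stop mv_binary_evb"
```
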
h2. [[What to do if dCache/enstore go down (no access to pnfs area)]]

h2. Running out of Disk Space on sebXX? uB_DataMgmt_PCXX_seb06_data/disk_occ

This is a supernova stream related issue. The supernova PUBS projects are located on ws02. Please restart the daemon on ws02. The error will clear after ~15 min.

Further notes: this particular error can happen when one of the PUBS projects for the SNS (supernova stream) gets stuck. We use a cpulimiter to keep the load down on the ws02 machine. Sometimes the cpulimiter can hang one of the projects and halt the registration of new incoming SNS files. Restarting the daemon will kill and refresh the PUBS projects.

h2. Collaborator has asked me, the DM expert, to prevent the deletion of one or more SN runs.

To prevent the deletion of one or more runs in the SN stream, log in as uboonepro. Head over to the SN PUBS script directory, located at /home/uboonepro/pubs/dstream_online/snova. There you will find "frozen_runs.txt". In this file, insert *newline-separated* run numbers. The monitoring script reads this ASCII text file and prevents the deletion of files belonging to the runs listed in it.

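Editing the file by hand is fine. For several runs at once, a helper like the one below may be convenient. It is a sketch only: freeze_runs is a made-up name, and the only assumption about the file format is the one stated above (one run number per line).

```shell
# Hypothetical helper: append run numbers to a freeze list, one per line.
# Usage: freeze_runs <list-file> <run> [<run> ...]
# Non-numeric arguments and runs that are already listed are skipped.
freeze_runs() {
    freeze_list="$1"; shift
    touch "$freeze_list"
    # If the file is non-empty and does not end in a newline, add one so
    # that the first appended run starts on its own line.
    [ -s "$freeze_list" ] && [ -n "$(tail -c 1 "$freeze_list")" ] && echo >> "$freeze_list"
    for run_number in "$@"; do
        case "$run_number" in
            ''|*[!0-9]*) echo "skipping non-numeric run: $run_number" >&2; continue ;;
        esac
        grep -qx "$run_number" "$freeze_list" || echo "$run_number" >> "$freeze_list"
    done
}
```

Run from the SN PUBS script directory as uboonepro, a call would look like freeze_runs frozen_runs.txt followed by the run numbers to protect.
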
h2. The daemon on ws02 has mysteriously died.

DM experts are currently debugging an issue in which the daemon on ws02 is killed by the kernel. If you are a DM expert on shift and you find that the ws02 daemon has mysteriously died, please execute the following command to copy the log files to a safe location, then restart the daemon.
<pre>
mkdir -p /data/uboonepro/ws02_daemon_failures/`date +%D`; cp /home/uboonepro/pubs/log/ubdaq-prod-ws02.fnal.gov/* /data/uboonepro/ws02_daemon_failures/`date +%D`/
</pre>

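The one-liner above uses date +%D, which expands to MM/DD/YY. Combined with mkdir -p this creates nested month/day/year directories, and a second failure on the same day copies into the same directory. If that ever becomes a problem, a variant with one timestamped directory per failure could look like the sketch below. The name archive_logs and the directory naming are assumptions for illustration, not an agreed convention.

```shell
# Hypothetical helper: copy all files from a log directory into a new
# timestamped subdirectory of an archive root, then print that subdirectory.
# Usage: archive_logs <log-dir> <archive-root>
archive_logs() {
    log_src="$1"
    log_dest="$2/$(date +%Y-%m-%d_%H%M%S)"
    mkdir -p "$log_dest" && cp "$log_src"/* "$log_dest"/ && echo "$log_dest"
}
```

With the paths from the command above, the call would be archive_logs /home/uboonepro/pubs/log/ubdaq-prod-ws02.fnal.gov /data/uboonepro/ws02_daemon_failures.
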