DM - Expert Documentation » History » Version 69

Michael Kirby, 11/20/2019 12:45 PM

{{>toc}}
h1. DM - Expert Documentation
h2. Tasks for DM Expert Shifts
h3. Starting the PUBS daemon running
*%{color:red}The daemons should be restarted every Monday, Wednesday, Friday, and Sunday, and the PUBS logs should be rotated every Monday.%*
Details are on this page. [[Starting the PUBS online daemon]]
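
As a quick reminder of the schedule above, the following is a minimal sketch that reports which routine tasks are due on a given day. The function name pubs_tasks_due is hypothetical and not part of PUBS. It only looks at the ISO weekday (1 = Monday, 7 = Sunday) and does not restart or rotate anything itself.

```shell
# Hypothetical helper (not part of PUBS): print which routine tasks are due.
# Usage: pubs_tasks_due [weekday]   (defaults to today; 1 = Monday ... 7 = Sunday)
pubs_tasks_due() {
    local dow="${1:-$(date +%u)}"
    case "$dow" in
        1)     echo "restart daemons; rotate PUBS logs" ;;  # Monday
        3|5|7) echo "restart daemons" ;;                    # Wednesday, Friday, Sunday
        *)     echo "nothing scheduled" ;;
    esac
}
```

Calling pubs_tasks_due with no argument uses today's date. The actual restart and rotation steps are the ones described on the linked page.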
h3. Check to make sure there are no *hidden* errors in the PUBS GUI.
On Monday at the start of shift, the DM Expert should start the "PUBS GUI":https://cdcvs.fnal.gov/redmine/projects/uboone-operations/wiki/DM_-_Shifters#2-PUBS-project-status-and-restarting-PUBS-GUI-follow-these-instructions-to-make-sure-that-PUBS-projects-are-actively-running-on-ubdaq-prod-evb-and-ubdaq-prod-near1. Once the GUI is up and running, click on "Use Relative Counters" to get the absolute numbers of successful and failed processing attempts for each PUBS project. If there are any residual errors, correct them and make an e-log entry.
h3. Correcting Errors in PUBS
* Querying DB for errors. [[DB Query]]
* Errors in *Metadata Generation* from incomplete files. [[Correcting Failed Metadata Generation]]
* Failed *Near1 Binary Transfers* [[Correcting Failed Near1 Binary Transfer]]
* Errors in *Registering File Metadata* and crontab entries for kerberos tickets and grid proxies [[Correcting Failed Metadata Registration]]
h3. Reconfigure PUBS if there is a dCache or network downtime
[[What to do if dCache/enstore go down (no access to pnfs area)]]
h3. If online servers (EVB, NEAR1, or WS02) report errors about running out of disk space
* [[on ubdaq-prod-evb]]
* [[on near1 (/datalocal/)]]
* [[on sebXX (uB_DataMgmt_PCXX_seb06_data/disk_occ)]]
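
Before following one of the pages above, it can help to confirm how full the disk actually is. The following is a minimal sketch using the standard df utility. The function name check_disk_usage and the 90% default threshold are assumptions for illustration, not PUBS settings. The default path /datalocal is the near1 area named in the list above.

```shell
# Hypothetical helper: report how full a filesystem is and flag it above a limit.
# Usage: check_disk_usage [path] [limit_percent]
# Returns 0 if below the limit, 1 if at or above it, 2 if usage cannot be read.
check_disk_usage() {
    local path="${1:-/datalocal}"
    local limit="${2:-90}"   # assumed threshold, not taken from PUBS
    local used
    # df -P prints one POSIX-format line per filesystem; field 5 is the
    # capacity column, e.g. "42%". Strip the percent sign to compare numbers.
    used=$(df -P "$path" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ -z "$used" ]; then
        echo "cannot read disk usage for $path" >&2
        return 2
    fi
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

For example, check_disk_usage /datalocal 95 warns only once the area is at least 95% full.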
h3. Respond to collaborator requests to prevent the deletion of one or more SN runs.
To prevent the deletion of one or more runs in the SN stream, log in as uboonepro and go to the SN PUBS script directory, /home/uboonepro/pubs/dstream_online/snova. There you will find "frozen_runs.txt". Add the run numbers to this file, *one per line*. The monitoring script reads this ASCII text file and will not delete files belonging to the runs listed in it.
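
Editing the file by hand is enough, but the same step can be sketched as a small shell function that appends run numbers one per line and skips any that are already listed. The function name freeze_runs and the FROZEN_FILE override are hypothetical. Only the default path comes from the description above.

```shell
# Hypothetical helper: add run numbers to the frozen-runs list, one per line.
# FROZEN_FILE can be set to another path for testing; the default is the
# file described above.
freeze_runs() {
    local file="${FROZEN_FILE:-/home/uboonepro/pubs/dstream_online/snova/frozen_runs.txt}"
    local run
    # If the file is non-empty and lacks a trailing newline, add one so the
    # first appended run does not get glued to the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    for run in "$@"; do
        case "$run" in
            ''|*[!0-9]*)
                echo "skipping non-numeric run: $run" >&2
                continue ;;
        esac
        # -x matches whole lines only, -F treats the run number literally.
        grep -qxF "$run" "$file" 2>/dev/null || echo "$run" >> "$file"
    done
}
```

For example, freeze_runs 12345 12346 (run numbers chosen purely for illustration) leaves both runs in the file exactly once, even if the command is repeated.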
h2. Documentation!
... is finally initiated by David Caratelli: [[:File:PUBS.pdf|PUBS.pdf]] https://www.overleaf.com/3384459hxbyns#/9541217/
For super-duper experts who are interested, the PUBS base framework documentation is on "DocDB 5400":http://microboone-docdb.fnal.gov:8080/cgi-bin/ShowDocument?docid=5400
Keep it updated! Also attach the latest version to this wiki.
h2. Moving all projects to a single online machine
Details are on this page. [[Running all PUBS projects on single server]]
h2. Building up the PUBS online testbed
Details are on this page. [[Building up the PUBS online testbed]]
h2. Mapping project names to names on the GUI
How do I find the project name (database table name) given the name of a specific box on the monitoring GUI? [[Project GUI Map]]
h2. Changing the Database Configuration for Online PUBS [[Online PUBS Database Reconfig]]
h2. Expired Certificate on Near1 "Request OSG Production Service Certificate":https://cdcvs.fnal.gov/redmine/projects/uboonecode/wiki/CSR
h2. The daemon on ws02 has mysteriously died.
DM experts are currently debugging an issue in which the daemon on ws02 is killed by the kernel. If you are a DM expert on shift and find that the ws02 daemon has mysteriously died, please execute the following command to copy the log files to a safe location, then restart the daemon.
<pre>
# %D expands to MM/DD/YY, so the logs land in nested month/day/year directories.
dest=/data/uboonepro/ws02_daemon_failures/$(date +%D)
mkdir -p "$dest" && cp /home/uboonepro/pubs/log/ubdaq-prod-ws02.fnal.gov/* "$dest"/
</pre>
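
If the daemon dies more than once on the same day, a second copy into the same dated directory would overwrite the first set of logs. The following is a minimal sketch of a variant that avoids this by adding the time of day to the directory name. The function name archive_daemon_logs and the timestamp format are assumptions. The default paths are the ones used in the command above.

```shell
# Hypothetical helper: copy daemon logs into a timestamped directory and
# print the directory that was created.
# Usage: archive_daemon_logs [log_dir] [archive_root]
archive_daemon_logs() {
    local src="${1:-/home/uboonepro/pubs/log/ubdaq-prod-ws02.fnal.gov}"
    local root="${2:-/data/uboonepro/ws02_daemon_failures}"
    local dest
    # One flat directory per failure, e.g. .../2019-11-20_124500
    dest="$root/$(date +%Y-%m-%d_%H%M%S)"
    # cp -p keeps the original modification times, which helps when
    # matching log entries against the time the daemon died.
    mkdir -p "$dest" && cp -p "$src"/* "$dest"/ && echo "$dest"
}
```

Running archive_daemon_logs with no arguments uses the ws02 paths. Pass two directories to try it out somewhere harmless first.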
h2. [[DM - Work Notes]] - Archived DM Work Notes
Questions? "Ask Kirby":mailto:kirby@fnal.gov