DM - Expert Documentation » History » Version 61

Lu Ren, 08/22/2018 12:23 AM

h1. DM - Expert Documentation

h2. Documentation!

... is finally initiated by David Caratelli :)

For super-duper experts who are interested, the PUBS base framework documentation is on DocDB 5400.

Please keep it updated, and attach the latest version to this wiki.

h2. Starting the PUBS daemon

*%{color:blue}The daemons should be restarted every Monday, Wednesday, Friday, and Sunday.%*

Details are on this page: [[Starting the PUBS online daemon]]

h2. Moving all projects to a single online machine

Details are on this page. [[Running all PUBS projects on single server]]

h2. Building up the PUBS online testbed

Details are on this page. [[Building up the PUBS online testbed]]

h2. Mapping project names to names on the GUI

How do I find the project name (database table name) given the name of a specific box on the monitoring GUI? See [[Project GUI Map]].

h2. Changing the Database Configuration for Online PUBS [[Online PUBS Database Reconfig]]

h2. Correcting Errors in PUBS

* Querying DB for errors. [[DB Query]]

* Errors in *Metadata Generation* From Incomplete Files [[Correcting Failed Metadata Generation]]

* Failed *Near1 Binary Transfers* [[Correcting Failed Near1 Binary Transfer]]

* Errors in *Registering File Metadata* and crontab entries for kerberos tickets and grid proxies [[Correcting Failed Metadata Registration]]

h2. Expired Certificate on Near1: "Request OSG Production Service Certificate"

h2. Running out of Disk Space?

* [[on ubdaq-prod-evb]]
* [[on near1 (/datalocal/)]]
* [[on sebXX (uB_DataMgmt_PCXX_seb06_data/disk_occ)]]

h2. Running out of Disk Space on /datalocal/ @ near1?

If disk usage on /datalocal/ is above 95%, immediately stop the "mv_binary_evb" project. Then notify the PUBS team that you have done so and start addressing the disk-space issue.
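
The 95% check described above can be sketched as a small shell helper. This is only an illustration: the function names @disk_usage_pct@, @usage_exceeds@ and @check_disk@ are hypothetical and not part of PUBS, and the script only prints a reminder. It does not stop the project itself.

```shell
#!/bin/sh
# Sketch only: report whether a mount point is above a usage threshold.
# The 95% default and the mv_binary_evb reminder come from the text above;
# all function names here are hypothetical helpers, not PUBS commands.

# Print the integer use% of the filesystem holding path $1.
# "df -P" gives one POSIX-format line per filesystem; field 5 is the capacity.
disk_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when usage $1 is strictly above threshold $2.
usage_exceeds() {
    [ "$1" -gt "$2" ]
}

# Print an ALERT or OK line for path $1; the threshold $2 defaults to 95.
# Returns 0 when below the threshold, 1 when above, 2 when usage is unknown.
check_disk() {
    path=$1
    threshold=${2:-95}
    pct=$(disk_usage_pct "$path")
    if [ -z "$pct" ]; then
        echo "could not read disk usage for $path" >&2
        return 2
    fi
    if usage_exceeds "$pct" "$threshold"; then
        echo "ALERT: $path at ${pct}% (> ${threshold}%): stop mv_binary_evb and notify the PUBS team"
        return 1
    fi
    echo "OK: $path at ${pct}%"
}
```

After sourcing the file on near1, the check would be run as @check_disk /datalocal/@.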

h2. [[What to do if dCache/enstore go down (no access to pnfs area)]]

h2. Running out of Disk Space on sebXX? (uB_DataMgmt_PCXX_seb06_data/disk_occ)

This is an issue related to the supernova stream (SNS). The supernova PUBS projects are located on ws02. Please restart the daemon on ws02. The error should clear after about 15 minutes.

Further notes: this particular error can happen when one of the PUBS projects for the SNS gets stuck. We are using a cpulimiter to keep the load down on the ws02 machine. Sometimes the cpulimiter can hang one of the projects and cause registration of new incoming SNS files to halt. Restarting the daemon will kill and refresh the PUBS projects.

h2. A collaborator has asked me, the DM expert, to prevent the deletion of one or more SN runs.

To prevent the deletion of one or more runs in the SN stream, log in as uboonepro and go to the SN PUBS script directory, /home/uboonepro/pubs/dstream_online/snova. There you will find "frozen_runs.txt". Add the run numbers to this file, *one per line*. The monitoring script reads this ASCII text file and does not delete files belonging to the runs listed in it.
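
Editing the file by hand works fine. For experts who prefer a command, the following is a minimal sketch of a helper that appends a run number on its own line and skips runs that are already listed. @freeze_run@ is a hypothetical name, not part of PUBS, and the sketch assumes run numbers are plain decimal integers.

```shell
#!/bin/sh
# Sketch only: add a run number to a frozen-runs list, one run per line.
# freeze_run is a hypothetical helper, not part of PUBS.

freeze_run() {
    file=$1
    run=$2
    # Assumption: run numbers are plain decimal integers.
    case "$run" in
        ''|*[!0-9]*) echo "not a run number: $run" >&2; return 1 ;;
    esac
    # Nothing to do if the run is already listed (exact whole-line match).
    if [ -f "$file" ] && grep -qx "$run" "$file"; then
        return 0
    fi
    # If the file does not end with a newline, add one first so the
    # new run number does not get glued to the last existing entry.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$run" >> "$file"
}
```

Example (the run number is made up): @freeze_run /home/uboonepro/pubs/dstream_online/snova/frozen_runs.txt 12345@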

h2. The daemon on ws02 has mysteriously died.

DM experts are currently debugging an issue in which the daemon on ws02 is killed by the kernel. If you are a DM expert on shift and you find that the ws02 daemon has mysteriously died, please execute the following command to copy the log files to a safe location, and then restart the daemon.

mkdir -p /data/uboonepro/ws02_daemon_failures/`date +%D`; cp /home/uboonepro/pubs/log/* /data/uboonepro/ws02_daemon_failures/`date +%D`/
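
Note that @date +%D@ expands to mm/dd/yy, so the command above creates nested month/day/year directories. The same steps can be sketched as a function that computes the date only once, so that the @mkdir@ and the @cp@ cannot end up with different dates around midnight. @archive_logs@ is a hypothetical name. The default paths are the ones from the command above.

```shell
#!/bin/sh
# Sketch only: copy daemon logs into a dated directory, computing the date once.
# archive_logs is a hypothetical helper; the default paths are those quoted above.

archive_logs() {
    src=${1:-/home/uboonepro/pubs/log}
    base=${2:-/data/uboonepro/ws02_daemon_failures}
    # %D expands to mm/dd/yy, so dest is a nested mm/dd/yy directory,
    # the same layout the one-line command produces.
    dest="$base/$(date +%D)"
    mkdir -p "$dest" || return 1
    # Like the original command, this copies plain files only (no -r).
    cp "$src"/* "$dest"/ || return 1
    # Print the destination so the shifter can report where the logs went.
    echo "$dest"
}
```

Called with no arguments, @archive_logs@ copies to the same destination as the command above and prints that directory.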

Questions? "Ask Kirby"