- Table of contents
- DM - Expert Documentation
- Tasks for DM Expert Shifts
- Starting the PUBS daemon running
- Check to make sure there are no hidden errors in the PUBS GUI.
- Correcting Errors in PUBS
- Reconfigure PUBS if there is a dCache or network downtime
- If there are errors for online servers (EVB, NEAR1, or WS02) running out of Disk Space?
- Respond to collaborator requests to prevent the deletion of one or more SN runs.
- Documentation and rarely needed procedures
- Mapping project name to names on GUI.
- Moving all projects to a single online machine
- Changing the Database Configuration for Online PUBS Online PUBS Database Reconfig
- Expired Certificate on Near1
- The daemon on ws02 has mysteriously died.
- Building up the PUBS online testbed
- DM - Work Notes - Archived DM Work Notes
DM - Expert Documentation¶
Tasks for DM Expert Shifts¶
Starting the PUBS daemon running¶
The daemons should be restarted every Monday, Wednesday, Friday, and Sunday, and the PUBS logs should be rotated every Monday.
Details are on this page: Starting the PUBS online daemon
Check to make sure there are no hidden errors in the PUBS GUI.¶
On Monday at the start of shift, the DM Expert should start the PUBS GUI. Once the GUI is up and running, click on "Use Relative Counters" to get the absolute numbers of successful and failed processing for each PUBS project. If there are any residual errors, correct them and make an e-log entry.
Correcting Errors in PUBS¶
- Errors in Metadata Generation From Incomplete Files: Correcting Failed Metadata Generation
- Failed Near1 Binary Transfers: Correcting Failed Near1 Binary Transfer
- Errors in Registering File Metadata, and crontab entries for Kerberos tickets and grid proxies: Correcting Failed Metadata Registration
- Querying the DB for errors: DB Query
Reconfigure PUBS if there is a dCache or network downtime¶
If there are errors for online servers (EVB, NEAR1, or WS02) running out of Disk Space?¶
Respond to collaborator requests to prevent the deletion of one or more SN runs.¶
To prevent the deletion of one or more runs in the SN stream, log in as uboonepro and go to the SN PUBS script directory, /home/uboonepro/pubs/dstream_online/snova. There you will find "frozen_runs.txt". Add the run numbers to this file, one per line. The monitoring script reads this ASCII text file and will not delete files belonging to the runs listed in it.
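As a minimal sketch of the step above, assuming the file holds one run number per line, the helper below appends run numbers to frozen_runs.txt and skips any that are already listed. The function name freeze_runs and the FROZEN_FILE variable are hypothetical, introduced here only for illustration; the directory and file name are the ones given on this page.

```shell
#!/bin/bash
# Hypothetical helper; run as uboonepro on the machine hosting the SN PUBS scripts.
# FROZEN_FILE defaults to the location given above and can be overridden for testing.
FROZEN_FILE="${FROZEN_FILE:-/home/uboonepro/pubs/dstream_online/snova/frozen_runs.txt}"

freeze_runs() {
    local run
    for run in "$@"; do
        # Accept only plain integers so a typo cannot corrupt the list.
        case "$run" in
            ''|*[!0-9]*) echo "skipping invalid run number: $run" >&2; continue ;;
        esac
        # Append the run only if it is not already present as a whole line.
        if ! grep -qx "$run" "$FROZEN_FILE" 2>/dev/null; then
            echo "$run" >> "$FROZEN_FILE"
        fi
    done
}
```

For example, freeze_runs 12345 12346 would add both runs, and repeating the call would leave the file unchanged. Check the result with cat "$FROZEN_FILE" before logging out.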
Documentation and rarely needed procedures¶
For super-duper experts who are interested, the PUBS base framework documentation is on DocDB 5400.
Keep it updated, and also attach the latest version to this Wiki.
Mapping project name to names on GUI.¶
How do I find the project name (database table name) given the name of a specific box on the monitoring GUI? See: Project GUI Map
Moving all projects to a single online machine¶
Details are on this page: Running all PUBS projects on single server
Changing the Database Configuration for Online PUBS Online PUBS Database Reconfig¶
Expired Certificate on Near1¶
If the certificate on Near1 expires, the server won't have a valid proxy, so files won't register to SAM. The symptom is warnings about "expired proxy" in the "PUB_LOGGER_FILE_LOCATION/proc_daemon.log" log. Contact Mine Altunay <firstname.lastname@example.org>, Jeny Teheran <email@example.com>, and Mike Kirby <firstname.lastname@example.org> to get a new certificate issued.
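A quick way to check for this condition is to search the daemon log for the warning. The sketch below assumes that PUB_LOGGER_FILE_LOCATION is set as an environment variable in the current shell and that the warning contains the phrase "expired proxy". The exact message format is not reproduced on this page, so the match is case-insensitive. The function name count_expired_proxy_warnings is hypothetical.

```shell
count_expired_proxy_warnings() {
    # Assumes PUB_LOGGER_FILE_LOCATION points at the PUBS log directory.
    local log="${PUB_LOGGER_FILE_LOCATION}/proc_daemon.log"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits 1 when the count is 0,
    # so "|| true" keeps a clean log from looking like a failure.
    grep -ci "expired proxy" "$log" || true
}
```

A non-zero count suggests the certificate needs renewing. To read the most recent matching lines rather than count them, run grep -i "expired proxy" on the same file and pipe the output through tail.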
The daemon on ws02 has mysteriously died.¶
DM experts are currently debugging an issue in which the daemon on ws02 is killed by the kernel. If you are the DM expert on shift and find that the ws02 daemon has mysteriously died, please execute the following command to copy the log files to a safe location, then restart the daemon. Note that "date +%D" expands to MM/DD/YY, so the logs are stored in nested month/day/year directories.
dest=/data/uboonepro/ws02_daemon_failures/$(date +%D) && mkdir -p "$dest" && cp /home/uboonepro/pubs/log/ubdaq-prod-ws02.fnal.gov/* "$dest"/
Building up the PUBS online testbed¶
Details are on this page: Building up the PUBS online testbed
DM - Work Notes - Archived DM Work Notes¶
Questions? Ask Kirby