Daily checks for offline PUBS projects.

To perform the following actions, log in to uboonegpvmXX as user uboonepro.

  • Make sure the daemon is running (repeat for each uboonegpvmXX node that hosts a daemon).
    • To check if a daemon is running on uboonegpvmXX, log in to that node and initialize pubs. Then type
      daemon.sh status
      
    • To start a daemon on uboonegpvmXX, log in to that node and initialize pubs using the correct initialization script for that node. To start the daemon, type the following commands.
      cd $PUB_TOP_DIR
      daemon.sh start
      
  • Check for held batch jobs. A held job usually means that the job has exceeded its memory limit. Such jobs should be killed manually.
    kill_held_jobs.sh
    
  • Check log files.
    cd $PUB_TOP_DIR/log/uboonegpvmXX.fnal.gov
    
    • Check that the time stamps of the log files for running projects, the daemon log file (proc_daemon.log), and joblist.txt are recent (say, within the last hour). Log files should update even if there is nothing to do. If a log file is not updating, figure out why.
    • Look at recent entries in the project log files and the daemon log file. Be suspicious if you see ERROR messages.
  • Check status of batch jobs.
    jobsub_q --user=uboonepro
    
    • Make sure you see about the expected number of batch jobs running.
  • Check the FTS status web page and make sure things look OK. From time to time, expand the "Failed transfers" section and click the "Retry all" button.
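
The log file checks above can be partly scripted. The following is a minimal sketch, not part of PUBS: the function names stale_logs and count_errors are hypothetical, the 60-minute and 200-line defaults are illustrative, and it assumes a find that supports -maxdepth and -mmin (GNU find does).

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical and are not provided by PUBS.

# Print the files directly inside directory $1 that were last modified
# more than $2 minutes ago (default 60). No output means nothing is stale.
stale_logs() {
    find "$1" -maxdepth 1 -type f -mmin +"${2:-60}" | sort
}

# Print how many of the last $2 lines (default 200) of file $1 contain ERROR.
# "|| true" keeps the exit status at zero when grep finds no matches.
count_errors() {
    tail -n "${2:-200}" "$1" | grep -c 'ERROR' || true
}
```

For example, after initializing pubs, one might run stale_logs "$PUB_TOP_DIR/log/uboonegpvmXX.fnal.gov" 60 and then count_errors on proc_daemon.log in that directory. Any file listed by the first command, or a nonzero count from the second, is a cue to read the log by hand.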