Some details of how DQM is set up

Are your rsync and the cerberus twins failing? 2017.12.30

Sometimes the rsync that user lariatraw runs on lariat-daq00, copying lariat-daq00:/daqdata/dropbox to lariat-daq01:/daqdata/dropbox, fails.
  • Find all the DataSync and rsync processes, then kill each one by PID:
    ps aux | grep -i sync
    
  • Then restart the sync. The command is usually in the shell history:
    $HOME/bin/runDataSync.sh >& /daqdata/log/rsync/DataSync.log &
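
The find-and-kill step above can also be done with pgrep and pkill instead of reading PIDs off the ps listing. This is only a sketch: list_procs and stop_procs are hypothetical helpers, not part of the DQM scripts, and the default pattern DataSync|rsync is an assumption about how the processes show up in the process table (it is case-sensitive, unlike grep -i).

```shell
# Hypothetical helpers (not part of the DQM scripts).
# Both act only on processes owned by the current user.

list_procs() {
    # -f matches the pattern against the full command line;
    # -l prints a name next to each PID.
    pgrep -u "$(id -u)" -l -f "${1:-DataSync|rsync}"
}

stop_procs() {
    # Sends SIGTERM to every match. If something survives,
    # escalate by hand with: pkill -KILL -u "$(id -u)" -f PATTERN
    pkill -u "$(id -u)" -f "${1:-DataSync|rsync}"
}
```

Run list_procs first and read the output before calling stop_procs, since the pattern also matches any unrelated rsync started by the same user.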
    

Next, check cerberus, which might be dead too. It should be running on both daq00 and daq01. Leftover or zombie processes might need killing first; all of them can be found by the string supervisord.

Restore cleanly on both machines with

source $HOME/app/dqm-v2/setup && cd $HOME/app/dqm-v2

then run
./start-daq00 on daq00 and ./start-daq01 on daq01
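
Since the start script differs per machine, a small wrapper can pick it from the hostname. This is a sketch under assumptions: start_script_for is a hypothetical helper, and it assumes `hostname -s` returns lariat-daq00 or lariat-daq01 on the two machines.

```shell
# Hypothetical helper: map a short hostname to the start script named above.
start_script_for() {
    case "$1" in
        lariat-daq00) echo ./start-daq00 ;;
        lariat-daq01) echo ./start-daq01 ;;
        *) echo "no start script known for host '$1'" >&2; return 1 ;;
    esac
}

# Usage on a DAQ machine (left as comments so sourcing this sketch has no side effects):
#   pgrep -l -f supervisord      # anything left over? kill it first
#   source "$HOME/app/dqm-v2/setup" && cd "$HOME/app/dqm-v2"
#   "$(start_script_for "$(hostname -s)")"
```

On any other host the helper prints an error and returns non-zero, so nothing gets started by accident.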

Rsync details

older

User lariatraw on lariat-daq03 runs rsync, started at boot. Check that it is running:

ps -U lariatraw -u lariatraw u

Q: Why does it have to rsync to both daq01 and daq04?
A: Both machines process the files. There is a job manager that assigns each job to a machine, so that all the jobs don't end up on a single machine.
Q: Oh ok, nifty.
A: There are 2 worker nodes on daq01 for jobs and 4 on daq04. daq01 also serves lariat-dqm.fnal.gov.