Manually starting or restarting the Beam Data Process (BDP) server.

Normal starting:

Normally a shifter can start the BDP server by double-clicking the StartBDP icon on minos-acnet.
This needs to be done if minos-beamdata or the ACNET servers have been restarted.
Restarting when BDP is already running should not pose a problem.

Abnormal (re)starting:

Very rarely the BDP server will hang.
This typically happens when there has not been any beam for some time, or when there have been severe network problems.

The solution is to manually kill the BDP server and the rotorooter, then restart them.

However, sometimes the BDP server is fine but the "big green button" incorrectly shows a problem.
So first check whether the BDP server is actually okay, as described below.

Investigating problems:

  1. Log in to the server:
  2. Confirm that the process truly is frozen by running
    • check_bdp
      It will show you:
      • The recently written data files. Check the time stamp of the one
        pointed to by "currentfile". Is it recent?
      • The "ps" listing of the bdp-server and rotorooter. Are both shown
        to be running?
      • The time since last spill (G:EA9SNC) and the listing of the current file.
    • Is the time since the last spill consistent with the current spill rate (it should be of order 2 seconds)?
      If so, is the file growing?
    • If the file is getting larger, there is probably no problem with the BDP server.
      If it is not growing but the G:EA9SNC number is large, there is probably no problem either: there has simply been no beam, so there is no new data to write.
      Otherwise, the BDP server is hung and should be killed and restarted.
  3. Killing and restarting:
    • Shut down rotoroot and bdp-server
      • shutdown_bdp
    • Restart the BDP processes
      • start_bdp, or the StartBDP icon on the minos-acnet desktop
    • That should be it, but you can recheck if desired:
      • check_bdp
  4. Rounding up stray files
    Sometimes a file fails to close and grows beyond the usual 200-some MBytes.
    Kill and restart the server as described above.
    Then create the entries that tell the archiver and the dbu database updater to handle the file:
    • FILE=FILENAME # something like B111215_080001.mbeam.root
    • ln -s /data/bdpdcp/data-files/${FILE} /data/bdpdcp/dbu/data-to-dbu/${FILE}
    • echo /data/bdpdcp/data-files/${FILE} > /data/bdpdcp/archiver/data-to-archive/${FILE}
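The decision in step 2 can be sketched as a small bash function. It is a hypothetical helper, not one of the existing BDP scripts: you pass it the path of the file check_bdp reports as "currentfile" and the G:EA9SNC time since last spill in whole seconds, both read by hand from the check_bdp output. The 60-second "no beam" threshold and the 10-second sampling interval are assumptions chosen to sit well above the roughly 2-second spill period, not documented limits.

```bash
# Hypothetical sketch of the step 2 decision; NOT part of check_bdp.
# Usage: bdp_verdict CURRENT_FILE SECONDS_SINCE_SPILL [SAMPLE_SECONDS]
#   CURRENT_FILE         file shown by check_bdp as "currentfile"
#   SECONDS_SINCE_SPILL  G:EA9SNC value, rounded to whole seconds
#   SAMPLE_SECONDS       wait between the two size checks (assumed default 10)
# Returns 0 if the server looks fine, 1 if it looks hung, 2 if the file is unreadable.
bdp_verdict() {
    local file=$1 since_spill=$2 wait=${3:-10}
    local no_beam_threshold=60   # assumption: far above the ~2 s spill period
    local size1 size2

    # -L follows a symlink, in case "currentfile" is a link to the real data file
    size1=$(stat -L -c %s "$file") || return 2
    sleep "$wait"
    size2=$(stat -L -c %s "$file") || return 2

    if (( size2 > size1 )); then
        echo "OK: file is growing"
    elif (( since_spill > no_beam_threshold )); then
        echo "OK: file not growing, but no beam for ${since_spill} s"
    else
        echo "HUNG: beam is present but file is not growing; run shutdown_bdp and start_bdp"
        return 1
    fi
}
```

For example, `bdp_verdict /path/to/currentfile 3` (path hypothetical) waits ten seconds, prints a one-line verdict, and returns non-zero only when the server looks hung, so it still leaves the kill/restart decision to the shifter.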

Data recovery from minos-beamdata2

For experts only... do not try this at home!

Sometimes we determine that minos-beamdata2 has good data that was missed on minos-beamdata.
This may involve checking the BDP log (OUTFILE=/home/minos/run/bdp/bdp-server.out).

In this case we copy the good file from minos-beamdata2, renaming it if necessary to avoid conflicts.
However, as of 2012, minos-beamdata has severe problems when data is written to its disk faster than 100 KBytes/sec.
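To see what this limit means in practice, the copy time can be estimated with simple arithmetic: at 100 KBytes/sec a typical 200 MByte file takes roughly 35 minutes. The function below is only an illustrative sketch (its name is hypothetical); the 100 KiB/s default mirrors the curl --limit-rate 100K setting used in the recovery steps, where curl counts K as 1024 bytes.

```bash
# Illustrative estimate of a rate-limited copy time; hypothetical helper, not an existing tool.
# Usage: copy_minutes SIZE_BYTES [RATE_KIB_PER_SEC]   (rate defaults to 100, as in --limit-rate 100K)
copy_minutes() {
    local bytes=$1 rate_kib=${2:-100}
    local rate_bytes=$(( rate_kib * 1024 ))
    # round up to whole seconds, then up to whole minutes
    local secs=$(( (bytes + rate_bytes - 1) / rate_bytes ))
    echo $(( (secs + 59) / 60 ))
}

copy_minutes $((200 * 1024 * 1024))   # prints 35
```

For an actual file, something like `copy_minutes "$(stat -c %s "${FILE}")"` gives the expected wait before the file is complete on minos-beamdata.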

So we first copy the file to a web server,
then copy it to minos-beamdata using curl with a limited rate:

  • mindata@minos27
    FILE=B120217_080001.mbeam.root # file name on minos-beamdata2
    FILE2=B120217_080002.mbeam.root # file name on minos-beamdata, perhaps different than FILE
    scp -c blowfish minos@minos-beamdata2:/data/bdpdcp/data-files/${FILE} \
        /nusoft/app/web/htdoc/minos/maint/${FILE2}
    
  • minos@minos-beamdata
    FILE2=B120217_080002.mbeam.root
    cd /data/bdpdcp/data-files
    curl --limit-rate  100K \
      http://nusoft.fnal.gov/minos/maint/${FILE2} -o ${FILE2}
    date
    ln -s /data/bdpdcp/data-files/${FILE2}   /data/bdpdcp/dbu/data-to-dbu/${FILE2}
    echo  /data/bdpdcp/data-files/${FILE2} > /data/bdpdcp/archiver/data-to-archive/${FILE2}
    
    
    • Watch the system with top during the copy:
      • You should see up to 50% CPU usage by bdp-server.
      • You should see low I/O wait, perhaps with peaks to 30%.
  • If the next *:10 crontab run of dbu is 10 minutes away and you are impatient, you may run dbu by hand:
    date
    /bin/bash /home/minos/BD/dbu/BeamDataDbi/scripts/run_bdbu_fnal_cron.sh
    date
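Whether the next *:10 run is close can be worked out with a little clock arithmetic. This sketch assumes *:10 means the dbu crontab entry fires at minute 10 of every hour; the function name is hypothetical, and if the actual crontab schedule differs, the arithmetic would need to change.

```bash
# Minutes until the next hourly *:10 dbu cron run.
# Assumption: the crontab entry fires at minute 10 of every hour. Hypothetical helper.
# Usage: minutes_until_dbu [MINUTE]   (defaults to the current minute)
minutes_until_dbu() {
    local minute=${1:-$(date +%M)}
    minute=$((10#$minute))               # force base 10 so "08" is not read as octal
    echo $(( (10 - minute + 60) % 60 ))  # 0 means the cron run is due now
}

minutes_until_dbu 55   # prints 15
```

Without an argument it uses the current clock minute, so `minutes_until_dbu` just before running the script by hand shows how long the cron job would otherwise take to pick up the file.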
    
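Both step 4 of the investigation procedure and the recovery above finish with the same ln -s and echo commands. A hypothetical wrapper like the following (the function name and the BDP_BASE override are assumptions, not existing tools) checks that the data file exists before queuing it, so a typo in the file name does not leave a dangling link for dbu. It assumes the /data/bdpdcp directory layout used above.

```bash
# Hypothetical helper: queue a BDP data file for the dbu database updater and the archiver.
# Mirrors the ln -s / echo commands above. BDP_BASE defaults to /data/bdpdcp
# (assumption: override it only for testing).
queue_bdp_file() {
    local name=$1
    local base=${BDP_BASE:-/data/bdpdcp}
    local data="$base/data-files/$name"

    if [[ -z $name || ! -f $data ]]; then
        echo "queue_bdp_file: no such data file: $data" >&2
        return 1
    fi
    # dbu entry: a symlink to the data file (ln fails if it is already queued)
    ln -s "$data" "$base/dbu/data-to-dbu/$name" || return 1
    # archiver entry: a file containing the data file's full path
    echo "$data" > "$base/archiver/data-to-archive/$name"
}
```

Usage, as the minos account on minos-beamdata: `queue_bdp_file B120217_080002.mbeam.root`.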

Revised 2012 03 08 kreymer