Previous Page Main Page What to do while on shift DAQ Trouble Shooting Guide

Restart the DCMs

The Data Concentrator Modules (DCMs) are small single-board computers that are physically located on the detector. Each one can be rebooted using either a hard or a soft reset. In both cases, the firmware that is currently loaded into any attached Front End Boards (FEBs) will be cleared, but only in a HARD reset will the firmware that is loaded into the DCM itself be forcibly cleared out.

Most DCM-related DAQ failures result from some combination of firmware or kernel-level driver errors and are most easily cured by resetting the DCM and reloading all the associated kernel modules and firmware. PLEASE REMEMBER: we like to minimize the amount of power-cycling of the DCMs. Please try rebooting the DCMs first before you power-cycle them. See here (halfway down) or below for how to reboot the DCMs instead of doing a power cycle.

ALSO -- it is strongly preferred, whether you are doing just a reboot or a power cycle, to do it ONLY to the DCMs that have given you errors and that you think are out of whack. So, if you know only one DCM is broken, just reboot that one. If that doesn't work, just power-cycle that one.
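As a rough illustration of this "one DCM at a time, reboot before power cycle" policy, the sketch below prints the per-DCM commands for a list of failing DCMs. The function name per_dcm_commands is hypothetical and not part of the DAQ tools; the command names follow the reboot-dcm-<diblock>-<position> and power-off/power-on-<diblock>-<position> pattern used elsewhere on this page. It only prints text and does not touch any hardware.

```shell
# Hypothetical helper (not a DAQ tool): list the per-DCM commands for the
# DCMs that reported errors, in the order this page recommends trying them.
# Arguments are DCM ids in <diblock>-<position> form, e.g. 1-1 2-3.
per_dcm_commands() {
    for dcm in "$@"; do
        echo "first try: reboot-dcm-$dcm"
        echo "only if that fails: power-off-$dcm then power-on-$dcm"
    done
}
```

For example, per_dcm_commands 1-1 prints the reboot command for the DCM at Diblock 1, Position 1, followed by the power-cycle fallback for that same DCM only.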

UPDATE -- right now we are running in a mode where we skip some DCMs in the timing chain (i.e. DCMs 3-2 and 3-3). If you ever need to reboot or power-cycle ALL the DCMs (which we really, really suggest you don't do), there is an extra step required to re-implement this skipping of DCMs. After all the DCMs are restarted, you need to click on the "Bypass DCM Timing" icon on the lower left screen of nova-daq-1.

Then things should work again.

To perform a HARD remote restart of the DCMs:

  1. Using the same terminal that is used for issuing Run Control commands, first perform a clean shutdown of Run Control
    stopRunControl.sh
    stopSystem
    
  2. Next, perform a remote power cycle of the DCMs. power-off-instrumented powers off all the instrumented DCMs (i.e. all except 3-2 and 3-3); the corresponding power-off-1-1 command powers off only the DCM in the Diblock 1, Position 1 location. There are also power-off-1-2, etc. Please choose appropriately.
    power-off-instrumented && echo WAITING 45 seconds && sleep 45 && echo POWERED OFF && echo 
                                      *OR*
    power-off-1-1 && echo WAITING 45 seconds && sleep 45 && echo POWERED OFF && echo
    
  3. Wait approximately 10s for the Low Voltage monitor on the Slow Control computer to show that the voltages have gone to zero and to allow the DCMs and FEBs to fully discharge.
  4. power-off-instrumented does not turn off DCMs that are bypassed for timing. Those have to be powered off using power-off-3-2 and power-off-3-3. Alternatively, you can use power-off-all when power-cycling ALL DCMs.
  5. Bring the power back up on the DCMs and FEBs.
    power-on-instrumented && echo WAITING 2 minutes && sleep 120 && echo POWERED ON && echo 
                                      *OR*
    power-on-1-1 && echo WAITING 2 minutes && sleep 120 && echo POWERED ON && echo
    
  6. Verify that the power is restored. The nominal voltages that are applied to the systems are 24V and 3.5V.
  7. Wait for the DCMs to complete their reinitialization and fully reboot. This process takes approximately 30 s to 1 min, but it is best to wait approximately 2 minutes to ensure all the systems are fully restored.
  8. Restart the DAQ subsystems on the DCMs
    startSystem
    checkSystem
    
  9. If the DAQ subsystems are restarted and check out, restart Run Control
    startRunControl.sh
    
  10. Then follow the second part of the procedure on Start a data taking run
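The hard restart steps above can be summarized as a single ordered command list. The sketch below only prints that list for a chosen target so it can be reviewed before anything is typed; it executes nothing. The function name hard_restart_plan is hypothetical, and the accepted targets are limited to the ones for which this page shows both a power-off and a power-on command (instrumented, or a <diblock>-<position> id such as 1-1). The 45 s and 120 s waits are the ones used in the commands above.

```shell
# Hypothetical helper (not a DAQ tool): print the hard-restart sequence.
# Target is "instrumented" or "<diblock>-<position>", e.g. "1-1".
# Nothing is executed; the output is a checklist for the shifter.
hard_restart_plan() {
    target="$1"
    case "$target" in
        instrumented|[0-9]-[0-9]|[0-9][0-9]-[0-9]|[0-9]-[0-9][0-9]|[0-9][0-9]-[0-9][0-9]) ;;
        *)
            echo "usage: hard_restart_plan instrumented|<diblock>-<position>" >&2
            return 1
            ;;
    esac
    echo "stopRunControl.sh"                 # clean shutdown of Run Control
    echo "stopSystem"
    echo "power-off-$target && sleep 45"     # let DCMs and FEBs discharge
    echo "power-on-$target && sleep 120"     # allow DCMs to finish rebooting
    echo "startSystem"                       # restart the DAQ subsystems
    echo "checkSystem"
    echo "startRunControl.sh"                # only if checkSystem looks good
}
```

Calling hard_restart_plan 1-1 lists the sequence for the DCM at Diblock 1, Position 1. The voltage checks on the Slow Control computer (steps 3 and 6) still have to be done by eye between the power-off and power-on lines.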

If you do NOT need to power-cycle the DCMs and only need to reboot them (this should almost always be tried first), then at the point after stopSystem, instead of issuing the two power commands, do one of the following:

Reboot through the DAQApplicationManager GUI -- see DAQApplicationManager here for instructions (at the bottom of that page).

OR, if you would rather use the command line than the GUI, you can run:

reboot-dcm-all && echo WAITING 2 minutes && sleep 120 && echo REBOOTED && echo  
                                *OR*
reboot-dcm-1-1 && echo WAITING 2 minutes && sleep 120 && echo REBOOTED && echo

Follow this with a

check-dcm-all 
    *OR*
check-dcm-1-1

Even if you only rebooted one DCM, it is perfectly fine to check them all before continuing. You are also free to check DCMs after a power cycle with the exact same command. You can then continue with startSystem. See here for what the output of check-dcm-all should look like if it works correctly.
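Put together, the command-line reboot path might look like the sketch below. It assumes the commands named on this page (stopRunControl.sh, stopSystem, reboot-dcm-*, check-dcm-*, startSystem, checkSystem, startRunControl.sh) are on the PATH of the Run Control terminal; the run wrapper, the soft_reboot function, and the DRY_RUN variable are hypothetical names introduced here for illustration. With DRY_RUN=1 (the default) each command is only printed, so the sequence can be inspected safely.

```shell
# Hypothetical sketch (not a DAQ tool). DRY_RUN=1 prints instead of running.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Soft reboot of one DCM ("1-1") or all DCMs ("all").
# The && chain stops at the first command that fails.
soft_reboot() {
    target="$1"
    run stopRunControl.sh &&
    run stopSystem &&
    run "reboot-dcm-$target" &&
    run sleep 120 &&                 # give the DCMs about 2 minutes
    run "check-dcm-$target" &&
    run startSystem &&
    run checkSystem &&
    run startRunControl.sh
}
```

Running soft_reboot 1-1 in dry-run mode prints the eight steps in order. If check-dcm reports a problem, this is the point at which to consider a power cycle of that one DCM, as described above.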
