Start a data taking run

updated 2012-09-06

Overview

These instructions are for starting a run when Run Control hasn't been started, or is in an uncooperative state from a previous run. For starting a new run after cleanly finishing another, see here.

Under normal circumstances, skip down to the Re/Starting Applications section below.

First make sure that all of the hardware is powered on. This includes the data concentrator modules, timing distribution units and front end boards. Check the Detector Controls monitoring interface for this information.

NOTE: Starts, stops, and restarts of the system can now be handled through the DAQ Application Manager.

UPDATE (makes most sense to people who have shifted previously): We DO NOT like to power-cycle DCMs. We only do it when we have exhausted our other options. The way we like to try to get things to work, in order:
  1. Restart the run within Run Control (RC). If something goes wrong while running, try from RC to "End run", "Break Connections", "Detach from Partitions" (try a reset in the execute line if these steps fail). Then, from the Application Manager, restart the process that is causing trouble and proceed with run taking. This option often fails, so it is not mandatory to try it if you have had bad experiences with it.
  2. Do a stopRunControl.sh, stopSystem, startSystem, startRunControl.sh (these are explained fully below).
  3. If that doesn't work, try rebooting DCMs. You can reboot just the problematic DCMs or all of them at once. A single DCM can be rebooted by right-clicking on it in the Application Manager and selecting the reboot option. If this option is unavailable, a power cycle is most likely needed. Alternatively, all DCMs can be rebooted by doing a stopSystem followed by reboot-dcm-all from the command line. Then do startSystem, etc.
  4. If this doesn't work, see the experts, because it is unexpected behavior. Next steps will likely be power-cycling the DCMs or dealing with the TDU, etc.
Also, please see DAQApplicationManager for tips on using this GUI to tell what is going wrong, and for how to restart a DCM through the GUI instead of the command line.
If problems persist after these reboots and power-cycling steps, DDS may need a restart. This is done by running stopDDSEverywhere.sh from the command line after a stopSystem. startDDSEverywhere.sh will restart this server, and then the normal run start procedure can be followed.
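
The escalation order described above (least disruptive fix first, experts last) can be sketched as a small helper. This is only an illustration: the function name, the step labels, and the placeholder callables are hypothetical, and nothing here actually talks to Run Control or the DCMs.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step is a label plus a callable that returns True if the system recovered.
Step = Tuple[str, Callable[[], bool]]


def first_successful_step(steps: Sequence[Step]) -> Optional[str]:
    """Try each recovery step in order and stop at the first one that works.

    Returns the label of the step that succeeded, or None if every step
    failed (at which point the text says to call the experts).
    """
    for label, attempt in steps:
        if attempt():
            return label
    return None


# Hypothetical ladder; each lambda stands in for a shifter performing the
# step by hand and reporting whether it fixed the problem.
ladder = [
    ("restart the troublesome process from the Application Manager", lambda: False),
    ("stopRunControl.sh, stopSystem, startSystem, startRunControl.sh", lambda: True),
    ("reboot the problem DCMs", lambda: True),
]
print(first_successful_step(ladder))
```

Because the loop returns as soon as one step succeeds, the more disruptive steps further down the list are never attempted unnecessarily.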

Re/Starting Applications

  • Initial State: Various
  • Final State: Applications (except Resource Manager, Run Control) running and ready for initialization.
The path to getting applications ready depends on where you are starting from:

Recovery from a badly-ended run

  • Initial State: Applications or Run Control non-responsive or in an un-recoverable error state.
  • Final State: Applications (except Resource Manager, Run Control, Message Viewer) running and ready for initialization.
On novadaq-ctrl-master, execute the following commands in a terminal as user "novadaq":
  1. If you need a new terminal window:
    1. Open it from the tool bar, then type "ssh master" (without the quotes).
    2. source /home/novadaq/DAQOperationsTools/novadaq_setup.sh (aliased as setup_online; you only need to do this once, right after the terminal is opened)
  2. stopRunControl.sh - if any of Run Control, Resource Manager, or Message Viewer are running.
  3. Stop and restart the DAQ Applications
    • Restart System from the System-wide Control on DAQApplicationManager.
    • OR you might want to do Stop System followed by Start System. This is exactly the same as Restart System, but if the system did not stop cleanly after Stop System, you can deal with it right away instead of having to wait for startSystem before taking further action. If processes hang, abort and retry.

Problem solving with DCM rebooting (if a DCM is reporting errors)

First, stop Run Control with stopRunControl.sh and any DAQ applications with stopSystem.sh.

As a first try, do a DCM reboot. This is NOT the same as power-cycling the DCMs. To do a DCM reboot:

Right-click on the problem DCM in DAQApplicationManager. A window will pop up - choose "Reboot DCM". A series of windows will pop up asking if you want to continue with the reboot process - click yes until you get the window saying the reboot was successful. At this point the DCM should look red. Once it goes back to pink you know it has rebooted. See DAQApplicationManager for more information.

If for some reason you do not want to do this through the GUI, you can also reboot the problem DCMs from the command line:

reboot-dcm-1-1

for dcm 1-1 (aka dcm 6), or substitute your desired DCM. Remember to use the position name, not the hardware name (i.e. 1-1, not 06). Validate the state of the system in the Application Manager. The current theory is that if you have just one problem DCM, you should reboot JUST that one instead of all of them. So, if possible, reboot one at a time; only when everything is badly broken should you reboot them all.
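
The naming rule above (position name such as 1-1, never the hardware name such as 06) can be made explicit with a small helper that builds the command name and rejects anything that does not look like a position. This is a sketch only: the function name is hypothetical, and the assumption that every position name is two numbers joined by a hyphen is inferred from the examples on this page.

```python
import re

# Assumed shape of a position name: digits, a hyphen, digits (e.g. "1-1").
_POSITION = re.compile(r"\d+-\d+")


def dcm_command(action: str, position: str) -> str:
    """Build a command name such as 'reboot-dcm-1-1' or 'check-dcm-all'."""
    if action not in ("reboot", "check"):
        raise ValueError(f"unknown action: {action!r}")
    if position != "all" and not _POSITION.fullmatch(position):
        # A hardware name like "06" ends up here.
        raise ValueError(f"not a position name: {position!r}")
    return f"{action}-dcm-{position}"


print(dcm_command("reboot", "1-1"))
print(dcm_command("check", "all"))
```

The helper only formats a string; running the resulting command is still done by hand in the novadaq terminal.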

If you are going to reboot them all, you do:

reboot-dcm-all

Now, after you have done your rebooting through the command line, you should check that the reboot worked and is finished. (This is required ONLY if you are doing this through the command line, not the GUI. The GUI color changing to pink corresponds to passing this test.) There is some lag time here -- your first attempts to check might fail but keep trying for a few minutes. To do this:

check-dcm-all

Or for the individual one:

check-dcm-1-1 etc. for your dcm name.

Once the check dcm works, you should see output like:

root@dcm-06= Mon Apr  4 14:06:43 CDT 2011
 14:06:43 up 0 min,  0 users,  load average: 0.08, 0.02, 0.01

root@dcm-08= Mon Apr  4 14:06:42 CDT 2011
 14:06:42 up 0 min,  0 users,  load average: 0.16, 0.03, 0.01

root@dcm-09= Mon Apr  4 14:06:42 CDT 2011
 14:06:42 up 0 min,  0 users,  load average: 0.08, 0.02, 0.01

root@dcm-11= Mon Apr  4 14:06:42 CDT 2011
 14:06:42 up 0 min,  0 users,  load average: 0.08, 0.02, 0.01

root@dcm-12= Mon Apr  4 14:06:42 CDT 2011
 14:06:42 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00

root@dcm-13= Mon Apr  4 14:06:42 CDT 2011
 14:06:43 up 0 min,  0 users,  load average: 0.08, 0.02, 0.01

Once the DCMs are in a happy place, continue with starting applications.
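
The "keep trying for a few minutes" advice can be sketched as a polling loop over the check output shown above. Everything here is an assumption made for illustration: the function names, the retry count and delay, and the idea that a DCM counts as back when its "root@dcm-NN=" header is followed by an uptime line. The callable passed in as run_check stands in for however the check command's output is obtained.

```python
import re
import time
from typing import Callable, Set

# A header line like "root@dcm-06= Mon Apr  4 ..." followed by a line
# containing "up" (the uptime report).
_RESPONDING = re.compile(r"^root@(dcm-\d+)=.*\n.*\bup\b", re.MULTILINE)


def responding_dcms(output: str) -> Set[str]:
    """Return the DCM names that reported an uptime line in the output."""
    return set(_RESPONDING.findall(output))


def wait_for_dcms(
    run_check: Callable[[], str],
    expected: Set[str],
    attempts: int = 10,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every expected DCM responds, or give up after `attempts`."""
    for attempt in range(attempts):
        if expected <= responding_dcms(run_check()):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # early checks are expected to fail for a while
    return False
```

Passing the check as a callable keeps the loop independent of how the command is run; if the loop returns False after several minutes, that is the point at which the text suggests escalating.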

Other things to try

The following notes may apply if you have trouble with startSystem below:
  • Repeat the steps above if problems persist.
  • You may want to reboot the DCMs again. Now might also be a good time to call the experts.
  • Otherwise, you could try a DCM power cycle. Details on how to do this are listed here.
  • ALTERNATIVELY: You may try to start an individual application on the problem machine from the Application Manager by right-clicking on the machine and selecting the start process option.
  • NOTE: A full system stop and start is not needed when using this method.

Starting Applications when none are running

  • Initial State: All hardware resources ready, but no applications running.
  • Final State: All applications (except Resource Manager, Run Control and Message Viewer) running, waiting for initialization.
  1. On the novadaq-ctrl-master terminal that is normally open:
    • Start System
      (Or simply click on the "start system" button in the DAQApplicationManager, and make sure the DDS Daemons are started.)

If an individual process fails to start, it can be started manually from the Application Manager by right-clicking on its icon and selecting start process.
The DAQ Application Manager should be happy, with all buttons green (except Run Control, Resource Manager, and Message Viewer).
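
The "all buttons green, with three expected exceptions" check can be written down as a simple filter. This is a sketch under stated assumptions: the function name, the dictionary of button colors, and the color strings are all hypothetical stand-ins for what a shifter reads off the GUI by eye.

```python
from typing import Dict, List

# Applications that are not expected to be running at this stage.
EXPECTED_NOT_GREEN = {"Run Control", "Resource Manager", "Message Viewer"}


def unexpected_non_green(button_colors: Dict[str, str]) -> List[str]:
    """List buttons that should be green at this stage but are not."""
    return sorted(
        name
        for name, color in button_colors.items()
        if color != "green" and name not in EXPECTED_NOT_GREEN
    )


# Hypothetical snapshot of the Application Manager display.
snapshot = {
    "DataLogger": "green",
    "GlobalTrigger": "red",
    "Run Control": "red",
}
print(unexpected_non_green(snapshot))
```

An empty result corresponds to the happy state described above; any name in the list is a candidate for the right-click start process option.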

Starting Run Control

  • Initial State: All applications (except Resource Manager and Run Control) running, waiting for initialization.
  • Final State: Run Control and Resource Manager started, ready for initialization.
  1. Start Run Control with startRunControl.sh
    • Verify that the applications are running successfully in the Application Manager
    • You should see several GUIs (MsgViewer, Resource Manager, Resource Viewer and RCMainWindow).
  2. In the RC main window :
    1. Click on "Change" and put your name in as the person who started the run.
      • It will remember the last person entered as a default so you should only need to do it once at the beginning of your shift.

Define and Reserve resources, Select and Prepare Configuration, and Establish Connections for the Partition.

  • Initial State: Applications, Run Control, Resource Manager all running, but resources not assigned to a partition
  • Final State: Partition defined, applications therein connected to each other (and Run Control). Configuration selected and prepared.
  1. There will likely be a Partition 0 tab in the Resource Manager window. It should not be there.
    • Right click on the tab and select release partition
  2. In Run Control click on "Discover Resources"
  3. click on "Select Resources"; another window should pop open.
    • Click all the + icons, to get a full list of resources
    • Select the Buffer Node group, clicking the "Disabled" box for any individual nodes you need to leave out.
    • Select all Managers except SimulationManager and MessageAnalyzer:
      • ConfigurationManager
      • DataLogger
      • GlobalTrigger
      • TDUManager
    • Select all the timing chains you want. Click the "Disabled" box for any DCMs you need to leave out. At this time each DCM has an individual TDU, so read carefully when making the selection. All DCMs should be selected unless otherwise stated in the run plan.
      NOTE: DCM 4-1 and 4-2 are no longer part of this list as they have no instrumentation at this time.
    • Click on OK when the selections have been made.
    • If you do not see the resources, you might have to release them from "Partition 0" in the Resource Manager; see the DAQ Trouble Shooting Guide for details.
  4. click on "Reserve Resources"
  5. click on "Select Configuration"
    • Click the "+" sign to expand the list, and select the configuration that appears, then click "OK"
      Selecting Configuration in new Run Control
  6. click on "Prepare Configuration"
  7. click on "Load Connections"
  8. click on "Make Connections"
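
The selection rule in step 3 (take every manager except the two named exclusions, and every DCM except those the run plan says to leave out) amounts to a filter over the discovered resources. The sketch below is illustrative only: the function name and the example lists are hypothetical, and the real selection is made by ticking boxes in the Select Resources window.

```python
from typing import Iterable, List, Set

# Managers the text says not to select (name of the first one assumed to
# follow the "...Manager" pattern of the others).
EXCLUDED_MANAGERS: Set[str] = {"SimulationManager", "MessageAnalyzer"}


def select_resources(discovered: Iterable[str], leave_out: Set[str]) -> List[str]:
    """Keep every discovered resource that is not in `leave_out`, in order."""
    return [name for name in discovered if name not in leave_out]


# Hypothetical list of what "Discover Resources" might show.
managers = [
    "ConfigurationManager",
    "DataLogger",
    "GlobalTrigger",
    "MessageAnalyzer",
    "SimulationManager",
    "TDUManager",
]
print(select_resources(managers, EXCLUDED_MANAGERS))
```

The same helper applies to DCMs, with `leave_out` set to whatever the run plan lists for the shift (empty if all DCMs are to be included).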

Configure

  • Initial State: Applications connected within a partition, configurations prepared.
  • Final State: Hardware and applications configured for data-taking
  1. click on "Load Hardware Config."
  2. click on "Configure Hardware"
    • this will take a couple of minutes to step through the DCM's
    • when done, the Load Run Config button will turn green
  3. click on "Load Run Config."
  4. click on "Configure Run"
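
Taken together with the partition steps before it, the Run Control buttons form one fixed sequence up to this point. The lookup below is a sketch for keeping track of where you are; the function name is hypothetical and the button labels are copied from the steps on this page.

```python
from typing import List, Optional

# Run Control buttons in the order this page says to click them.
RUN_CONTROL_SEQUENCE: List[str] = [
    "Discover Resources",
    "Select Resources",
    "Reserve Resources",
    "Select Configuration",
    "Prepare Configuration",
    "Load Connections",
    "Make Connections",
    "Load Hardware Config.",
    "Configure Hardware",
    "Load Run Config.",
    "Configure Run",
]


def next_button(last_clicked: Optional[str]) -> Optional[str]:
    """Return the button that follows `last_clicked`, or None at the end.

    Pass None when nothing has been clicked yet. An unknown label raises
    ValueError (from list.index).
    """
    if last_clicked is None:
        return RUN_CONTROL_SEQUENCE[0]
    position = RUN_CONTROL_SEQUENCE.index(last_clicked)
    if position + 1 < len(RUN_CONTROL_SEQUENCE):
        return RUN_CONTROL_SEQUENCE[position + 1]
    return None


print(next_button("Make Connections"))
```

After "Configure Run" the function returns None, which matches the hand-off to cooling the APDs and then Begin Run in the sections that follow.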

Cool the APDs

  • Initial State: Hardware and Applications Configured for data taking, but APDs warm.
  • Final State: APDs cooled and ready for Begin Run.
  1. Start Cooling from the CSS GUI DCS Home tab.
  2. Click on the "NDOS Overview" tab. The CSS GUI will hang while it's off talking to the DCMs.
  3. When the GUI view finally switches to the Overview, wait until there are no more FEBs colored in red.

Start a Run.

  • Initial State: Hardware and Applications Configured for Data taking, APDs cooled and happy, Run not in progress.
  • Final State: Run started and documented, and initial quality checked.
  1. Click the "Begin Run" button when you are ready to start the run!
    • Fill out a "Start Run" form in the ECL log.
    • Before complaining about the number of steps above... this will be greatly reduced in the coming week or two (2011-12-23)
  2. When the run has started, a "sync to current time" command WILL BE AUTOMATICALLY EXECUTED. If it is not, a manual sync can be issued to the Near Detector Master from the TDUControl window.
  3. Now you should be taking data: How to know whether a run is producing (usable) data.
