RC - Old Guide

Running the Console DAQ

(-1) If you get a "Permission denied" error on the green console screen, try terminating the run, renewing your Kerberos ticket, and restarting the run.

Here are the new instructions for shifters to start a run:

(0) Check the Slowmon to make sure there are no low voltage trips (to do that go here)

(1) If it is not open, connect to the vnc session for the DAQ by opening a terminal and doing:

ssh -L 5908:localhost:5905 uboonedaq@ubdaq-prod-ws01.fnal.gov

(You may need to renew your Kerberos ticket before doing this.)

(2) Now, start the vnc session:

vncviewer localhost:5908

The password is "uboone2pass". This should open a desktop on ubdaq-prod-ws01 as uboonedaq.
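
As a rough illustration of how the two port numbers in steps (1) and (2) fit together, here is a minimal Python sketch. It assumes the simple three-field `local_port:host:remote_port` form of the `ssh -L` argument and the usual VNC convention that display :N listens on TCP port 5900 + N. The function names are made up for this example.

```python
VNC_BASE_PORT = 5900  # VNC display :N conventionally listens on TCP port 5900 + N


def parse_local_forward(spec):
    """Split an `ssh -L` spec of the form local_port:host:remote_port."""
    local_port, host, remote_port = spec.split(":")
    return int(local_port), host, int(remote_port)


def vnc_display(port):
    """Return the VNC display number that conventionally maps to a TCP port."""
    return port - VNC_BASE_PORT


local_port, host, remote_port = parse_local_forward("5908:localhost:5905")
print(f"vncviewer connects to localhost:{local_port} on your machine")
print(f"ssh forwards that to {host}:{remote_port} as seen from ubdaq-prod-ws01 "
      f"(VNC display :{vnc_display(remote_port)})")
```

So the viewer talks to port 5908 on your own machine, and the tunnel delivers the traffic to port 5905 on the workstation, which is where the DAQ VNC session would be listening under that convention.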

(3) Open a terminal, and log in to ubdaq-prod-evb.fnal.gov (still as uboonedaq):

ssh evb

You should see a bunch of setup things reported. Note the DAQ version number, which is shown in blue text. If a DAQ expert wants you to change versions, all you should need to do is log out of evb and log back in to get the desired one.

(3b) Run "runConsoleDAQ.py" in the terminal:

runConsoleDAQ.py

Note, if you are doing a special configuration, you will need to run this in "expert" mode. How? With the expert option of course!
runConsoleDAQ.py expert

And, if it's a PMT-only run, it only makes sense to do...
runConsoleDAQ.py expert pmtonly

(4) You will be prompted to enter how long you would like this run to be. You cannot enter a value greater than 420 minutes. You can exit the console DAQ script here by typing 'exit'.
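
The rule in step (4) can be pictured with a small Python sketch. This is not the code inside runConsoleDAQ.py; it is only an illustration, and it assumes whole minutes, a positive value, and a case-insensitive 'exit', none of which the guide states. The function name is hypothetical.

```python
MAX_RUN_MINUTES = 420  # upper limit quoted in step (4)


def parse_run_length(answer):
    """Return the run length in minutes, or None if the shifter typed 'exit'.

    Assumptions (not from the guide): whole minutes only, must be positive.
    """
    text = answer.strip()
    if text.lower() == "exit":
        return None
    minutes = int(text)  # raises ValueError on non-numeric input
    if not 0 < minutes <= MAX_RUN_MINUTES:
        raise ValueError(
            f"run length must be between 1 and {MAX_RUN_MINUTES} minutes"
        )
    return minutes


print(parse_run_length("60"))
print(parse_run_length("exit"))
```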

(5) You will be prompted to pick a configuration ID. Consult the white board, elog, or Run Coordinator for the config ID we want to run. There are semi-descriptive names that go along with the ID, but you will need to enter the ID number on the command line. Don't worry: you have infinite time to do so.

(6) You should see a black window pop up to do "hardware" configuration: ASICs, pulsers, etc. This may be very fast (if there's not much to be done), or it may be painfully slow (if we are configuring the ASICs). You should see it progress slowly but surely through all of the feedthroughs. BUT, if it's taking more than 20-30 minutes, there is likely a problem. Check to see if the ASIC LV has power for all feedthroughs, and note in the elog whether it does or doesn't. Then call the DAQ or Cold Electronics expert again and let them know, and they will guide you through the process.

(7) You should see windows pop up. The big lighter blue one is the assembler process. The smaller, darker blue ones are the sebs. The two black ones are for the dispatcher and OM. You should also see a start run message in your original terminal. A copy of this message will be auto-entered into the elog, so YOU DON'T HAVE TO DO IT YOURSELF ANYMORE. Also, you should see the buffalo!

We recommend watching the terminal running the runConsoleDAQ.py script. That should tell you what's going on!

(8) You should see the configuration of the FEMs start now, with messages posting on their progress. If things just stay still for more than 30 seconds, something is wrong. Stop the run ([ENTER] in original terminal, below the bison) and try again.

(9) The control will report when the run has really started, and you should see messages flying by on the SEBs and Assembler terminals. Watch the control terminal for any errors. Sometimes an error will cause the run to end early; when it does, it should post an error message with the end of run message in the elog. Still, you can see if there are any clues in the log about what happened, and post them in the comments of that auto-generated elog entry.

Every 30 seconds, the control will check to make sure the processes are all running, and report on the number of subrun files written to disk. The control process should be checking to make sure that number is updating, but if it isn't, you can catch it sooner! Note we typically write 10 events per subrun right now, so at a 100 mHz rate expect 1 file written about every minute and a half. If it's taking longer than that to see this increase, we likely have a problem in one of the seb processes. You can stop the run (hit [ENTER] on the bison terminal), which should bring you back to step (4). Please note in the elog that you stopped the run, and say why!
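
The file cadence quoted above follows from simple arithmetic, sketched here in Python. The 10 events per subrun, the 100 mHz (0.1 Hz) rate, and the 30 second check interval are taken from the text; the function name is just for illustration, and a perfectly steady trigger rate is assumed.

```python
def seconds_per_subrun(events_per_subrun, trigger_rate_hz):
    """Expected time to fill one subrun file at a steady trigger rate."""
    return events_per_subrun / trigger_rate_hz


CHECK_INTERVAL_S = 30  # the control reports every 30 seconds

expected = seconds_per_subrun(events_per_subrun=10, trigger_rate_hz=0.1)
checks = expected / CHECK_INTERVAL_S

print(f"expect a new subrun file roughly every {expected:.0f} s")
print(f"that is about one new file per {checks:.1f} status checks")
```

In other words, the file count should tick up about every third or fourth status report. If several more reports than that go by with no change, that is the situation described above.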

(10) When the run is complete (i.e. has hit its maximum time, or stops itself, or you stop it), all of the xterm windows will be killed, and you will start at step (4) again, where it asks you how long to run for. In most cases where you do not stop the run, it will automatically start a new run with the same time and config as the previous run. If you don't want to do that, you should stop the run yourself, and it will allow you to pick a new config. When you run in expert mode, it will always prompt you for the next run.

(11) REMEMBER TO POST IN THE ELOG! Make sure the start DAQ message and end DAQ message get posted there automatically, and post the number of noisy/quiet channels (instructions to follow), Slow Control Alarms and any other whimsy that might strike you while on shift. Please read this in order to learn what alarms to look for in Slow Control and how.

Getting the list of noisy channels.

You can extract a list of dead or noisy channels from data that you take. We'll make this into pretty plots in the near future, but right now the list of channels is easy to make. You should do the following on ubdaq-prod-evb as uboonedaq.

In most cases, you can use a nice little script:

./lazy_noise

This will find the most recent run, analyze the data, and tell you how many bad channels you have. Make sure the run number matches!

You can run this command as soon as the run has moved to subrun 3. (It doesn't hurt to wait though.)

Post the number this returns as a comment to the DAQ start message. If the number is significantly larger than in previous runs, contact an expert. Check the ASIC LV in slowmon, as this often means the power to the ASICs tripped and they are not reading out properly.

But, if you want to do it the non-lazy way....

  • First, include all the noise data files you'd like to analyze into a text file with one file (full path) per line. If you were wanting to analyze run 543, you could do this to make that text file:
    ls -r /data/uboonedaq/rawdata/NoiseRun*-0000543-*.ubdaq > ~uboonedaq/file_list_run543.txt
    

Now we want to pare back the number of files we look at, so edit the list:

emacs -nw ~uboonedaq/file_list_run543.txt

or
vi ~uboonedaq/file_list_run543.txt

(if you love 'vi' and hate life)

and remove all files except for *-00001.ubdaq and *-00000.ubdaq.

  • And so now, you can run and make the noise list by doing the following
    noise_check ~uboonedaq/file_list_run543.txt noise_check_run543
    

    There is an optional final argument, the maximum number of events to process, which defaults to 10.
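
If you would rather not trim the file list by hand in an editor, the same selection can be sketched in Python. This is only an illustration: the function name is made up, and the example file names are hypothetical ones that merely follow the NoiseRun*-0000543-*.ubdaq pattern used above.

```python
def keep_first_subruns(paths, subruns=("00000", "00001")):
    """Keep only files whose name ends in -<subrun>.ubdaq for the given subruns."""
    suffixes = tuple(f"-{s}.ubdaq" for s in subruns)
    return [p for p in paths if p.endswith(suffixes)]


# Hypothetical file names, following the NoiseRun*-0000543-*.ubdaq pattern.
example = [
    "/data/uboonedaq/rawdata/NoiseRun-0000543-00002.ubdaq",
    "/data/uboonedaq/rawdata/NoiseRun-0000543-00001.ubdaq",
    "/data/uboonedaq/rawdata/NoiseRun-0000543-00000.ubdaq",
]

for path in keep_first_subruns(example):
    print(path)
```

The surviving paths, one per line, are what the file list handed to noise_check should contain.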

To determine the number of noisy/quiet channels do:

wc -l noise_check_run543_detail.txt

Post the number this returns as a comment to the DAQ start message. If the number is significantly larger than in previous runs, contact an expert. Check the ASIC LV in slowmon, as this often means the power to the ASICs tripped and they are not reading out properly.
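
For illustration, the count-and-compare step could look like the Python sketch below. It counts lines the way `wc -l` does for a newline-terminated file. The guide does not define "significantly larger", so the factor of 2 here is purely a placeholder assumption, and both function names are hypothetical.

```python
def count_lines(lines):
    """Count lines from an open file or any iterable of lines."""
    return sum(1 for _ in lines)


def looks_anomalous(current, previous, factor=2.0):
    """Flag a count that exceeds `factor` times the previous run's count.

    The factor is a placeholder assumption, not an official threshold.
    """
    return current > factor * previous


# Stand-in for the contents of a detail file; real files will differ.
sample = ["channel A\n", "channel B\n", "channel C\n"]
n_bad = count_lines(sample)
print(n_bad, looks_anomalous(n_bad, previous=1))
```

With a real file you would pass the open file object, for example `count_lines(open("noise_check_run543_detail.txt"))`, and use the number from the previous run's elog entry for comparison.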

###Old Instructions###

We are finishing our work on the run control GUI, but in the meantime we have an automatic script that launches the DAQ processes and steps through the configuration and starting of a run.

To run it, do the following:
  • Log into ubdaq-prod-ws01 as uboonedaq:
    ssh uboonedaq@ubdaq-prod-ws01
    
  • From there, log into the evb machine:
    ssh evb
    
  • Start the console DAQ by doing:
    runConsoleDAQ.sh
    

    You should see blue windows pop up. The big lighter blue one is the assembler process. The smaller, darker blue ones are the sebs.
  • When prompted (on your original terminal, right below the bison), hit "y" or "n" to decide whether to configure the ASICs.
  • After ASIC config (if you do it), three more windows will pop up: two (black and orange) are for the online monitoring executables; the olive green with wheat text is the "Control" console. You should look there and see how the run progresses.
  • If things just stay still for more than 30 seconds, something is wrong. Kill ([ENTER] in original terminal, below the bison) and try again.
  • You should see data start, and the control console tell you "Running!" within a few minutes. If not, kill and try again.
  • The run should automatically die after 30 minutes. There's a countdown in the control console! If it doesn't, you should feel free to kill it.
  • REMEMBER TO POST IN THE ELOG!!!!!!!!!!!!!!! Or Wes will yell at YOU!!!!!!
  • To check that data is actually being written out
    ls -lrth /data/uboonedaq/rawdata
    

    Check that the run that you just started is at the end of the file list. WITH GREAT POWER COMES GREAT RESPONSIBILITY!!!!!!!!!!!! Do not touch these files!
Here are Matt Toups's edits for the shifters to find when they follow the link here:
1. There will be a bunch of errors on the seb terminals that you should just ignore. They are:
a. [ Bad XMIT Status ]
b. [ Bad FEM Status ] repeated up to 15 times per event
c. [CRITICAL] SuperNova data drain incomplete (block ID unchanged in FEM buffer)
d. [CRITICAL] Data drain failure (header and ADC data not drained simultaneously)

If, for example, you see fewer than 9 windows on the right-hand side of the screen, this means that one or more of the sebs died. You can check if the node is still up by pinging that seb. If any of the nodes has died, you should terminate the run by hitting enter in the terminal window.

Also, do not be alarmed if the file sizes are drastically different. For example, I just took run 215 and the first subrun was 1.4 GB large. The second file was 410 MB big. The third file was 902 MB big. The final file was 305 MB big though I hit enter to terminate the run.

Getting the list of noisy channels.

You can extract a list of dead or noisy channels from data that you take. We'll make this into pretty plots in the near future, but right now the list of channels is easy to make. You should do the following on ubdaq-prod-evb as uboonedaq.
  • First, include all the noise data files you'd like to analyze into a text file with one file (full path) per line. If you were wanting to analyze run 543, you could do this to make that text file:
    ls -r /data/uboonedaq/rawdata/NoiseRun*-0000543-*.ubdaq > ~uboonedaq/file_list_run543.txt
    
  • And so now, you can run and make the noise list by doing the following
    noise_check ~uboonedaq/file_list_run543.txt noise_check_run543_ 
    

    There is an optional final argument, the maximum number of events to process, which defaults to 10.

To determine the number of noisy/quiet channels do:

wc -l noise_check_run543_detail.txt

Then copy the resulting number as a comment to "DAQ RUN START" ELOG entry.

The most interesting output file will be written to that directory as noise_check_run543_detail.txt. It lists the channel (crate,card,ch), its average pedestal, and its average noise value. Only channels that are very quiet or very noisy are listed. You can post this list to the elog, and we can use it to compare data during the cool down.

The general form of the noise_check command is:

noise_check /path/to/file/list.txt noise_check_output_header_ max_events_to_process

What's below this point is for running the run control GUI. We're almost ready to use it! But not quite. So you can ignore it for now.

For example, here's what I would do typically:

Instructions for running out of your own area (including uboonedaq user's pulled down code).

  1. Make sure you have run control in your "development" directory, and the readout code in your "development_daq" directory. Instructions for getting the code:
  2. Log in to ubdaq-prod-evb.fnal.gov (via ubdaq-prod-ws01 or another gateway).
  3. Run the usual environment setup (as uboonedaq, this is setup_rc.sh in the home area).
  4. Do:
    source ~/development/uboonedaq-rc/projectsrc/configs/development_setup_RC_env.sh
    
  5. Then, launch the DAQ Application Manager and Resource manager by doing
    source ~/development/uboonedaq-rc/projectsrc/configs/start_RC_system.sh
    

    You should have those two windows pop up.
  6. You can ignore the resource manager, but go to the DAQApplicationManager. You should see two tabs at the top: DAQ Processes, and DDS Daemons. In each tab, you can double-click on the process groups (the colored boxes) to see individual processes. I recommend doing that.
    • Go to the DDS Daemons tab, and hit "Restart DDS" in the bottom right corner. Say "Yes" and you should see everything restart (turn green).
    • Now go back to the "DAQ Processes" tab and hit "Restart System" in the bottom right corner. You should see everything except rcServer turn green. If something didn't start, you can right click on it and say "StartProcess".
  7. Go back to the terminal, and do startRunControl.sh. You should see the Run Control window pop up. You should now see rcServer turn green in DAQAppMngr.
  8. On Run Control, hit Discover Resources, and then Select Resources. For these runs, we want everything, so hit Select All at the bottom and then say OK to close the pop-up, and then hit Reserve Resources on RC.
  9. Now, hit Select Configuration, and on the pop-up find external_trigger_test_LArTF.fcl (double-click to open the sub-menu, then select the file of the same name), and say OK.
  10. Before proceeding, you should start a hacky status monitor:
    • Open a new terminal, and log in to ubdaq-prod-evb via a gateway node.
    • Do source setup.sh to set up the environment.
    • Run
      ./checkstatus.sh
      in the home area of uboonedaq.
    • You should now see the assembler and all SEBs reporting Idle. Leave this window open and watch the status.
  11. Now, back on Run Control, hit Prepare Configuration and Load Configuration, and then Configure. Now watch the status, and you should see everything "Configuring" (though the assembler quickly moves to "Connecting").
  12. Wait until Assembler reports "Connecting" and ALL SEBs report "Awaiting Connection", and then hit Make Connections. Again, watch the status, and you should see everything eventually say "Awaiting Run".
  13. Before beginning the run, you need to configure the pulser (for the external trigger) and ASICs.
    • Make sure all the ASIC LV are powered up (Slowmoncon_Wiener).
    • Open a new terminal and log in to ubdaq-prod-seb10 via a gateway node.
    • Do source setup.sh; unset LD_LIBRARY_PATH.
    • Set the pulser by doing
      ./integtest/pulserConfig.sh 0.1
      where 0.1 is the rate in Hz for the trigger (for now, do 0.1).
    • Configure each ASIC FT and P:
      configASIC_FT1_P0 -g 0 -p 0
      which configures FT1, P0. Each FT has a P0 and P1, and there are 11 FTs. -g specifies the gain setting, and -p the shaping time. If you run these from a script or in the background, make sure you see the following result
      =======>configASICFTSDO: ASIC configuration with SDO starts...
      =======>configASICFTSDO: Enable CS is done
      =======>configASICFTSDO: Write SDI is done
      =======>configASICFTSDO: Disable CS is done
      =======>configASICFTSDO: ASIC configuration with SDO is done
      =======>configASICFTSDO: ASIC configuration done
      ConfigASIC4FTMB3ACB2: ASICs configuration is successful, SDO is verified OK
      and that you don't see any bit errors (currently seeing bit errors for FT11, P1).
  14. Now ... Begin Run. Say it's a "pedestal" run (it doesn't matter), and make sure that you see the status on the processes change to "Processing Fragments".
  15. Check and make sure we are writing to disk, by doing an ls -lrt /data/uboonedaq/NoiseTests. You should see a new file, being updated about once every 10 s.
  16. When ready to stop, you can say End Run, but there is a bug. Best to stop all processes in DAQ Application Manager, and for good measure stop run control by doing stopRunControl.sh. You can then go back to step 6 above for a new run.
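
For the ASIC configuration in step 13, the full set of command lines and the success check could be sketched as below. Only configASIC_FT1_P0 appears in this guide; the other command names are extrapolated from it on the assumption that every FT and P follows the same naming pattern, so verify that they exist before relying on this. The helper names are hypothetical, and the sketch does not look for bit errors because their message format is not given here.

```python
SUCCESS_MARKER = "ASICs configuration is successful, SDO is verified OK"


def asic_config_commands(n_feedthroughs=11, gain=0, shaping=0):
    """Build one command line per feedthrough (FT) and P.

    Names other than configASIC_FT1_P0 are extrapolated from that example.
    """
    return [
        f"configASIC_FT{ft}_P{p} -g {gain} -p {shaping}"
        for ft in range(1, n_feedthroughs + 1)
        for p in (0, 1)
    ]


def config_succeeded(output):
    """True if the captured output contains the success line from step 13."""
    return SUCCESS_MARKER in output


commands = asic_config_commands()
print(len(commands), "commands, starting with:", commands[0])
```

Each command's captured output can then be passed to `config_succeeded` so that a missing success line is caught when running from a script or in the background.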

If you get in trouble...

Don't panic! It's probably not your fault. If something crashes, try to restart it. If things seem not to be working, usually it's a very good idea to start fresh. You can kill run control by doing

stopRunControl.sh

and you can stop the other DAQ processes by doing
source ~/development/uboonedaq-rc/projectsrc/configs/stop_RC_system.sh

Then start over, and see if you have better luck.

Starting Ganglia

In a terminal, do:

ssh -D 8881 uboonedaq@ubdaq-prod-ws01.fnal.gov

And then, in a Firefox browser set to use the SOCKS proxy at localhost:8881 (opened by the ssh -D command above), go to http://ubdaq-prod-evb.fnal.gov:8080/gweb

Alternative to RC

We've been having occasional trouble with RC. Here are some instructions for running using a Gennadiy-written script:

NOTE: GENNADIY FORBIDS ME TO EDIT THIS

However, you can visit https://cdcvs.fnal.gov/redmine/projects/uboonedaq/wiki/Take-data-production-script for updated instructions.

$ ssh -Y uboonedaq@ubdaq-prod-evb
$ cd integtest
$ ./runConsoleDAQ.sh

Once all the sebs are up and in idle states, hit enter.
Output is written to:
/data/uboonedaq/rawdata

To stop a run, go to the console where the runConsoleDAQ.sh is launched, and hit enter.

Instructions on how to build/run RunControl from e2 code