RC - Shifters' Guide¶
- Table of contents
- RC - Shifters' Guide
- Your job, as the shifter
- Getting help
- Starting the DAQ VNC on evb (Starting on October 14, 2015)
- Starting a run
- Stopping a run...
- Audible DAQ alarm
- Cleaning up the DAQ.
- Viewing the ganglia monitoring page
- Some user notes:
This page contains basic instructions on what to do for normal DAQ operations. If you are having a problem, you can look here to see what you should do, but you should also consider looking at the RC - Troubleshooting page, which contains more details of common problems.
Your job, as the shifter¶
First, THANK YOU! In taking your shift on MicroBooNE, you are the person (or people) taking the data from the detector (and doing a lot more!). That makes you part of the DAQ team, in fact a very crucial part, so welcome, and thank you for agreeing to help us. The parts below explain how to do the basic DAQ tasks in more detail, but at a high level, your basic responsibilities as a member of the DAQ team are:
- Start and stop the data acquisition processes, and make sure they stay running throughout your shift, according to the Run Plan.
- Watch the DAQ processes to make sure we are taking data smoothly (i.e. that we are taking data, and not automatically starting new runs all the time).
- Note in the e-log the state of the DAQ, any problems/strange things that come up, or any other detector conditions that may (or may not) affect data-taking (e.g. accesses to the LArTF platform!)
- Promptly notify more-experienced members of our DAQ team when you notice a problem.
- Help more-experienced members of the DAQ team solve problems by describing and documenting problems you encounter in the e-log, and maintaining awareness throughout the data-taking so that you can help answer questions those more-experienced members of the DAQ team may have.
- Provide feedback on how we can make the DAQ experience better for future shifters like you.
It's not part of your job description to try to fix problems yourself. We will talk about basic trouble-shooting and recovery techniques you should do when we encounter problems, but beyond those things, it's not your job to try to fix something, and you are not responsible for any downtime. We need you to vigilantly watch, report, and call for help!
Getting help¶
There are a number of ways that you can contact experts: the Slack chat, email, carrier pigeon, skype, phone, etc. Luckily/not luckily, the DAQ team is spread across a number of time zones, so it's likely that no matter the time of day, somebody might even be watching and ready to jump in to help. How you get help is not as important as making sure you do, and quickly, especially when we are getting beam data. To make sure you get help quickly though, here's what we recommend doing when you notice a problem:
- Put a quick message on the Slack chat like "Need DAQ expert!" or "I don't know if I'm having a problem, can someone help?"
- Don't wait long for a reply, especially if it's keeping us from taking beam data or you think it may be. Maybe wait just long enough for you to put a quick note into the e-log or take another glance at the trouble-shooting page, but probably no longer than 30 seconds to a minute. Seriously, don't wait before you ...
- Call the on-call DAQ expert. That person is on-call 24/7 and should be reachable at all times. Please call from the Control Room phone so experts can see the Control Room is calling. If you get voicemail, leave a short message saying you are having a problem (or may be having a problem), and note the time.
- If you didn't catch them on the phone, don't wait long for a reply. Like, 5-7 minutes max. Keep trying to restart runs, note in the elog the problem in more detail, send another quick message on the Slack chat, etc. Seriously, don't wait longer than a few minutes before you ...
- Call other experts. Maybe try the on-call expert again, but if there is no answer, immediately go down the DAQ expert list and try to get someone on the phone! If no one is answering, and you're not getting any help anywhere, then call the Run Coordinator, who can advise on other experts to call or help get in contact with DAQ experts. You should also be putting passive-aggressive comments in the elog to publicly shame the DAQ experts.
Starting the DAQ VNC on evb (Starting on October 14, 2015)¶
If it is not open, connect to the vnc session for the DAQ by opening a terminal and doing the following two hops into evb, and then starting up your local vncviewer.
ssh -L 5903:localhost:5903 email@example.com
and on ws01, do
ssh -L 5903:localhost:5903 ubdaq-prod-evb.fnal.gov
Now, (on the control room workstation, not an ssh terminal) start the vnc session:
Use a separate terminal here, not an open ssh window
The password is the common DAQ password xxxxxxxxxxx (concatenate the MicroBooNE DocDB user name, the number two, and the first syllable in the word preceding "xxxxxxxxxxx" in this sentence). This should open you up to a desktop on ubdaq-prod-evb as uboonedaq. The evb VNC server is serving out connections to vncclients.
(If you see a complaint about 5903 not being forwardable on one of those two hops above try swapping ws01 -> ws02. Once when I tried this with ws01 some crufty ssh was hogging 5903 on ws01. evb let me ssh in, but complained:
channel_setup_fwd_listener: cannot listen to port: 5903
Could not request local forwarding.
My client could not connect. I then got in through ws02 with no complaints and my VNC client happily connected. So, when you're done dropping in on the VNC session, please then exit your ssh's! - EC)
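The port clash described in the note above can be spotted before you open the tunnel. Here is a minimal sketch, assuming a bash shell (it relies on bash's /dev/tcp redirection); port_in_use is a hypothetical helper name, not part of the DAQ tools. Run it on the machine where the forward is about to be opened (ws01 in the note above). Keep in mind that a tunnel you already have open will also show up as "taken".

```shell
#!/usr/bin/env bash
# port_in_use is a hypothetical helper, not part of the DAQ tools.
# It succeeds when something already accepts connections on localhost:<port>.
port_in_use() {
    # Try to open a TCP connection in a subshell; the subshell exits
    # non-zero if nothing is listening (or if /dev/tcp is unavailable).
    (exec 3<>"/dev/tcp/localhost/$1") 2>/dev/null
}

if port_in_use 5903; then
    echo "5903 is already taken here: close the old ssh session or try ws02"
else
    echo "5903 looks free: go ahead and start the tunnel"
fi
```

If the port is taken and it is not your own tunnel, swapping to ws02 as described above is the quick way around it.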
The outdated VNC instruction is kept in the Old Run Control VNC Settings page. It's for backup -- we are not using it anymore.
Starting a run¶
In the vnc session (see above item), open an xterm by right clicking and choosing "xterm." Then set up the environment by
You should see a bunch of setup things reported. You should see in blue text the DAQ version number. If a DAQ expert wants you to change versions, all you should need to do is quit the xterm, open another, and set up the environment again to get the desired one.
Run "runConsoleDAQ.py" in the terminal.
You will then be prompted to enter how long you would like this run to be. You cannot enter a value greater than 420 minutes. The reason is that we only have 24 bits for the frame number, which allows only slightly more than 420 minutes per run with unique frame numbers. Consult the white board, elog, run plan, or Run Coordinator on what should be entered here (but for normal data-taking, 420 minutes is good). You can exit the console DAQ script here if you need to by typing 'exit'.
After that, you will be prompted to pick a configuration ID. Consult the white board, elog, run plan, or Run Coordinator for the config ID we want to run. There are semi-descriptive names that go along with the ID, but you will need to enter the ID number on the command line.
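The 420-minute cap on run length mentioned above follows from simple arithmetic on the 24-bit frame counter. The sketch below works it out, assuming a frame length of 1.6 ms; that number is not stated on this page, so treat it as an assumption and check with a DAQ expert if it matters. max_run_minutes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
FRAME_BITS=24    # from the text above: 24 bits for the frame number
FRAME_US=1600    # ASSUMPTION: 1.6 ms per frame (not stated on this page)

max_run_minutes() {
    # Number of distinct frame numbers, times the frame length,
    # converted from microseconds to whole minutes (integer division).
    local frames=$(( 1 << FRAME_BITS ))
    echo $(( frames * FRAME_US / 1000000 / 60 ))
}

max_run_minutes   # prints 447
```

Under that assumption the counter wraps after about 447 minutes, which is consistent with 420 minutes being a safe round limit.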
Starting a run sequence¶
There is an option to run a "sequence" of runs, right after each other. Run sequence files are typically kept in /home/uboonedaq/DAQ_DRIVER_FILES. Each driver file contains four columns: the config ID to run, how long to run it (in minutes), how many times to run it, and what config ID to run if it fails. Examples are in that folder.
To run a run sequence, you just specify the driver file name in an argument to runConsoleDAQ:
runConsoleDAQ.py -s DAQ_DRIVER_FILES/my_driver_file.driver
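To make the four-column layout concrete, here is a small sketch that writes a made-up driver file and prints what each line would ask for. The config IDs are invented, and the whitespace-separated layout is an assumption; look at the real examples in DAQ_DRIVER_FILES for the actual format. describe_driver is a hypothetical helper, not part of runConsoleDAQ.

```shell
#!/usr/bin/env bash
# Hypothetical driver file: config ID, minutes, repeats, fallback config ID.
# IDs are made up and the whitespace-separated layout is an ASSUMPTION.
driver_file=$(mktemp)
cat > "$driver_file" <<'EOF'
101 420 3 100
102 60 1 100
EOF

describe_driver() {
    local cfg minutes repeats fallback
    while read -r cfg minutes repeats fallback; do
        [ -z "$cfg" ] && continue   # skip blank lines
        echo "config $cfg: $repeats x $minutes min, fallback config $fallback"
    done < "$1"
}

describe_driver "$driver_file"
```

Reading a driver file this way before starting the sequence is a cheap check that it matches what the run plan asks for.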
Starting an expert run¶
Note, if you are doing a special configuration (and someone should tell you when we are doing so), you will need to run this in "expert" mode. How? With the expert option of course!
And, if it's a PMT-only run, it only makes sense to do...
runConsoleDAQ.py expert pmtonly
"Expert" mode just allows additional configurations, but has no auto-restart, so you will need to be more vigilant. PMTONLY mode does not read out TPC data, so it will only start one seb process (the one on seb10).
Stopping a run...¶
...while taking data¶
As the runConsoleDAQ should tell you, to stop a run while we are taking data, just type stop in the runConsoleDAQ window. You will be prompted for a reason (that will be put into the elog), and you will be brought back to the start to begin a new run. The next run will not start automatically, so you will need to input a time and configuration again.
...at any other time (like during configuration)¶
If you are not in the middle of a run, you can "CTRL-C" in the runConsoleDAQ window, and it should kill all the processes cleanly. You will likely be prompted for a reason for the stop/cancel of the run, and that will automatically be put in the elog. After this, runConsoleDAQ will likely be killed, and you will need to run it again.
Audible DAQ alarm¶
We have an audible DAQ alarm that is triggered when a run is stopped and won't automatically restart (as happens when the run control encounters an unforeseen failure mode). This indicates that shifter intervention is necessary to begin taking data again. If you have not been informed about the default run plan (either by reading the Run Plan page if it is up-to-date or being told by the Run Coordinator or DAQ expert), please contact the DAQ expert or Run Coordinator.
You can shut up Verdi (a.k.a. the audible DAQ alarm) by keyboard interrupt.
In the case that you stop a run, you will hear the audible DAQ alarm.
Cleaning up the DAQ.¶
There may be occasions where we have tried to stop a run, but for some reason not all processes have been cleanly killed. For instance: you have exited out of runConsoleDAQ, but there are still some blue xterm windows (from the assembler or seb) displayed. To clean up, do:
Viewing the ganglia monitoring page¶
On one of the control machines, type the following command:
ssh -D 8080 uboonedaq@ubdaq-prod-ws01
(you may need to renew your Kerberos ticket). Then, on that same control-room machine (not ubdaq-prod-ws01), you should be able to navigate to the ganglia shifter-view web page.
- Shifter graphs: http://ubdaq-prod-evb.fnal.gov:8080/gweb/?r=hour&cs=&ce=&tab=v&vn=shifter
- Auto-rotating display: http://ubdaq-prod-evb.fnal.gov:8080/gweb/autorotation.php?view_name=shifter&id=1&timeout=20
The username is uboone, and the password is the common DAQ password xxxxxxxxxxx (concatenate the MicroBooNE DocDB user name, the number two, and the first syllable in the word preceding "xxxxxxxxxxx" in this sentence).
If you are having trouble, you may need to update the proxy settings on your browser. See How_to_make_ssh_tunnels_to_ubdaq_internal_webservers
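If the page will not load, it can help to check the tunnel separately from the browser. This is a rough sketch, assuming the "ssh -D 8080" session from above is still open in another terminal, that curl is installed on the control-room machine, and that the page uses HTTP basic auth (curl will prompt for the password). ganglia_status is a hypothetical helper name, not part of the DAQ tools.

```shell
#!/usr/bin/env bash
# ganglia_status is a hypothetical helper, not part of the DAQ tools.
# ASSUMPTIONS: the "ssh -D 8080" session is open in another terminal,
# and the page uses HTTP basic auth (curl prompts for the password).
GANGLIA_URL='http://ubdaq-prod-evb.fnal.gov:8080/gweb/?r=hour&cs=&ce=&tab=v&vn=shifter'

ganglia_status() {
    # Send the request through the local SOCKS proxy opened by "ssh -D"
    # and print only the HTTP status code of the reply.
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
         --socks5-hostname localhost:8080 --user uboone "$GANGLIA_URL"
}

# Usage (run by hand on the control-room machine):
#   ganglia_status
```

If curl can reach the page this way but the browser cannot, the browser's proxy settings are the likely culprit. If curl fails too, look at the ssh session first.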
Some user notes:¶
On a Mac, there's a free VNC viewer. Hit Command-K in the Finder, and type vnc://localhost:5903 to connect.