RC - Oh, ya got trouble...

This is the DAQ troubleshooting page for shifters. If you encounter an error and you don't know what to do, first search this page for keywords from the alarm or problem you are having (Ctrl+F on Windows or Cmd+F on Mac). You can also use the table of contents to find the section you need.


General advice

In general, if there is a problem with the DAQ that is preventing us from taking data, or from taking data smoothly, you should:
  1. Stop the current run (if applicable) and attempt to start a new run. See if the problem recurs.
  2. If the same problem comes up again, or tons of new problems seem to be occurring, call an expert. While you wait for a response, continue to try to start a new run.

Basically: unless an expert tells you otherwise, it does not hurt to try to start a new run, and you should repeatedly attempt to do so.



Run Control/Console DAQ


"When I try to start a run, I see a bunch of 'permission denied' errors on the xterm windows that pop up."

  • Make sure you are the uboonedaq user
    whoami
    

If not, please close the VNC and all other terminals on this control-room machine, and follow the instructions for launching the VNC again.

  • Make sure you are on ubdaq-prod-evb.fnal.gov:
    hostname
    

If not, and you are not on ubdaq-prod-ws01, then enter "exit" until you are on ubdaq-prod-ws01 (if you accidentally close the terminal, that's fine: open a new one by clicking the terminal icon on the top panel). Then, follow the instructions for starting a run.

If those two things are OK, then we may have a Kerberos problem. Call a DAQ expert pronto!
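The two checks above can be combined into one step. The following is a minimal sketch, not an existing DAQ tool: the function name check_daq_login is hypothetical, and accepting the short host name ubdaq-prod-evb as well as the full one is an assumption about what hostname prints on that machine.

```shell
# Sketch only: check_daq_login is a hypothetical helper, not part of the DAQ.
# With no arguments it uses the live whoami/hostname values; passing a user
# and host explicitly lets you try it out without being on the DAQ machine.
check_daq_login() {
    local user="${1:-$(whoami)}"
    local host="${2:-$(hostname)}"
    local status=0

    if [ "$user" != "uboonedaq" ]; then
        echo "WRONG USER: $user (expected uboonedaq). Close the VNC and relaunch it."
        status=1
    fi

    # Assumption: hostname may print the short or the fully qualified name.
    case "$host" in
        ubdaq-prod-evb|ubdaq-prod-evb.fnal.gov) ;;
        *)
            echo "WRONG HOST: $host (expected ubdaq-prod-evb.fnal.gov)."
            status=1
            ;;
    esac

    if [ "$status" -eq 0 ]; then
        echo "Login looks OK. If 'permission denied' persists, call a DAQ expert."
    fi
    return "$status"
}
```

Typing check_daq_login with no arguments in the terminal where you start runs would print which of the two conditions, if any, is not met.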


"The DAQ keeps failing during the configuration, with complaints about not seeing enough nodes, or with SEB (small dark blue windows) processes crashing."

It could be that there are some rogue DAQ processes still running. We can deal with that!
  1. Kill the runConsoleDAQ by doing "CTRL-C" in the runConsoleDAQ window.
  2. Issue the DAQ cleanup command: terminate-online-daq-prod in the terminal that's logged into ubdaq-prod-evb (where you run runConsoleDAQ).
  3. Restart runConsoleDAQ, and try to take another run.
  4. If there are still problems at this point, contact a DAQ expert.
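Step 2 above only works from the right machine, so a small guard around the cleanup command can help. This is a sketch under assumptions: the wrapper name cleanup_rogue_daq is hypothetical, and only terminate-online-daq-prod itself comes from this page. It does not replace step 1 (CTRL-C in the runConsoleDAQ window) or step 3.

```shell
# Sketch only: cleanup_rogue_daq is a hypothetical wrapper. It refuses to
# run the cleanup unless the host name starts with ubdaq-prod-evb.
# The optional argument overrides the detected host name (for dry runs).
cleanup_rogue_daq() {
    local host="${1:-$(hostname)}"

    case "$host" in
        ubdaq-prod-evb*) ;;
        *)
            echo "Not on ubdaq-prod-evb (this is $host). Cleanup not run."
            return 1
            ;;
    esac

    # The cleanup command named in step 2 above.
    terminate-online-daq-prod
}
```

If the guard prints the "Not on ubdaq-prod-evb" message, switch to the terminal that is logged into ubdaq-prod-evb before trying again.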

"During configuration, the SEB windows (small dark blue ones) come up all ok, but the assembler window (large lighter blue one) and the online monitor windows (black and orange ones) crash."

This can happen if you try to run the runConsoleDAQ.py command from the wrong machine (e.g. from ubdaq-prod-ws01). You should make sure you log into ubdaq-prod-evb on a terminal inside the vnc:

ssh ubdaq-prod-evb

and then run runConsoleDAQ.py from that machine.


"Runs keep dying due to 'Missing Node' failures."

Yeah, we've had this happen with PMT+TPC with the PMT cosmic discriminator readouts enabled and running at a high rate. It's a known issue. BUT, you should still take it seriously:
  • The run should be restarting on its own. If it does not, call an expert.
  • If you get more than 3 run restarts in an hour, email an expert. If it's 4-5, call an expert.
  • If you get 3 run restarts one right after the other, with only a few minutes of run time for each run before it hits this error, call an expert.

"I started the configuration, but none of the blue windows came up. Instead, it's just this black one with green text that is taking a long time to do something."

That's the hardware-config terminal, which is run before any of the other processes get launched. If it's taking a long time, that means that it is configuring the ASICs.
  • If you didn't want to configure the ASICs, then you may have entered the wrong config number, or there is an error.
  • If the run config ID is the correct one, but it's not supposed to be configuring the ASICs (the name usually includes ConfigASICs if it should), then call a DAQ expert immediately.
  • If you did enter the wrong configuration, that's ok. Note it in the elog, and make sure it's not an expert configuration you've tried to run. You can do that by opening a new terminal in the VNC and doing
    list_main_cfg
    

    And compare the name/number of the configuration you specified with that list.

You should see printouts telling you it's on "FT3_P0" or something like that. There are 11 "FT" (feedthroughs) total to configure, with two ports (P0 and P1) each, so 22 FT/port combinations in all. It should configure them in order. If it is taking longer than 20 minutes, and/or seems to be repeating one of the FT/Port combinations over and over, we have a problem.
  • Check slowmon, and see if there are any low-voltage trips for the ASIC low-voltage power.
  • If there are, call a TPC or DAQ expert. If there aren't, call a DAQ expert.

When it finishes, that window should disappear, and the rest of the processes should start. On subsequent automatic run restarts, you should not see it configure again.
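If you want a quick way to tell whether the hardware-config terminal is progressing or stuck on one FT/port, the following sketch counts the labels in copied output. It rests on assumptions: that you have pasted the printouts into a text file (hwconfig.txt below is a hypothetical name), and that the labels look exactly like "FT3_P0". A label may legitimately print more than once, so treat the result as a hint, not a diagnosis.

```shell
# Sketch only: both helpers read hardware-config printouts on stdin.
# Assumption: FT/port labels have the form FT<number>_P0 or FT<number>_P1.

# Report how many distinct FT/port labels have appeared so far.
# 22 are expected in total: 11 feedthroughs x 2 ports.
ft_port_progress() {
    grep -oE 'FT[0-9]+_P[01]' | sort -u |
        awk 'END { print NR " of 22 FT/port combinations seen" }'
}

# List every FT/port label that appears more than once, with its count.
repeated_ft_ports() {
    grep -oE 'FT[0-9]+_P[01]' | sort | uniq -c |
        awk '$1 > 1 { print $2 " seen " $1 " times" }'
}
```

For example, ft_port_progress < hwconfig.txt shows how far along the configuration is, and repeated_ft_ports < hwconfig.txt lists any combination that keeps coming back.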


"This weird 'Authentication is required' window keeps popping up on the VNC. What should I do about it?"

It is weird, I totally agree. It looks something like this:

The window should go away on its own in a few minutes. You can also just close out the window (click the "X" in the upper right). Don't worry!



Slow Monitoring


"I'm getting a DAQ alarm in SlowMon. What should I do?"

By default, you should do the following:
  • If it's a minor alarm (yellow), note it in the elog and email the experts at . You can simply link the elog entry in that email (if you've given enough detail in the elog, which you should do).
  • If it's a major alarm (red), call the expert.

However, if it's one of the errors explained below, you can follow the instructions here. If you're not sure, or things don't look the same as what's described below, go back to the default above (i.e. call an expert).
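The default response above is a two-way decision on alarm severity, with "call an expert" as the fallback when in doubt. As a minimal sketch (the function name alarm_action is hypothetical and is not a SlowMon feature):

```shell
# Sketch only: alarm_action is a hypothetical helper that echoes the default
# response for a SlowMon DAQ alarm, given its severity or color.
alarm_action() {
    case "$1" in
        minor|yellow)
            echo "Note it in the elog and email the experts"
            ;;
        major|red)
            echo "Call the expert"
            ;;
        *)
            # Anything unclear falls back to calling an expert.
            echo "Unknown severity: call the expert"
            return 1
            ;;
    esac
}
```

For example, alarm_action yellow prints the elog-and-email response, while alarm_action red prints "Call the expert".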


"Slowmon has a bunch of DAQ Status, SEB complaints, with something about Circular Buffer Occupancy."

We've hit an error in the data format on one or more of the SEBs. This has happened with higher frequency for the TPC SEBs (1-9) when there is someone on the platform. Here's what you should do:
  1. Stop the run, and start a new run.
  2. If the alarm is still there, you can acknowledge it (though it should have cleared on its own).
  3. Check the logbook to see if there is anyone that you know about on the platform. Check the webcam too if you can.
  4. Make a note in the elog.

If this occurs repeatedly, and there is no one on the platform as far as you know, call a DAQ expert to let them know. If there is activity on the platform, there is no need to call an expert, but still note the failures in the elog, and continue to restart runs.


"I'm getting a MAJOR alarm on uB_DAQStatus_DAQX,sebXX/xmit_frame_ctr_diff_calc and/or sebXX/xmit_trigger_ctr_diff_calc. What should I do?"

Please try to restart a run, and make an elog entry ("Electronics" category) cc'ing both the DAQ and readout experts. If the error comes back after a new run is restarted (typically 10 minutes after a run starts), call the warm readout on-call expert. They will likely need to power-cycle the crate causing the alarm.

If the error keeps appearing and an expert cannot be reached, let the run continue. Try calling the RO expert periodically. You are not allowed to stop-and-restart a run more than once for this error without explicit permission from a RO/DAQ expert or RunCo. The data we take with this error is good quality, so continuing to take data is the best solution.


"I'm getting a MAJOR alarm on uB_DAQStatus_DAQX/sn_read_lag_multiplicity. What should I do?"

Please wait 1 minute to make sure that the run is not ending or crashing (this alarm is also shown in those cases). If it remains, please try to restart a run, and make an elog entry cc'ing both the DAQ and readout experts.

If the alarm comes back after a new run has started, please call the readout expert.


Online Monitoring


"If the online monitor Lizard stops updating, what should I do?"

Please see the instructions in OM_-_Troubleshooting.