RC - NewTrouble


General advice

In general, if there is a problem with the DAQ that is preventing us from taking data, or from taking data smoothly, you should:
  1. Attempt to start a new run. See if the problem recurs.
  2. If the same problem comes up again, or tons of new problems seem to be occurring, call an expert. While you wait for a response, continue to try to start a new run.

Basically: unless an expert tells you otherwise, it does not hurt to try to start a new run, and you should repeatedly attempt to do so.



"When I try to start a run, I see a bunch of 'permission denied' errors on the xterm windows that pop up."

Likely the Kerberos ticket on ubdaq-prod-ws01 (the machine the VNC is on) has expired. Run

kdestroy

And then
kinit USERNAME

where USERNAME is your username. Make sure that you are still logged in to ubdaq-prod-ws01 as the user uboonedaq (a quick whoami command should tell you that).
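If you want to check both things at once before starting a run, something like the following could work. This is only a sketch: check_daq_login is a made-up helper name, not an existing DAQ tool, and it assumes the klist on the machine is MIT Kerberos, where the -s option prints nothing and exits non-zero when the ticket cache is missing or expired.

```shell
# Sketch only: check_daq_login is a hypothetical helper, not part of the DAQ software.
check_daq_login() {
    expected_user="uboonedaq"
    current_user="$(whoami)"
    if [ "$current_user" != "$expected_user" ]; then
        echo "WRONG USER: logged in as $current_user, expected $expected_user"
        return 1
    fi
    # Assumption: MIT Kerberos klist. With -s it is silent and exits non-zero
    # when the credentials cache cannot be read or the ticket has expired.
    if klist -s; then
        echo "OK: Kerberos ticket is valid"
        return 0
    fi
    echo "EXPIRED: run kdestroy, then kinit USERNAME"
    return 2
}
# Usage: check_daq_login
```

If it reports EXPIRED, follow the kdestroy and kinit steps above and try the run again.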


"Slowmon has a bunch of DAQ Status, SEB complaints, with something about Circular Buffer Occupancy."

We've hit an error in the data format on one or more of the SEBs. This happens more often for the TPC SEBs (1-9) when there is someone on the platform. Here's what you should do:
  1. Stop the run, and start a new run.
  2. If the alarm is still there, you can acknowledge it (though it should have cleared by then).
  3. Check the logbook to see whether anyone is known to be working on the platform. Check the webcam too if you can.
  4. Make a note in the elog.

If this occurs repeatedly, and there is no one on the platform as far as you know, call a DAQ expert to let them know. If there is activity on the platform, there is no need to call an expert, but still note the failures in the elog and continue to restart runs.


"The DAQ keeps failing during the configuration, with complaints about not seeing enough nodes, or with SEB (small dark blue windows) processes crashing."

It could be that there are some rogue DAQ processes still running. We can deal with that!
  1. Kill runConsoleDAQ by pressing CTRL-C in the runConsoleDAQ window.
  2. Issue the DAQ cleanup command: terminate-online-daq-prod in the terminal that's logged into ubdaq-prod-evb (where you run runConsoleDAQ).
  3. Restart runConsoleDAQ, and try to take another run.
  4. If there are still problems at this point, contact a DAQ expert.

"During configuration, the SEB windows (small dark blue ones) all come up OK, but the assembler window (large lighter blue one) and the online monitor windows (black and orange ones) crash."

This can happen if you try to run runConsoleDAQ.py from the wrong machine (e.g. from ubdaq-prod-ws01). Make sure you log into ubdaq-prod-evb from a terminal inside the VNC:

ssh ubdaq-prod-evb

and then run runConsoleDAQ.py from that machine.
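To guard against launching from the wrong machine, you could check the hostname first. The snippet below is only a sketch: on_evb is a made-up helper name, not part of the DAQ software, and it assumes hostname reports ubdaq-prod-evb, possibly followed by a domain suffix.

```shell
# Sketch only: on_evb is a hypothetical helper, not part of the DAQ software.
on_evb() {
    required_host="ubdaq-prod-evb"
    this_host="$(hostname)"
    this_host="${this_host%%.*}"   # drop any domain suffix
    if [ "$this_host" = "$required_host" ]; then
        echo "OK: on $required_host"
        return 0
    fi
    echo "WRONG MACHINE: on $this_host; run 'ssh $required_host' first"
    return 1
}
# Usage: on_evb && runConsoleDAQ.py
```

With the usage line shown in the comment, runConsoleDAQ.py only starts when the check passes.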


"Runs keep dying due to 'Missing Node' failures."

Yeah, we've had this happen with PMT+TPC with the PMT cosmic discriminator readouts enabled and running at a high rate. It's a known issue. BUT, you should still take it seriously:
  • The run should be restarting on its own. If it does not, call an expert.
  • If you get more than 3 run restarts in an hour, email an expert. If it's 4-5, call an expert.
  • If you get 3 run restarts one right after the other, with only a few minutes of run time for each run before it hits this error, call an expert.

"I started the configuration, but none of the blue windows came up. Instead, it's just this black one with green text that is taking a long time to do something."

That's the hardware-config terminal, which is run before any of the other processes get launched. If it's taking a long time, that means that it is configuring the ASICs.
  • If you didn't want to configure the ASICs, then you may have entered the wrong config number, or there is an error.
  • If the run config ID is the correct one, but it's not supposed to be configuring the ASICs (the name usually includes ConfigASICs if it should), then call a DAQ expert immediately.
  • If you did enter the wrong configuration, that's ok. Note it in the elog, and make sure it's not an expert configuration you've tried to run. You can do that by opening a new terminal in the VNC and running
    list_main_cfg
    and comparing the name/number of the configuration you specified with that list.
While it is configuring the ASICs, you should see printouts telling you it's on "FT3_P0" or something like that. There are 11 feedthroughs ("FT") in total to configure, with two ports (P0 and P1) each, so 22 FT/port combinations. It should configure them in order. If it is taking longer than 20 minutes, and/or seems to be repeating one of the FT/port combinations over and over, we have a problem.
  • Check slowmon, and see if there are any low-voltage trips for the ASIC low-voltage power.
  • If there are, call a TPC or DAQ expert. If there aren't, call a DAQ expert.
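If it's hard to tell by eye whether one FT/port combination keeps repeating, you could count the labels in a copy of the printout. This is only a sketch: repeated_ft_ports is a made-up helper, the label pattern is guessed from the "FT3_P0" example, and it assumes you have pasted the window text into a file (hwconfig.txt here is a hypothetical name). The real printout may mention each label more than once even when all is well, so treat small counts with caution.

```shell
# Sketch only: repeated_ft_ports is a hypothetical helper. The label pattern
# "FT<number>_P<0 or 1>" is an assumption based on the "FT3_P0" example.
repeated_ft_ports() {
    # Extract every FT/port label from stdin, count occurrences of each,
    # and print "LABEL COUNT" for labels that appear more than once.
    grep -oE 'FT[0-9]+_P[01]' | sort | uniq -c | awk '$1 > 1 {print $2, $1}'
}
# Usage: repeated_ft_ports < hwconfig.txt
```

A label with a count far above the others is the one to mention when you call the expert.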

When it finishes, that window should disappear, and the rest of the processes should start. On subsequent automatic run restarts, you should not see it configure again.