Troubleshooting for OnMon/EVD on-call experts

This page is a general troubleshooting guide for the OnMon/EVD on-call expert. It is written so that non-experts such as shifters can also use it easily (see the last section here).

General Tips

  • If the shifter is having trouble starting OnMon/EVD for one partition of the FarDet while either OnMon or the EVD is already running for a different partition, this strongly implies that nothing is wrong with OnMon/EVD and that the shifter is making a mistake in the start-up procedure. One of the first questions I ask shifters who call to say that they can't start OnMon or the EVD is whether it is already running for any other partition. If the answer is yes, simply walk them through the start-up step by step.
  • Unfortunately, when OnMon or the EVD starts up in the control room, the terminal window that pops up closes quickly if the program crashes, so the shifter usually cannot read any error message printed there. As the expert, you will have to log into the appropriate datamon machine and run things manually to get this kind of information.

Many expert calls turn out not to be real issues. Before debugging, please double-check the following:

  1. Have the Kerberos tickets been renewed on CR02?
  2. Has the shifter opened OnMon for the right partition and datadisk?
  3. Are both OnMon and the EVD having issues? (Events missing on both might point to a problem with the data taking or the event dispatcher.)
  4. While the shifter double-checks these, you can check the running processes on the datamon machine.
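
Items 1 and 4 of the checklist above can be done from a terminal. The following is a minimal sketch, assuming the standard `klist` and `pgrep` tools are installed on the machine in question. The function names are hypothetical, and the pattern passed to `list_procs` is an assumption, since this page does not state the exact process names.

```shell
# Returns 0 if the credential cache holds a valid (unexpired) Kerberos ticket.
# `klist -s` prints nothing and reports the result through its exit status.
have_valid_ticket() {
    klist -s
}

# Lists running processes whose command line matches the given pattern.
# Prints a short note instead if nothing matches (or pgrep is unavailable).
list_procs() {
    pgrep -fl "$1" || echo "no process matching '$1'"
}
```

For example, `have_valid_ticket || echo "ticket missing or expired"` on CR02, and `list_procs onmon` on the datamon machine (the pattern `onmon` is only a guess at the process name).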

Troubleshooting Problems and Solutions

Easy Fixes: (Shifters can try these solutions.)

| Problem | Symptom | Possible Solution | Extra Info for Experts Only |
|---|---|---|---|
| OnMon/EVD won't start | Double-clicking the desktop icon and selecting the partition (and datadisk for the FarDet) pops up a terminal window that quickly goes away without anything else happening. | The Kerberos ticket could be expired. Renew it by clicking on the little key button. | |
| OnMon/EVD won't start | Double-clicking the desktop icon and selecting the partition (and datadisk for the FarDet) pops up a terminal window that quickly goes away without anything else happening. | You've selected the incorrect partition (or datadisk for the FarDet) in the pop-up dialog boxes. | When running at the command prompt, you will get an error from NOvASocketInputDriver that says "connection refused", meaning that there is no dispatcher process running on the disk you are trying to connect to for that partition. |
| OnMon/EVD won't start | Double-clicking the desktop icon does nothing (no terminals or dialog boxes appear). | The dialog box asking which partition you want to use may have popped up in a place that makes it hard to see. Look around the desktop and move other windows around if you have to. | |
| OnMon/EVD won't start | Double-clicking the desktop icon does nothing (no terminals or dialog boxes appear). | You could be using the wrong desktop icon. There should only be one appropriate for what you are trying to do, but there are some old ones lying around that no longer work and that may have found their way back to the desktop by accident. Hunt around to find the appropriate desktop icon and use that one. | If you are an expert and this is the case, you should get rid of these old desktop icons and the scripts that they point to using cvs remove. This includes things on novadaq-*-master in /home/novacr02/Desktop and /home/novacr02/DAQ-Desktop-Utilities, and things on novadaq-*-datamon in /home/novadaq/bin. |
| The OnMon Viewer won't start | Opening the OnMon viewer (not in DSO results mode) brings up a blank viewer with no histograms. | This is what happens if you try to start the viewer when the producer is not running (or you could have started the viewer for the wrong partition). You must start the producer first from the appropriate desktop icon. Once the producer is running (you'll see "event contains XXXX bytes" scrolling past in the terminal window), you can start the viewer. | In this case, the terminal window where the viewer is running shows a repeating message "Looking for shared memory segment...", meaning that there is no shared memory segment to connect to. This could also occur if the shared memory segment was somehow deleted (see the hard fixes for help in recovering from that). |
| Plots in the OnMon viewer periodically go blank and then fill up again | | No action is needed. The plots reset every 30 minutes. | |
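
When running things manually, the messages quoted on this page can be matched against the terminal output to narrow down the cause. The following is a small sketch of that lookup. The function name `diagnose_message` is hypothetical, and only the quoted fragments come from this page; the full wording of the real messages may differ.

```shell
# Map one line of terminal output to the likely cause described on this page.
diagnose_message() {
    # Lower-case the input so that matching ignores capitalisation.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *"connection refused"*)
            echo "wrong partition or datadisk (no dispatcher to connect to)" ;;
        *"looking for shared memory segment"*)
            echo "producer not running, wrong partition, or segment deleted" ;;
        *"failed to get shared memory segment"*)
            echo "shared memory needs cleaning up (expert fix)" ;;
        *)
            echo "unknown (run manually on the datamon machine to see the error)" ;;
    esac
}
```

For example, `diagnose_message "Looking for shared memory segment..."` points to the producer not running, the wrong partition, or a deleted segment.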

Hard Fixes: (Only experts should attempt these solutions.)

| Problem | Symptom | Action for Experts |
|---|---|---|
| All OnMon and EVD windows suddenly go away... | ...and can't be restarted. | This strongly implies that something is seriously wrong with the datamon machine. Try pinging it and/or logging into it with ssh. If you can't reach the machine, contact the DAQ experts by phone. Hopefully the machine only needs to be power cycled. |
| Can't start the viewer when the producer is successfully running. | The viewer displays something like "Failed to get shared memory segment." | This means that you need to do a little shared memory cleanup. First try using the desktop kill script to stop all of OnMon (which should clean up the shared memory). If the shared memory segment persists, you will have to clean it up manually on the datamon machine (see cleaning up shared memory in the general overview section of the OnMon wiki). |
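
The two expert checks above can be sketched with standard tools. This is a hedged illustration only: the function names are hypothetical, and the shared memory part assumes the OnMon segment is a System V segment that shows up in `ipcs -m`. The cleanup instructions on the OnMon wiki take precedence over anything shown here.

```shell
# Report whether a machine answers ping and accepts an ssh login.
check_host() {
    host=$1
    if ping -c 3 "$host" >/dev/null 2>&1; then
        echo "ping: $host is reachable"
    else
        echo "ping: $host did not answer"
    fi
    # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
        echo "ssh: login to $host works"
    else
        echo "ssh: could not log in to $host"
    fi
}

# Read `ipcs -m` output on stdin and print "shmid bytes nattch" for one owner.
# Assumes the Linux column order: key shmid owner perms bytes nattch status.
shm_segments_for() {
    awk -v owner="$1" '$3 == owner { print $2, $5, $6 }'
}
```

On the datamon machine, `ipcs -m | shm_segments_for SOMEUSER` lists the segments owned by the account that runs OnMon (SOMEUSER is a placeholder). A leftover segment could then be removed with `ipcrm -m SHMID`, but only after confirming that no OnMon process is still attached to it.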