Version 27 (Sowjanya Gollapinni, 11/05/2019 04:17 PM) → Version 28/29 (Sowjanya Gollapinni, 11/05/2019 04:18 PM)

h1. SLC - Troubleshooting

{{>toc}}

This page lists issues that can occur and how they should be addressed.
It also lists which problems need help from an expert.

h2. Problems with starting up the GUI

One of the following might happen when you follow the instructions for [[SLC_-_Overview#How-to-connect-to-the-shared-control-screen|How to connect to the shared control screen]]:

* *Restarting everything after a reboot or power outage:* see [[SMC_startup]].

* *You cannot ssh to ubdaq-prod-ws01 at all.* This is a network problem, or ws01 is down. Contact an expert.

* *ssh works, but vncviewer prints an error message indicating no response from the vncserver.*
** Make sure there was no error message from ssh indicating a failure to tunnel port 5902. Sometimes this message scrolls off the screen. If tunneling fails, there may be a defunct ssh process that needs to be killed. You can find such processes on the machine you are connecting from using @ps ax | grep 'ssh.*-L *5902:'@.
** It is possible the vncserver is not running on ubdaq-prod-ws01. You can simply try restarting the vncserver by executing @~/startVNC.sh@ as the ubooneshift user on ws01. If this fails, check for a dead vncserver process using the @ps@ and @grep@ commands on ubdaq-prod-ws01, like this: @ps auxw|grep 'Xvnc.*:2'@.
** If running @~/startVNC.sh@ prints a warning like @Warning: ubdaq-prod-ws01.fnal.gov:2 is taken because of /tmp/.X11-unix/X2@ followed by @Remove this file if there is no X server ubdaq-prod-ws01.fnal.gov:2@, and there is no Xvnc process (as established in the previous bullet), then do as the warning suggests: remove the temporary file and try to start the VNC server again.
** See also [[SMC_startup]].
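
The last two checks can be scripted. The following is a minimal sketch, not an existing tool on ws01: the function names and the @X11_DIR@ variable are hypothetical, it assumes bash running as ubooneshift on ubdaq-prod-ws01, and it takes the socket path from the vncserver warning quoted above. It removes @/tmp/.X11-unix/X2@ only when no Xvnc process for display :2 shows up in @ps auxw@.

```shell
#!/bin/bash
# Hypothetical helpers for the VNC checks above (not an existing script on ws01).
# Assumes bash, run as ubooneshift on ubdaq-prod-ws01; display :2 by default.

# Succeeds if an Xvnc process for display :N appears in `ps auxw`.
xvnc_running() {
  local display="${1:-2}"
  # ":N" must be followed by a space or end of line, so :2 does not match :20;
  # `grep -v grep` drops the grep processes themselves from the listing.
  ps auxw | grep -v grep | grep -Eq "Xvnc.*:${display}( |\$)"
}

# Removes the X11 socket for display :N, but only if no Xvnc process owns it.
# X11_DIR defaults to the directory named in the vncserver warning; it can be
# overridden to try the function on a scratch directory.
clear_stale_x11_socket() {
  local display="${1:-2}"
  local f="${X11_DIR:-/tmp/.X11-unix}/X${display}"
  if xvnc_running "$display"; then
    echo "Xvnc for :${display} is running; leaving ${f} alone" >&2
    return 1
  fi
  if [ -e "$f" ]; then
    rm -f -- "$f" && echo "removed stale ${f}"
  else
    echo "no ${f} found; nothing to remove"
  fi
}
```

For example, after sourcing the file, @clear_stale_x11_socket 2 && ~/startVNC.sh@ clears a stale socket and then restarts the VNC server; if an Xvnc for :2 is still running, the function refuses and the restart is skipped.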

* *vncviewer starts, but you see a "screen is locked" prompt asking for the password for the ubooneshift account.* This shouldn't happen any more, but if it does, then someone needs to disable the screensaver, because ubooneshift is a shared service account and doesn't (shouldn't) have a password. In any terminal logged in to ubdaq-prod-ws01, do
<pre>
ksu ubooneshift
gnome-screensaver-command -d
</pre>
It may also be necessary to set the desktop preferences to not start the screensaver.

* *vncviewer starts, but there is no gui.* Check the following:
** Make sure it is not simply minimized or on a different sub-screen. Check the bar at the bottom of the desktop inside the vnc window.
** If it is really not running, try starting the gui using @. ~/setup_SMC_EPICS.sh; run_css@ from a terminal window inside the vncviewer, or equivalently @~/startCSSGUI.sh@.

* *You get an error starting the gui.* For example, if you get "Workspace /home/ubooneshift/.ControlSystemStudio/krb5--as-ubooneshift-on-ubdaq-prod-ws01/CSS is in use. Select a different workspace.", kill the vncserver and start over: log on to ws01 as ubooneshift, run @vncserver -kill :2@ and then @. startVNC.sh@. Reconnect to the vncserver and restart the gui.

* *The GUI starts to load but gets stuck.* For example, CSS starts to load and shows its logo, but stops at 90% complete and never finishes loading. This means the CSS workspace is hung.
The solution is to copy a working instance's ".metadata" directory (e.g., from someone's local workspace; the most recent copy of the ubooneshift instance is best) into the non-working ubooneshift instance.
Logged in as ubooneshift on ubdaq-prod-ws01.fnal.gov, do the following (here I am copying a working instance from Glenn's workspace):
<pre>
# copy the working .metadata from Glenn's workspace into the ubooneshift workspace
cp -a /home/gahs/.ControlSystemStudio/krb5--as-gahs-on-ubdaq-prod-ws01/CSS/.metadata \
      /home/ubooneshift/.ControlSystemStudio/krb5--as-ubooneshift-on-ubdaq-prod-ws01/CSS/
</pre>
Also remove the ".snap" and ".lock" files from the ".metadata" directory in the ubooneshift workspace:
<pre>
# ubooneshift CSS workspace
WS=/home/ubooneshift/.ControlSystemStudio/krb5--as-ubooneshift-on-ubdaq-prod-ws01/CSS
rm "$WS/.metadata/.lock"
rm "$WS/.metadata/.plugins/org.eclipse.core.resources/.projects/CSS/.markers.snap"
</pre>

*Note 1:* Removing these files may not actually be necessary to fix the issue; it was something we tried in the process. Typically, just copying a working instance into the non-working instance fixes the problem. This should be checked systematically the next time it happens, and these instructions updated accordingly.
*Note 2:* Given this issue, it would be good for all slow controls experts to regularly update their workspace instance from the ubooneshift shifter space, so that if this happens again we have the most recent shifter-level working instance to copy from. To update your space with the shifter space, see the instructions here: https://cdcvs.fnal.gov/redmine/projects/uboone-operations/wiki/SMC_-_How_to_start_your_own_CSS_gui. The "**" note under step 3 explains how to overwrite the files in your space with those from the shifter space. Slow controls experts can even set up a crontab in their home areas to update their workspace from the shifter space every week or so.
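
The manual recovery steps above can be collected into one function. This is a sketch under assumptions, not a tested procedure: it assumes bash on ubdaq-prod-ws01, the CSS GUI closed, and the workspace path pattern used in the commands above; @restore_css_metadata@, @css_workspace@ and @HOME_ROOT@ are hypothetical names. Unlike the manual steps, it first saves a time-stamped backup of the broken @.metadata@, then merges the working copy in (as the @cp -a@ above does) and removes the @.lock@ and @.markers.snap@ files.

```shell
#!/bin/bash
# Hypothetical helper for the CSS workspace recovery above (not an official script).
# Assumes bash on ubdaq-prod-ws01 with the CSS GUI closed. HOME_ROOT defaults to
# /home and exists only so the function can be tried on a scratch directory.

# Prints the CSS workspace directory of USER, using the path pattern on this page.
css_workspace() {
  echo "${HOME_ROOT:-/home}/$1/.ControlSystemStudio/krb5--as-$1-on-ubdaq-prod-ws01/CSS"
}

# Copies SRC_USER's working .metadata into DST_USER's workspace (default ubooneshift).
restore_css_metadata() {
  local src dst stamp
  src="$(css_workspace "$1")/.metadata"
  dst="$(css_workspace "${2:-ubooneshift}")"
  if [ ! -d "$src" ]; then
    echo "no working .metadata found at $src" >&2
    return 1
  fi
  if [ ! -d "$dst" ]; then
    echo "target workspace $dst does not exist" >&2
    return 1
  fi
  # Keep a time-stamped backup of the broken copy before touching it.
  stamp="$(date +%Y%m%d%H%M%S)"
  if [ -d "$dst/.metadata" ]; then
    cp -a -- "$dst/.metadata" "$dst/.metadata.bak.$stamp" || return 1
  fi
  # Same as the manual step: merge the working .metadata into the workspace.
  cp -a -- "$src" "$dst/" || return 1
  # Optional cleanup (see Note 1): stale workspace lock and markers snapshot.
  rm -f -- "$dst/.metadata/.lock" \
    "$dst/.metadata/.plugins/org.eclipse.core.resources/.projects/CSS/.markers.snap"
  echo "restored $dst/.metadata from $src"
}
```

Run as ubooneshift, @restore_css_metadata gahs@ would repeat the example above. Called with the arguments swapped (e.g. @restore_css_metadata ubooneshift gahs@), it could in principle serve the regular refresh from Note 2, assuming the expert can read the ubooneshift workspace and their own CSS is closed; that use has not been tried.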

h2. Problems with GUI layout: missing something

* The standard layout is the Alarm "perspective". Select Window -> Perspective -> Alarm or Window -> Perspective -> Other -> Alarm.
* If that doesn't fix the problem, look for a button/icon labeled "Alarm" in the toolbar at top, usually on the right. It should be selected.
* If Alarm perspective is selected and the layout is incorrect, right-click on the Alarm perspective button in the toolbar and choose "Reset".
* If even that doesn't work, it's possible a bad layout has been saved over top of the Alarm perspective. Try to rearrange the windows as you need them to be, and call an expert if you can't get it fixed.

h2. Problems with disconnected channels / missing channel servers

If channels show up as solid pink boxes with the word "disconnected" in them, then the EPICS channel server that should serve these channels is not running. Since these are supposed to start automatically, this is generally speaking an expert problem. However, some simple causes might be:

* *Network problem preventing contact to the channel server.* E.g., lost contact to a "slowmoncon box" (for rack status, Glassman HV, and impedance monitor), or to the PMT HV controller (for the PMT HV). If it seems the device is really running correctly (heartbeat or activity LEDs flashing), then this is probably a problem for a network support expert. It wouldn't hurt to call a slowmoncon expert first.
* *A power failure to one of these devices.* If the power is really out, then this is a problem for an electrical support expert.
* *Power is on and network connected, but device is not providing EPICS data.*
** *You can always contact a slow control expert.*
** *Glomation "slowmoncon box" rack monitor*: If you are on-site and have ODH training and a buddy, you can try power cycling using the front panel switch. Or, you can try logging in to it via ssh as the uboonedaq user. E.g., for Glomation !#4, @ssh uboonedaq@192.168.144.204@. Contact a slow control expert for the password. Once logged in, you can try restarting the IOC using the @start_ioc.sh@ script in the home directory. In case a rackmon box needs to be replaced or updated, the slow controls expert should consult [[SMC_-_Installing_software_on_Glomations_-_Experts_Only]].

If none of the above applies, contact a slow monitor/control (slowmoncon) expert.
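
The bullets above amount to a simple decision: is the device reachable on the network, and if so, is it serving EPICS data? A minimal sketch of that triage follows. It is not an existing tool: the function names are hypothetical, @host_reachable@ assumes Linux @ping@ (and that the device answers ping at all), and @pv_has_data@ assumes the EPICS @caget@ tool is on the PATH and exits non-zero when a channel does not connect. The PV name in the example comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical triage helpers for disconnected channels (not an official tool).

# Succeeds if HOST answers ping (Linux iputils: 3 probes, 2 s reply timeout each).
host_reachable() {
  ping -c 3 -W 2 "$1" >/dev/null 2>&1
}

# Succeeds if the EPICS channel PV can be read; -w sets the caget timeout in seconds.
pv_has_data() {
  caget -w 5 "$1" >/dev/null 2>&1
}

# Maps the two check results (1 = passed, anything else = failed) to a next step,
# following the bullets above.
triage_disconnected() {
  local reachable="$1" has_data="$2"
  if [ "$reachable" != 1 ]; then
    echo "unreachable: network problem or power failure; see the first two bullets"
  elif [ "$has_data" != 1 ]; then
    echo "no-epics-data: device is up but not serving data; for a Glomation try start_ioc.sh, otherwise call a slow control expert"
  else
    echo "unexplained: both checks pass; contact a slowmoncon expert"
  fi
}

# Runs both checks and prints the suggested next step.
# Example (the PV name is a placeholder): triage_host_pv 192.168.144.204 'SOME:PV:NAME'
triage_host_pv() {
  local reachable=0 has_data=0
  host_reachable "$1" && reachable=1
  pv_has_data "$2" && has_data=1
  triage_disconnected "$reachable" "$has_data"
}
```

Keeping the decision in @triage_disconnected@, separate from the network and EPICS probes, makes it easy to swap in a different reachability check if a device does not answer ping.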

h2. Problems with alarms

* *Alarm ranges:* there is an expert procedure for changing them. Contact the appropriate subsystem expert to get approval for changes.
* *Alarms not functioning at all, or "server not found":* The alarm system should start and restart itself as needed; currently, there is no non-expert procedure for diagnosing or restarting the alarm system. One might be created if this becomes a problem, but for now, please let a slowmoncon expert know.

h2. Problems with archiving

If the archiver is not visible, check whether the GUI is in full-screen mode. In full-screen mode the only visible window is the SMC, and the archiver is hidden.
Press 'F11' to enter or exit full-screen mode.