SLC - Operators' Guide¶
- Table of contents
- SLC - Operators' Guide
- If something goes wrong....
- Launching the Slow Controls Screen
- Special web pages
- Noticing new alarms
- How to respond to alarms?
- Getting more information on the alarms
- How to make Shift plots?
- How to export plot data?
- Posting Shift plots on E-log
- How to get the overview panel back if it disappeared
- How to get the alarm panels back if they disappeared
- Information on Sub-systems
- Saving and loading sub-system settings
- General CSS users manual
If something goes wrong....¶
- If you are restarting from scratch after a power outage or major reboots on one or more computers, see the SMC Startup page.
- If anything goes wrong in following the instructions below, consult the SLC - Troubleshooting page.
Launching the Slow Controls Screen¶
If you are a remote shifter, see Off-Site_Procedures for how to start up.
In ROC West, the Standard Slow Controls Screen (see the screenshot below) should always be up. If for some reason you cannot find it, launch Slow Controls from the ROC control room following the instructions provided in this section.
Double-click on the 'Slow Control' icon that is on the desktop of the top right control room screen.
Or, from a terminal window, launch
In case the above doesn't work, follow the instructions below:
Open a terminal on the control room computer and check for already running ssh processes that forward port 5902
ps ax | grep 'ssh.*-L *5902:'
Kill any active ssh processes found above, then type
ssh -K -L 5902:localhost:5902 ubooneshift@ubdaq-prod-ws01
Important note: Once you run the above command and log in as user "ubooneshift", keep that terminal open and do not use it for anything else. The easiest way is to minimize it so it stays out of the way.
Now open another terminal on the control room computer and type
Do not type the vncviewer command on ws01 or evb!
A window should pop up and into it you will type
Launching the Slow Controls screen from the ROC control room opens a virtual window. Even though the connection is secured by kerberized ssh, the virtual screen sharing software (VNC) will prompt for an extra password, which is the standard uboone password with "smc" appended (xxxxxxsmc).
Troubleshooting note: If the VNC viewer fails without even prompting for a password, then either the ssh tunnel has failed or the VNC server needs to be started -- see the SLC - Troubleshooting page.
Once this password is entered, the full virtual screen will open, which should show the Slow Control Display running as in the above picture.
Troubleshooting note: If the virtual window opens and you see an empty desktop instead of the Slow Control Display, you need to un-minimize the Slow Control Display or restart the CSSGUI program -- see the SLC - Troubleshooting page.
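The check for leftover tunnels in the steps above can also be sketched in Python. This is a minimal sketch and not part of the official procedure: the function name `find_tunnel_pids` is hypothetical, and it assumes the usual Linux `ps ax` column layout (PID, TTY, STAT, TIME, COMMAND). It reuses the pattern given to grep above and only reports PIDs; killing them is left to the operator.

```python
import re

# Same pattern the guide passes to grep: an ssh process forwarding local port 5902.
TUNNEL_PATTERN = re.compile(r"ssh.*-L *5902:")


def find_tunnel_pids(ps_output):
    """Return the PIDs of ssh processes in `ps ax` output that forward port 5902.

    Assumes the column layout PID TTY STAT TIME COMMAND (an assumption,
    not something the guide specifies).
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header row and malformed lines
        command = fields[4]
        if command.startswith("grep"):
            continue  # the grep process itself also contains the pattern text
        if TUNNEL_PATTERN.search(command):
            pids.append(int(fields[0]))
    return pids
```

One possible way to use it is to pass in the text returned by `subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout` and then kill any reported PIDs by hand before opening the new tunnel.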
Special web pages¶
Some pages are most easily accessed by starting a Firefox browser within the VNC window. Others should be opened in a browser on a control room computer.
On Firefox running in the VNC window:¶
- To check that the archiver is healthy, open http://ubdaq-prod-smc.fnal.gov:4812/main -- this is used for shift checks.
- To check that the JMS2RDB server is healthy, open http://ubdaq-prod-smc.fnal.gov:4913/main -- this is not required.
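A quick reachability test for the two status pages listed above could look like the sketch below. It is only an illustration: the helper name `page_responds` is hypothetical, it assumes the pages are reachable from the machine where the script runs, and an HTTP 200 answer only shows that the server is responding. The shifter still needs to read the page itself for the shift check.

```python
import urllib.error
import urllib.request

# URLs copied from the list above.
STATUS_PAGES = {
    "archiver": "http://ubdaq-prod-smc.fnal.gov:4812/main",
    "JMS2RDB": "http://ubdaq-prod-smc.fnal.gov:4913/main",
}


def page_responds(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the page answers with HTTP 200, False on any failure.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, url in STATUS_PAGES.items():
        state = "responding" if page_responds(url) else "NOT responding"
        print(f"{name}: {state}")
```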
On a control room computer:¶
- To hear a sound when a critical slowmon alarm appears, open http://ubdaq-prod-smc.fnal.gov:4714/alarmpage.html on a control room machine that has both speakers and the right proxy set up. See the Alarm Sounder wiki page for details.
Noticing new alarms¶
ALL OF THE SUB-SYSTEM BOXES SHOULD BE GREEN. If any box is not green, pay attention immediately and take action. Keep in mind that Magenta and Red alarms both indicate that something is severely wrong: Magenta indicates a disconnected status (total loss of communication with the sub-system) and Red means a severe error has occurred. Both need expert attention right away.
As noted on the image above, the highest-severity unacknowledged alarm is shown on both the Alarm Area Panel and the overview panel. This means that its color masks any other underlying alarms in the same sub-system until it is cleared. For example, if a sub-system has both a magenta (INVALID) and a red (MAJOR) alarm, the alarm area panel for that sub-system turns magenta, the color of the highest-severity alarm. The Alarm Table view lists Current alarms and Acknowledged alarms. You can sort Current alarms by "Alarm time" to find out what is new.
Note: when you acknowledge an alarm, it turns blue. It is important to keep this difference in mind when looking for new alarms. Also note that the acknowledged alarm color (blue) only affects the alarm area panel view. The actual alarm color (and so the actual severity) is kept in all other views (Alarm Tree, Alarm Table, overview panel and all individual sub-system panels).
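The masking rule described above can be illustrated with a small sketch. It is not the actual CSS implementation: the function name and data layout are hypothetical, the severity order (INVALID above MAJOR above MINOR) follows the example in the text, and showing blue when every active alarm has been acknowledged is an assumption based on the note about acknowledged alarms.

```python
# Severity order as described in the text: INVALID masks MAJOR, which masks MINOR.
SEVERITY_RANK = {"OK": 0, "MINOR": 1, "MAJOR": 2, "INVALID": 3}
SEVERITY_COLOR = {"OK": "green", "MINOR": "yellow", "MAJOR": "red", "INVALID": "magenta"}
ACKNOWLEDGED_COLOR = "blue"


def area_panel_color(alarms):
    """Return the color one sub-system box would take on the alarm area panel.

    `alarms` is a list of (severity, acknowledged) tuples for that sub-system.
    """
    active = [(severity, acked) for severity, acked in alarms if severity != "OK"]
    if not active:
        return SEVERITY_COLOR["OK"]
    unacknowledged = [severity for severity, acked in active if not acked]
    if unacknowledged:
        # The highest-severity unacknowledged alarm hides everything below it.
        worst = max(unacknowledged, key=SEVERITY_RANK.__getitem__)
        return SEVERITY_COLOR[worst]
    # Assumption: only acknowledged alarms remain, so the box shows blue.
    return ACKNOWLEDGED_COLOR
```

For instance, a sub-system with an unacknowledged INVALID alarm and an unacknowledged MAJOR alarm comes out magenta, so the red alarm stays hidden on this panel until the magenta one is cleared.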
Alarm sounder page¶
If you have a certain web page open, an audio alert will be played by your browser when certain channels go into alarm. See the Alarm Sounder wiki page for information on how to use this feature. This is a supplement to the alarm panel, not a replacement for it; only certain channels will cause the sound. Be sure to keep watching the panels.
How to respond to alarms?¶
It is very important that you take appropriate action when you see an alarm. The action you should take depends on the severity of the alarm. Follow this alarm alert procedure:
INVALID (Magenta) alarms: An Invalid alarm represents a disconnected status. Immediately call the on-call Slow Controls expert and make an e-log entry.
MAJOR (Red) alarms: A Major alarm represents a severe problem. Immediately contact the corresponding sub-system on-call expert and make an e-log entry.
MINOR (Yellow) alarms: Minor alarms are warnings to the operator. Make an e-log entry and keep an eye on the sub-system.
Do not attempt to acknowledge any alarms unless otherwise instructed by the Run coordinator and/or on-call sub-system experts. In case of doubt, always double check.
Which expert to call for which alarm?¶
|Alarm sub-system|Whom to call|
|ArPurity|Purity Monitor expert|
|Cryo (alarm confirmed on ROC-W IFIX display)|Call order: Cryo on-call expert, Run Coordinator|
|Cryo (no alarm on ROC-W IFIX display)|Call order: Run Coordinator|
|DAQStatus (XMIT or sn_read_lag_multiplicity)|Readout expert|
|DAQStatus (other)|DAQ expert|
|PCStatus (disk space/load)|Data Management expert (first), Slow Controls expert (second)|
|PCStatus (other)|DAQ expert (first), Run Coordinator (second)|
|TPCDrift|Drift HV expert|
Look at the Expert call list for contact details of on-call experts.
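The alarm alert procedure and the call table above can be summarized as a lookup. This is only a sketch to make the decision flow explicit: the function name `response_for` and the `detail` strings are hypothetical labels for the table rows, and the fallback for sub-systems not listed in the table is an assumption.

```python
# Call order per (sub-system, detail), transcribed from the table above.
CALL_LIST = {
    ("ArPurity", None): ["Purity Monitor expert"],
    ("Cryo", "confirmed on IFIX"): ["Cryo on-call expert", "Run Coordinator"],
    ("Cryo", "not on IFIX"): ["Run Coordinator"],
    ("DAQStatus", "XMIT or sn_read_lag_multiplicity"): ["Readout expert"],
    ("DAQStatus", "other"): ["DAQ expert"],
    ("PCStatus", "disk space/load"): ["Data Management expert", "Slow Controls expert"],
    ("PCStatus", "other"): ["DAQ expert", "Run Coordinator"],
    ("TPCDrift", None): ["Drift HV expert"],
}


def response_for(severity, subsystem, detail=None):
    """Return the list of steps for an alarm of the given severity."""
    if severity == "INVALID":
        # Disconnected status: always goes to the Slow Controls expert.
        return ["Call the on-call Slow Controls expert", "Make an e-log entry"]
    if severity == "MAJOR":
        experts = CALL_LIST.get((subsystem, detail))
        if experts is None:
            # Assumption: anything not in the table is looked up by hand.
            return ["Consult the Expert call list", "Make an e-log entry"]
        return ["Call in order: " + ", ".join(experts), "Make an e-log entry"]
    if severity == "MINOR":
        return ["Make an e-log entry", "Keep an eye on the sub-system"]
    raise ValueError(f"unknown severity: {severity!r}")
```

Note that none of the steps include acknowledging the alarm, in line with the instruction above.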
Getting more information on the alarms¶
When you hover the mouse over an alarm in the Alarm Tree, additional information shows up.
If you right-click on an alarm in the Alarm Tree, there is a "send email" option. Clicking it pastes the alarm information into an email automatically, which you can then send to yourself or to experts. Note the "Screen shot" button at the bottom of the email panel: clicking it attaches a screen shot of the current slow control window view to the email. This is a very useful feature.
Remember, as stated before, no alarm should be acknowledged without contacting the corresponding sub-system expert or the run coordinator, unless acknowledging it is in the Run plan.
To acknowledge an alarm, right-click on the alarm in the Alarm Tree and select acknowledge. Acknowledging an alarm moves it from the Current alarm list to the Acknowledged alarm list.
How to make Shift plots?¶
See here for instructions on making Shift plots.
How to export plot data?¶
See here for instructions on exporting plot data.
Posting Shift plots on E-log¶
See here for instructions on posting shift plots on e-log.
How to get the overview panel back if it disappeared¶
Don't panic. Reopening the overview panel is not complicated. On the left-hand vertical bar of the Standard Slow Controls screen there is a navigator icon. Clicking on it opens the navigator, where you can double-click the overview.opi file to open the overview panel. See the image below highlighting where the navigator is located on the Slow Control screen. Clicking on the navigator icon again will minimize it.
If for some reason the navigator icon is missing from the left-hand vertical bar, follow these instructions: on the first toolbar, click on window -> "Show View" and then select "Navigator". This will open the navigator and you can then select the overview.opi file and open it.
How to get the alarm panels back if they disappeared¶
Don't panic. There are multiple ways to get them back.
1. The third toolbar has buttons "Alarm", "Data Browser" and "OPI Runtime". Click on "Alarm" and things should come back.
2. Or, on the first toolbar click on CSS -> "Alarm" and then "Alarm Area Panel", or "Alarm Tree", etc.
3. Or, on the first toolbar, click on window -> "Show View" and then select "Alarm Area Panel", or "Alarm Tree", etc.
The third option can be used to open any panels that you may have accidentally closed (for example, properties tab, Navigator, Export Samples, etc.)
Information on Sub-systems¶
This section gives sub-system monitoring related information to operators and instructions on how to navigate to various panels. You should be able to navigate to any subsystem screen using the overview panel. Note that Slow Controls is set up such that any panel selected opens in a new tab so as to not block the view of the operator.
How to find the IP addresses of devices at LArTF?
See here: https://cdcvs.fnal.gov/redmine/projects/uboonedaq/wiki/IPStandards_at_LArTF
The Beam panel in Slow Controls monitors both BNB and NUMI beam conditions.
A list of beam variables currently monitored in Slow Controls is given here: beam variables
How to access Beam panel
At the operator level, no controls for the Drift HV supply are provided, only monitoring. Control of the Drift HV is strictly an expert-only action. The Drift HV control and monitoring programs are designed so that whenever anyone accesses the expert panel, the operator immediately gets an indication of it in the form of a minor alarm (Expert GUI running or not running).
The digital voltage and current read back from the power supply are monitored every second. The analog current read back from the supply (through the back plane connections) is recorded using an external device (Keithley data logger, series 2700) and the field cage pick off point is monitored using another external device (Keithley Sourcemeter, series 2410). In addition, oscilloscope traces (up to 4 traces) from PMT paddles (or other devices) can be monitored and saved in the Slow monitoring database. Full remote control of the scope is also implemented.
How to access the TPC Drift panel
See here for a list of TPC Drift related variables currently in Slow Controls: TPCDrift Variables
Cryo conditions change, so it is important to talk to the previous shifters or the run coordinator, or to read the e-log, to understand the Cryo conditions during your shift and get any special instructions that may be necessary. Slow Controls extracts Cryo information from the IFIX database used by the Cryo team. The Run coordinator and the Slow Controls team are in constant contact with the Cryo team (Mike Z and Mike G.) to understand the state of Cryogenics and improve/update alarms on Cryo variables.
How to access Cryo panels
See here for a list of Cryo variables currently in Slow Controls: Cryo Variables
The TPCPS rack houses two Wiener power supplies that provide power to the ASICS (low voltage) and Wire Bias (medium voltage). Both ASICS and Wire Bias channels are grouped per feedthrough (FT), which makes it easier to map channels to feedthroughs. Given the large number of channels that feed the ASICS, group on/off capability is provided behind the expert password wall. Two of the LV channels are also used as DAQ Calibration fan out channels and one of the LV channels is used to power the PMT flasher board.
To avoid accidental power cycling and operator error, the "Main Switch" controls for these two power supplies are kept in special panels behind a double password-protected wall. This is a STRICTLY expert-only operation. See below for how to access the various On Detector power panels:
How to access Main Switch panel
How to access ASICS LV panel(s)
How to access WireBias panel(s)
How to access Calibration fan out channels
How to access PMT Flasher panel
The sub-system CrateRails powers the TPC electronics in racks (TPC1, TPC2, TPC3, TPC4, TPC5 and partially the TRIG/PMT rack) and the CRT readout Front End Boards (FEBs) in the CRT Utilities rack. While taking data it is important to make sure that the power supplies are in a good state.
See here for a list of CrateRails power supply variables: CrateRails Variables
When the DAQ is not running (i.e., when you are not taking a run), there's no valid data from the DAQ on these variables, and they do not update. Most variables turn magenta when there is no valid data, but this behavior is disabled for DAQ variables when the DAQ is not running. So, it is normal if you notice that the DAQ metrics do not update when the run is stopped normally.
If the run crashes in such a way that the slow monitoring is never told that the run has stopped, then non-updating variables may go to Invalid (magenta) status. They should clear on their own within 15 seconds to a minute after a new run is started. It may take a couple of minutes for the status to update though. If you see any alarms during data taking, consider them real and follow the alarm alert procedure.
Also, note that if a run results in an error, i.e., if you see things turn red or magenta during data taking, these errors get latched. That means that even after you have terminated the run, you will continue to see the latched alarms. In this case, the best approach is to acknowledge those alarms (they will go away once you start a new run), so that the large number of them does not keep operators from finding other important alarms in the alarm table.
(For those who want to understand this behavior: Slow monitoring extracts the DAQ information from Ganglia (the monitoring system run by DAQ folks to specifically monitor DAQ processes). DAQ doesn't send information to Ganglia when it is not running. The Ganglia monitor daemon runs, but it drops the metrics when it sees it hasn't gotten an update in a while. So, when the DAQ is down, there's no valid data from the DAQ on these variables, so they turn magenta. The special exception is that invalid status is suppressed if the run number variable in slow monitoring is zero.)
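The suppression rule described above can be written out as a short sketch. It only illustrates the logic as stated in the text and is not the actual slow monitoring code: the function name and the three state labels are hypothetical.

```python
def daq_variable_state(has_fresh_data, run_number):
    """Classify a DAQ metric according to the rule described in the text.

    has_fresh_data: True if Ganglia still holds a recent value for the metric.
    run_number: the run number variable in slow monitoring (zero when no run
    is in progress, per the text).
    """
    if has_fresh_data:
        return "VALID"
    if run_number == 0:
        # Run stopped normally: a missing update is expected, so no magenta.
        return "SUPPRESSED"
    # Slow monitoring still thinks a run is in progress (for example after a
    # crash), so the missing data shows up as Invalid (magenta).
    return "INVALID"
```

This also shows why a crashed run can leave magenta alarms behind: the run number is never reset to zero, so the missing data is no longer suppressed.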
See here for a list of DAQ variables in Slow Controls: DAQ Variables
How to access DAQ panel
The photomultiplier power supplies are reused from the D0 experiment, and have custom IOCs running in their own controllers. The PMT crate contains 6 PMT HV modules (one of which is a spare) with 8 channels each. Note that not all HV channels are used for operation.
See here for a list of PMT HV variables currently monitored/controlled in Slow Controls: PMTHV variables
How to access PMT HV panel(s)
RackFans, RackTemps, RackProtection¶
Each rack (all racks on the platform and in the DAQ room) contains temperature probes (2 or 3 probes) that measure the air temperature in the rack at various locations. For racks with TPC crates (for example, TPC racks), there is an over-temperature trip associated with Crate power supplies in case the temperature near the power supply gets too warm.
There are also 1 to 2 fan packs (each fan pack contains 6 fans) in the TPC racks to make sure the temperature in the racks doesn't get too warm. It is important that these fans run at nominal speeds to ensure electronics won't get too warm. Every MicroBooNE rack houses a rack protection system that will turn off the AC power to the rack in case of unsafe situations.
See here for a list of rack variables currently monitored under Slow Controls: rack variables
How to access rack status panels
This is the impedance monitor for the detector ground.
See here for a list of ZMON variables currently monitored in Slow Controls: ZMON Variables
How to access ZMON monitoring panel
PC Status is the hardware monitoring of the DAQ computers.
See here for a list of PC Status variables currently monitored in Slow controls: PC Status Variables
How to access PC Status panel
MicroBooNE has 3 purity monitors: 2 inside the cryostat and one inline. The inline purity monitor (index 3) is currently not exercised. The two purity monitors that we monitor are the short one (index 0) and the long one (index 1). One important variable to keep an eye on is the uB_ArPurity_PM01_0/AGE variable, which tells you the time elapsed between the current and previous values.
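As an illustration of what an AGE-style variable expresses, the sketch below computes the time elapsed between two measurement timestamps and compares it to a limit. The function names are hypothetical, the units (seconds) are an assumption, and the limit is left as a parameter because this guide does not state a nominal value.

```python
from datetime import datetime


def age_seconds(previous_time, current_time):
    """Seconds elapsed between the previous and the current measurement."""
    return (current_time - previous_time).total_seconds()


def looks_stale(age, max_age):
    """True if the age exceeds a limit chosen by the caller (no default given)."""
    return age > max_age
```

For example, measurements taken at 12:00:00 and 12:05:00 on the same day give an age of 300 seconds.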
See here for a list of Ar Purity variables currently monitored in Slow controls: Ar Purity Variables
How to access Ar Purity panel
The weather conditions are acquired from various sources such as the DuPage County airport, the Fermilab weather station, IFBeamData, etc.
See here for a list of Environment variables currently monitored in Slow controls: Environment Variables
How to access Weather panel
Saving and loading sub-system settings¶
The PV Table tool can be used to save many values into a file. These values can be restored later if desired. This is particularly useful for the PMT HV system, which resets all target voltages to zero after a power outage.
General CSS users manual¶
Control System Studio has an extensive users' manual accessible from the "Help" menu within the application. The key parts of it for MicroBooNE have been extracted for browsing at this link.
This might be useful as a supplement to the MicroBooNE-specific information given above.