SLC - Guide » History » Version 202

Version 201 (Jose Ignacio Crespo Anadon, 03/06/2019 06:13 PM) → Version 202/203 (Sowjanya Gollapinni, 11/05/2019 03:53 PM)

h1. SLC - Operators' Guide

{{toc}}

h2. If something goes wrong....

* If you are restarting from scratch after a power outage or major reboots on one or more computers, see the [[SMC Startup]] page.
* If anything goes wrong in following the instructions below, consult the [[SLC - Troubleshooting]] page.

h2. Launching the Slow Controls Screen

If you are a remote shifter, see [[Off-Site_Procedures]] for how to start up.

In ROC West, the Standard Slow Controls Screen (see the screenshot below) should always be up. If for some reason you cannot find it, launch Slow Controls from the ROC control room following the instructions provided in this section.

!standardscreen.png!

Double-click on the 'Slow Control' icon that is on the desktop of the top right control room screen.

Or, from a terminal window, launch
<pre>
uboone-shift-tools/launch-slow-control
</pre>

> In case the above doesn't work, follow the instructions below:
>
> Open a terminal on the control room computer and check for already running ssh processes that forward port 5902
> %{color:green}@ps ax | grep 'ssh.*-L *5902:'@%
>
> Kill any active ssh processes found above, then type
> %{color:green}@ssh -K -L 5902:localhost:5902 ubooneshift@ubdaq-prod-ws01@%
>
> %{background:yellow}Important note:% Once you run the above command and log in as user "ubooneshift", keep that terminal open
> and do not use it for anything else. The simplest approach is to minimize it so that it stays out of the way.
>
> Now open _another terminal_ on the control room computer and type
> *Do not type the vncviewer command on ws01 or evb!*
> %{color:green}@vncviewer@%
>
> A window should pop up and into it you will type
> %{color:green}@localhost:5902@%
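
The fallback steps above can be sketched as a small shell helper. This is a minimal sketch, not the official launch script: the function names @tunnel_pids@ and @open_tunnel@ are hypothetical, and it assumes a POSIX shell with @ps@, @grep@ and @awk@ available on the control room computer. Only the @ps@ pattern and the @ssh@ command line come from the instructions above.

```shell
#!/bin/sh
# Hypothetical helper names; only the ps pattern and ssh command come from the guide.

# Read `ps ax` output on stdin and print the PID of every ssh process
# that forwards local port 5902 (the grep process itself is filtered out).
tunnel_pids() {
    grep -e 'ssh.*-L *5902:' | grep -v grep | awk '{print $1}'
}

# Open the tunnel. Run this in a dedicated terminal and leave that terminal open.
open_tunnel() {
    ssh -K -L 5902:localhost:5902 ubooneshift@ubdaq-prod-ws01
}

# Example: list stale tunnels before opening a new one.
# ps ax | tunnel_pids
```

Review the PIDs printed by @ps ax | tunnel_pids@ before killing anything, then call @open_tunnel@ and start @vncviewer@ from a second terminal as described above.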

After you launch the Slow Controls screen from the ROC control room, a virtual window opens. Even though the connection is secured by kerberized ssh, the virtual screen sharing software (VNC) will prompt for an extra password, which is the standard uboone password with "smc" appended: %{color:green}xxxxxxsmc%.

> %{background:yellow}Troubleshooting note:% If the VNC viewer fails without even prompting for a password, then either the ssh tunnel has failed or the VNC server needs to be started -- see the [[SLC - Troubleshooting#Problems-with-starting-up-the-GUI]] page.

Once this password is entered, the full virtual screen will open, which should show the Slow Control Display running as in the above picture.

> %{background:yellow}Troubleshooting note:% If the virtual window opens and you see an empty desktop instead of the Slow Control Display, you need to un-minimize the Slow Control Display or restart the _CSSGUI_ program -- see the [[SLC - Troubleshooting#Problems-with-starting-up-the-GUI]] page.

h2. Special web pages

Some pages are most easily accessed by starting a Firefox browser within the VNC window. Others should be opened in a browser on a control room computer.

h3. On Firefox running in the VNC window:

* To check that the archiver is healthy, open http://ubdaq-prod-smc.fnal.gov:4812/main -- this is used for shift checks.
* To check that the JMS2RDB server is healthy, open http://ubdaq-prod-smc.fnal.gov:4913/main -- this is not required.
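
As a supplement to opening these pages in Firefox, a quick reachability check can be sketched in shell. This is an illustration only: the function names @page_url@ and @page_is_up@ are hypothetical, @curl@ is assumed to be installed in the VNC session, and an HTTP 200 answer only shows that the page is being served. For the shift check, the page content still has to be read.

```shell
#!/bin/sh
# Hypothetical helpers; only the host name, ports and /main path come from the list above.

# Build the status-page URL for a given port (4812: archiver, 4913: JMS2RDB).
page_url() {
    printf 'http://ubdaq-prod-smc.fnal.gov:%s/main\n' "$1"
}

# Succeed if the page answers with HTTP 200 within 5 seconds.
page_is_up() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$(page_url "$1")")
    [ "$code" = "200" ]
}

# Example (run inside the VNC session):
# page_is_up 4812 && echo "archiver page reachable" || echo "archiver page unreachable"
```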

h3. On a control room computer:

* To hear a sound when a critical slowmon alarm appears, open http://ubdaq-prod-smc.fnal.gov:4714/alarmpage.html on a control room machine that has both speakers and the right proxy set up. See the [[Alarm Sounder]] wiki page for details.

h2. Noticing new alarms

!>AlarmColorScheme.png!

As an operator, one of your main jobs is to watch the [[Alarm area panel]] and [[Alarm Table]] and check for any new alarms showing up. The color scheme of this panel is shown on the right.

*%{color:green}@ALL OF THE SUB-SYSTEM BOXES SHOULD BE GREEN.@%* If you see any deviation from green, pay attention immediately and take the appropriate action. Keep in mind that Magenta and Red alarms both indicate that something is severely wrong. Magenta indicates a disconnected status (communication with the sub-system has been lost entirely) and Red means that a severe error occurred. Both need expert attention right away.

As noted on the image above, the highest-severity unacknowledged alarm is shown on both the [[Alarm area panel]] and the [[overview panel]]. This means that its color masks any other underlying alarms in the same system until it is cleared. For example, if a sub-system has both a magenta (INVALID) and a red (MAJOR) alarm, the alarm area panel for that sub-system shows magenta, the color of the highest-severity alarm. The [[Alarm Table]] view lists _Current alarms_ and _Acknowledged alarms_. Sort Current alarms by "Alarm time" to find out what is new.
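
The masking rule described above can be illustrated with a short shell sketch. It is not how the alarm software is implemented: the function names @severity_rank@ and @panel_severity@ and the numeric ranks are assumptions made for illustration. Only the ordering (INVALID above MAJOR above MINOR) follows this guide.

```shell
#!/bin/sh
# Rank a severity name. The numbers are illustrative; the ordering follows the guide.
severity_rank() {
    case "$1" in
        INVALID) echo 3 ;;   # magenta
        MAJOR)   echo 2 ;;   # red
        MINOR)   echo 1 ;;   # yellow
        *)       echo 0 ;;   # OK or unknown
    esac
}

# Print the highest severity among the unacknowledged alarms passed as arguments,
# i.e. the one whose color the alarm area panel would show for that sub-system.
panel_severity() {
    best=OK
    for s in "$@"; do
        if [ "$(severity_rank "$s")" -gt "$(severity_rank "$best")" ]; then
            best=$s
        fi
    done
    echo "$best"
}

panel_severity MAJOR INVALID MINOR   # prints INVALID
```

The MAJOR and MINOR alarms in this example still exist; they are only hidden on the panel, which is why the [[Alarm Table]] should be checked as well.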

*Note:* When you acknowledge an alarm, it turns blue. It is important to keep this difference in mind when looking for new alarms. Also note that the acknowledged alarm color (blue) only affects the alarm area panel view. The actual alarm color (and so the actual severity) is kept in all other views ([[Alarm Tree]], [[Alarm Table]], [[overview panel]] and all individual sub-system panels).

h3. Alarm sounder page

If you have a certain web page open, an audio alert will be played by your browser when certain channels go into alarm. See the [[Alarm Sounder]] wiki page for information on how to use this feature. This is a supplement to the alarm panel, not a replacement for it; only certain channels will cause the sound. Be sure to keep watching the panels.

h2. How to respond to alarms?

It is very important that you take appropriate action when you see an alarm. What action you should take depends on the severity of the alarm. Follow the alarm alert procedure below:

*INVALID (Magenta) alarms:* An invalid alarm represents a disconnected status. Immediately call the on-call Slow Controls expert and e-log.
*MAJOR (Red) alarms:* A major alarm represents a severe problem. Immediately contact the corresponding sub-system on-call expert and e-log.
*MINOR (Yellow) alarms:* Minor alarms are warnings to the operator to keep an eye on the sub-system. E-log and keep watching.

_%{color:red}@Do not attempt to acknowledge any alarms unless otherwise instructed by the Run coordinator and/or on-call sub-system experts. In case of doubt, always double check.@%_

h3. Which expert to call for which alarm?

|_.Sub-system |_.On-call expert |
|ArPurity| Purity Monitor expert|
|BeamData| Run Coordinator|
|CrateRails| Readout expert|
|/2. Cryo | call order (if alarm confirmed on ROC-W IFIX display): Cryo on-call expert, Run Coordinator|
| call order (if no alarm on ROC-W IFIX display): Run Coordinator|
|DAQStatus (XMIT or sn_read_lag_multiplicity)| Readout expert|
|DAQStatus (other)| DAQ expert|
|OnDetPower | TPC expert|
|PCStatus (disk space/load) | Data Management expert (first), Slow Controls expert (second)|
|PCStatus (other) | DAQ expert (first), Run Coordinator (second)|
|PMTHV| PMT expert|
|RackFans| Run Coordinator|
|RackProt | Run Coordinator|
|RackTemps | Run Coordinator|
|TPCDrift | Drift HV expert|
|ZMON | Run Coordinator|

Look at the [[Expert call list]] for contact details of on-call experts.
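
The table above can also be read as a lookup, sketched below in shell. This is a reading aid under stated assumptions, not an official tool: the function name @first_contact@ and the detail labels @ifix@, @disk@ and @load@ are hypothetical. The function prints only the first person in the call order.

```shell
#!/bin/sh
# Print the first expert to call for a sub-system alarm, following the table above.
# $1: sub-system name; $2: optional detail label (hypothetical convention).
first_contact() {
    subsystem=$1
    detail=${2:-}
    case "$subsystem" in
        ArPurity)   echo "Purity Monitor expert" ;;
        CrateRails) echo "Readout expert" ;;
        Cryo)
            # "ifix" means the alarm is confirmed on the ROC-W IFIX display.
            if [ "$detail" = "ifix" ]; then
                echo "Cryo on-call expert"
            else
                echo "Run Coordinator"
            fi ;;
        DAQStatus)
            case "$detail" in
                XMIT|sn_read_lag_multiplicity) echo "Readout expert" ;;
                *) echo "DAQ expert" ;;
            esac ;;
        OnDetPower) echo "TPC expert" ;;
        PCStatus)
            case "$detail" in
                disk|load) echo "Data Management expert" ;;
                *) echo "DAQ expert" ;;
            esac ;;
        PMTHV)      echo "PMT expert" ;;
        TPCDrift)   echo "Drift HV expert" ;;
        BeamData|RackFans|RackProt|RackTemps|ZMON) echo "Run Coordinator" ;;
        *)          echo "unknown sub-system" ;;
    esac
}

first_contact DAQStatus XMIT   # prints Readout expert
first_contact Cryo             # prints Run Coordinator
```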

h2. Getting more information on the alarms

When you hover the mouse over an alarm in the [[Alarm Tree]], additional information appears.

h3. Emailing alarms

If you right-click on an alarm in the [[Alarm Tree]], there is an option [[send email]]. If you click on this, the alarm information is automatically pasted into an email, which you can then send to yourself or to experts. Note that there is a "Screen shot" button at the bottom of the email panel. If you click this, a screen shot of the current slow control window view is automatically added to the email as an attachment. This is a very useful feature.

h3. Acknowledging alarms

_Remember, as stated before, no alarm should be acknowledged without first contacting the corresponding sub-system expert or the run coordinator, unless acknowledging it is called for in the [[Run plan]]._

To acknowledge an alarm, right-click on the alarm in the [[Alarm Tree]] and select [[acknowledge]]. Acknowledging an alarm moves it from the [[Current alarm list]] to the [[acknowledged alarm list]].

h2. How to make Shift plots?

See here for instructions on [[making Shift plots]].

h2. How to export plot data?

See here for instructions on how to [[export plot data]].

h2. Posting Shift plots on E-log

See here for instructions on [[posting shift plots on e-log]].

h2. How to get the overview panel back if it disappeared

Don't panic. Reopening the [[overview panel]] is not complicated. On the vertical bar on the left-hand side of the Standard Slow Controls screen, there is a [[navigator icon]]. Clicking on it opens the navigator, where you can select the overview.opi file and double-click on it to open the overview panel. See the image below highlighting where the navigator is located on the Slow Controls screen. Clicking on the navigator icon again will minimize it.

If for some reason, the navigator icon is missing on the vertical left-hand side bar, follow these instructions: on the [[first toolbar]], click on window -> "Show View" and then select "Navigator". This will open the navigator and you can then select the overview.opi file and open it.

h2. How to get the alarm panels back if they disappeared

Don't panic. There are multiple ways to get them back.
1. The [[third toolbar]] has buttons "Alarm", "Data Browser" and "OPI Runtime". Click on "Alarm" and things should come back.
2. Or, on the first toolbar click on [[CSS]] -> "Alarm" and then "Alarm Area Panel", or "Alarm Tree", etc.
3. Or, on the first toolbar, click on [[window]] -> "Show View" and then select "Alarm Area Panel", or "Alarm Tree", etc.

The third option can be used to open any panels that you may have accidentally closed (for example, properties tab, Navigator, Export Samples, etc.)

h2. Information on Sub-systems

This section gives sub-system monitoring related information to operators and instructions on how to navigate to various panels. You should be able to navigate to any subsystem screen using the [[overview panel]]. Note that Slow Controls is set up such that any panel selected opens in a new tab so as to not block the view of the operator.

h3. How to know IP addresses of devices at LArTF?
See here: https://cdcvs.fnal.gov/redmine/projects/uboonedaq/wiki/IPStandards_at_LArTF

h3. *%{color:green}@BEAM@%*

The Beam panel in Slow Controls monitors both BNB and NUMI beam conditions.

A list of beam variables currently monitored in Slow Controls is given here: [[beam variables]]

How to access [[Beam panel]]

h3. *%{color:green}@TPC DRIFT@%*

At the operator level, no controls for the Drift HV supply are provided, only monitoring. Control of the Drift HV is strictly an expert-only action. The Drift HV control and monitoring programs are designed such that whenever anyone accesses the expert panel, the operator immediately gets an indication of it in the form of a minor alarm (Expert GUI running or not running).

The digital voltage and current read back from the power supply are monitored every second. The analog current read back from the supply (through the back plane connections) is recorded using an external device (Keithley data logger, series 2700), and the field cage pick-off point is monitored using another external device (Keithley Sourcemeter, series 2410). In addition, oscilloscope traces (up to 4 traces) from PMT paddles (or other devices) can be monitored and saved in the Slow monitoring database. Full remote control of the scope is also implemented.

How to access the [[TPC Drift panel]]

See here for a list of TPC Drift related variables currently in Slow Controls: [[TPCDrift Variables]]

h3. *%{color:green}@CRYO@%*

CRYO conditions change, so it is important to talk to the previous shifters or the run coordinator, or to read the e-log, to understand the Cryo conditions during your shift and get any special instructions that may be necessary. Slow Controls extracts Cryo information from the IFIX database used by the Cryo team. The Run coordinator and the Slow Controls team are in constant contact with the Cryo team (Mike Z and Mike G.) to understand the state of Cryogenics and improve/update alarms on Cryo variables.

How to access [[Cryo panels]]

See here for a list of Cryo variables currently in Slow Controls: [[Cryo Variables]]

h3. *%{color:green}@OnDetPower@%*

The *TPCPS* rack houses two Wiener power supplies that provide power to the ASICS (low voltage) and Wire Bias (medium voltage). Both ASICS and Wire Bias channels are grouped per feedthrough (FT), which makes it easier to map channels to feedthroughs. Given the large number of channels that feed the ASICS, group on/off capability is provided behind the expert password wall. Two of the LV channels are also used as DAQ Calibration fan out channels and one of the LV channels is used to power the PMT flasher board.

To avoid accidental power cycling and operator error, the "Main Switch" controls for these two power supplies are kept in special panels behind a double password-protected wall. This is a STRICTLY expert-only operation. See below for how to access the various On Detector power panels:

How to access [[Main Switch panel]]

How to access [[ASICS LV panel(s)]]

How to access [[WireBias panel(s)]]

How to access [[Calibration fan out channels]]

How to access [[PMT Flasher panel]]

See here for a list of On Detector Power Variables currently included in Slow Controls: [[ASICS LV Variables]], [[Calibration Fanout Variables]], [[PMT Flasher Variables]], and [[WireBias Variables]]

h3. *%{color:green}@CrateRails@%*

The sub-system *CrateRails* powers the TPC electronics in the racks (TPC1, TPC2, TPC3, TPC4, TPC5 and, partially, the TRIG/PMT rack) and the CRT readout Front End Boards (FEBs) in the CRT Utilities rack. While taking data, it is important to make sure that the power supplies are in a good state.

How to access *TPC* [[Crate panels]]
How to access [[CRT Crate panels]]

See here for a list of CrateRails power supply variables: [[CrateRails Variables]]

h3. *%{color:green}@DAQ@%*

When the DAQ is not running (i.e., when you are not taking a run), there's no valid data from the DAQ on these variables, and they do not update. Most variables turn magenta when there is no valid data, but this behavior is disabled for DAQ variables when the DAQ is not running. So, it is normal if you notice that the DAQ metrics do not update when the run is stopped normally.

If the run crashes in such a way that the slow monitoring is never told that the run has stopped, then non-updating variables may go to Invalid (magenta) status. They should clear on their own within 15 seconds to a minute after a new run is started. It may take a couple of minutes for the status to update though. If you see any alarms during data taking, consider them real and follow the alarm alert procedure.

Also note that if a run results in an error, i.e., if you see things turn red or magenta during data taking, these errors get latched. This means that even after you have terminated the run, you will continue to see the latched alarms. In this case, the best approach is to acknowledge those alarms (they will go away once you start a new run), so that their large number does not prevent operators from finding other important alarms in the alarm table.

(_For those who want to understand this behavior: Slow monitoring extracts the DAQ information from Ganglia (the monitoring system run by DAQ folks to specifically monitor DAQ processes). DAQ doesn't send information to Ganglia when it is not running. The Ganglia monitor daemon runs, but it drops the metrics when it sees it hasn't gotten an update in a while. So, when the DAQ is down, there's no valid data from the DAQ on these variables, so they turn magenta. The special exception is that invalid status is suppressed if the run number variable in slow monitoring is zero._)
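
The suppression rule described above can be summarized in a short shell sketch. It is only a model of the behavior as this guide describes it, not the actual slow monitoring code: the function name @daq_display_state@, the @fresh@ label and the state names @VALID@, @STALE_SUPPRESSED@ and @INVALID@ are all hypothetical.

```shell
#!/bin/sh
# Decide how a DAQ variable would be displayed (illustrative model only).
#   $1: run number as seen by slow monitoring (0 means no run in progress)
#   $2: "fresh" if Ganglia delivered a recent value, anything else otherwise
daq_display_state() {
    run_number=$1
    data=$2
    if [ "$data" = "fresh" ]; then
        echo "VALID"
    elif [ "$run_number" -eq 0 ]; then
        echo "STALE_SUPPRESSED"   # no update, but INVALID is suppressed between runs
    else
        echo "INVALID"            # no update while a run is believed to be active
    fi
}

daq_display_state 0 stale       # prints STALE_SUPPRESSED
daq_display_state 12345 stale   # prints INVALID
```

The second call corresponds to the crash case: the run number was never reset, so stale variables go magenta until a new run starts.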

See here for a list of DAQ variables in Slow Controls: [[DAQ Variables]]

How to access [[DAQ panel]]

h3. *%{color:green}@PMTHV@%*

The photomultiplier power supplies are reused from the D0 experiment, and have custom IOCs running in their own controllers. The PMT crate contains 6 PMT HV modules (one of which is a spare) with 8 channels each. Note that not all HV channels are used for operation.

See here for a list of PMT HV variables currently monitored/controlled in Slow Controls: [[PMTHV variables]]

How to access [[PMT HV panel(s)]]

h3. *%{color:green}@RackFans, RackTemps, RackProtection@%*

Each rack (all racks on the platform and in the DAQ room) contains temperature probes (2 or 3 probes) that measure the air temperature in the rack at various locations. For racks with TPC crates (for example, TPC racks), there is an over-temperature trip associated with Crate power supplies in case the temperature near the power supply gets too warm.

There are also 1 to 2 fan packs (each fan pack contains 6 fans) in the TPC racks to make sure the temperature in the racks doesn't get too warm. It is important that these fans run at nominal speeds to ensure electronics won't get too warm. Every MicroBooNE rack houses a rack protection system that will turn off the AC power to the rack in case of unsafe situations.

See here for a list of rack variables currently monitored under Slow Controls: [[rack variables]]

How to access [[rack status panels]]

h3. *%{color:green}@ZMON@%*

This is the impedance monitor for the detector ground.

See here for a list of ZMON variables currently monitored in Slow Controls: [[ZMON Variables]]

How to access [[ZMON monitoring panel]]

h3. *%{color:green}@PCStatus@%*

PC Status is the hardware monitoring of the DAQ computers.

See here for a list of PC Status variables currently monitored in Slow controls: [[PC Status Variables]]

How to access [[PC Status panel]]

h3. *%{color:green}@Ar Purity@%*

MicroBooNE has 3 purity monitors: 2 inside the cryostat and one inline. The inline purity monitor (index 3) is currently not exercised. The two purity monitors that we monitor are the short one (index 0) and the long one (index 1). One important variable to keep an eye on is the uB_ArPurity_PM01_0/AGE variable, which tells you the time elapsed between the current and previous values.

See here for a list of Ar Purity variables currently monitored in Slow controls: [[Ar Purity Variables]]

How to access [[Ar Purity panel]]

h3. *%{color:green}@Environment@%*

The weather conditions are acquired from various sources such as DuPage County airport, the Fermilab weather station, IFBeamData, etc.

See here for a list of Environment variables currently monitored in Slow controls: [[Environment Variables]]

How to access [[Weather panel]]

h2. Saving and loading sub-system settings

The PV Table tool can be used to save many values into a file. These values can be restored later if desired. This is particularly useful for the PMT HV system, which resets all target voltages to zero after a power outage.

It is easier to demonstrate than to read about, so please see this "short video":https://youtu.be/R-eeaDsYTUA. A screen shot of a PV Table display can be found [[here]].

h2. General CSS users manual

Control System Studio has an extensive users' manual accessible from the "Help" menu within the application. The key parts of it for MicroBooNE have been extracted for browsing at "this link":http://www-microboone.fnal.gov/css-user-manual/help/topic/index.html.
This might be useful as a supplement to the MicroBooNE-specific information given above.

---------