SLC - Shifters¶
- Table of contents
- SLC - Shifters
- First thing to do: Find the slow controls screen
- IMPORTANT NOTES ABOUT THE SLOW MONITORING STANDARD DISPLAY WINDOW:
- MONITORING INSTRUCTIONS DURING THE SHIFT:
- MAKING SHIFT PLOTS
First thing to do: Find the slow controls screen¶
If you cannot find the Standard Slow controls screen (as shown below), launch Slow Controls from the ROC control room following the instructions provided in this section.
Double-click on the 'Slow Control' icon that is on the desktop of the top right control room screen.
Or, from a terminal window, launch
In case the above doesn't work, follow the instructions below:¶
Open a terminal on the control room computer and type
ssh -K -L 5902:localhost:5902 ubooneshift@ubdaq-prod-ws01
Important note: Once you have run the above command and logged in as user "ubooneshift", it is very important to keep this terminal window open and not use it for anything else. The best way is to just minimize it, so it stays in a corner.
Now open another terminal on the control room computer and type
A window should pop up and into it you will type
The password is the standard uboone password with "smc" appended to the end.
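If the viewer in the second terminal fails to connect, one quick check is whether the forwarded port is actually accepting connections on the control room computer. The following is a minimal Python sketch and not part of the official procedure. The helper name `port_is_open` is hypothetical; only `localhost` and port 5902 come from the ssh command above.

```python
import socket

def port_is_open(host="localhost", port=5902, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if nothing is listening
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` with the defaults returns False when the ssh terminal from the first step has been closed, which is a hint to re-run the ssh command before retrying the viewer.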
In case even that doesn't work¶
After Opening the VNC screen you should see¶
After launching the Slow Control VNC from the ROC control room, it will open a virtual window showing the Slow Control Display running as shown in the above picture.
If the virtual window opens and you don't see the Slow Control Display running, you may need to start or restart the Control System GUI. See SLC_-_Troubleshooting
IMPORTANT NOTES ABOUT THE SLOW MONITORING STANDARD DISPLAY WINDOW:¶
As a shifter, you will be mainly watching the Alarm area panel and Alarm Tree that are located on the left in the above screenshot. The color scheme of this panel is shown below:
Note: Ideally (during stable running), all of the sub-system boxes should be green, apart from a couple of sub-systems that are discussed in the next section. So, except for known alarms and acknowledged alarms (acknowledged in agreement with the sub-system expert), pay attention to any deviation from green. Keep in mind that a Magenta alarm is more severe than a Red alarm. Magenta indicates a disconnected status (communication with the sub-system is completely lost) and Red means something is severely wrong. Both need expert attention right away.
As noted in the image above, the highest-severity unacknowledged alarm is shown on both the Alarm area panel and the overview panel (located on the top right of the Slow monitoring display). This means that the color of the highest-severity unacknowledged alarm masks any other underlying alarms in the same system until it is cleared or acknowledged. For example, if a given sub-system has both a magenta (INVALID) and a red (MAJOR) alarm, the alarm panel area for that sub-system takes the magenta color, which corresponds to the highest-severity alarm.
Notice from the above picture that an acknowledged alarm turns blue. It is important to keep this difference in mind when looking for new alarms. The blue color for acknowledged alarms only affects the alarm panel view. The actual alarm color (and so the actual severity) is kept in all other views (alarm tree, alarm table, overview panel and all individual sub-system panels).
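The masking rule described above can be sketched in a few lines of Python. This is an illustration of the logic as described on this page, not the actual display code. The function name and data layout are hypothetical, and the MINOR/yellow pairing is an assumption; the magenta, red, blue and green meanings are taken from the text.

```python
# Severity order, lowest to highest. INVALID (magenta) outranks MAJOR (red).
SEVERITY_RANK = {"OK": 0, "MINOR": 1, "MAJOR": 2, "INVALID": 3}
# MINOR -> yellow is an assumption; the other colors come from the text.
SEVERITY_COLOR = {"OK": "green", "MINOR": "yellow",
                  "MAJOR": "red", "INVALID": "magenta"}

def alarm_panel_color(alarms):
    """alarms: list of (severity, acknowledged) tuples for one sub-system."""
    active = [sev for sev, acked in alarms if sev != "OK"]
    unacked = [sev for sev, acked in alarms if sev != "OK" and not acked]
    if unacked:
        # The highest-severity unacknowledged alarm masks everything else.
        worst = max(unacked, key=SEVERITY_RANK.__getitem__)
        return SEVERITY_COLOR[worst]
    if active:
        # Only acknowledged alarms remain: the panel box shows blue.
        return "blue"
    return "green"
```

For example, a sub-system with an unacknowledged INVALID and an unacknowledged MAJOR alarm shows magenta; once the INVALID alarm is acknowledged, the MAJOR alarm underneath becomes visible as red.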
One of the very useful features of the slow monitoring display is the Alarm Table (see the image below), located on the bottom right of the slow control display window. It lists Current alarms and Acknowledged alarms. Clicking on the Alarm Time column sorts the Current alarms list by time, showing the newest alarm first. Shifters are advised to sort the Alarm Table every now and then to make sure no new alarms are missed.
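As a small illustration of what sorting on the Alarm Time column does, here is a Python sketch that orders alarm rows newest first. The row layout and the PV names are made up for the example and do not correspond to real variables.

```python
from datetime import datetime

def newest_first(alarms):
    """Sort alarm rows by alarm time, newest first."""
    return sorted(alarms, key=lambda row: row["time"], reverse=True)

# Made-up rows for illustration only; the PV names are not real.
rows = [
    {"pv": "example_pv_a", "time": datetime(2015, 7, 31, 8, 15)},
    {"pv": "example_pv_b", "time": datetime(2015, 7, 31, 9, 40)},
    {"pv": "example_pv_c", "time": datetime(2015, 7, 31, 7, 5)},
]
ordered = newest_first(rows)
```

With this ordering, any alarm that appeared since the last check sits at the top of the list.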
MONITORING INSTRUCTIONS DURING THE SHIFT:¶
Eyes on the Alarm Panel (and Alarm Tree and Alarm Table as needed).
If you are a first-time shifter, take note of the alarm panel status at the beginning of your shift and use it as a reference. Any deviation (new alarms) from the reference panel should be investigated as per the instructions that follow in this section. It is also good practice to note the total number of current alarms at the start of the shift. All the alarms you see under the "current alarms" tab (bottom right) should be taken seriously at this stage; they are all real alarms.
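The idea of comparing against a start-of-shift reference can be sketched as a simple set difference. This is only an illustration; the function names and the alarm names in the usage note are hypothetical, and in practice the comparison is done by eye on the display.

```python
def new_alarms(reference, current):
    """Alarms present now but absent from the start-of-shift reference."""
    return sorted(set(current) - set(reference))

def cleared_alarms(reference, current):
    """Alarms that were in the reference but are no longer present."""
    return sorted(set(reference) - set(current))
```

For instance, with a reference of `["alarm_a", "alarm_b"]` and a current list of `["alarm_b", "alarm_c"]`, `new_alarms` returns `["alarm_c"]`, which is the entry that needs investigating.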
Important note: some of the sub-systems in the alarm panel will sometimes change colors rapidly. Ignore such glitches (the sub-systems that show this behavior are mentioned below) and take action only on steady alarms.
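One way to think about "steady" versus "glitch" is to require the same alarm state over several consecutive looks at the panel. The sketch below assumes a threshold of three consecutive samples; this number is purely an assumption for illustration, since this page does not define "steady" numerically, and the function name is hypothetical.

```python
def is_steady(samples, min_consecutive=3):
    """samples: list of severities, oldest first.

    Returns True if the latest min_consecutive samples all show the
    same non-OK severity. The default of 3 is an assumed value.
    """
    if len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return recent[0] != "OK" and all(s == recent[0] for s in recent)
```

A box that flips between green and red on successive looks is not reported as steady, while one that stays red is.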
Since the beam conditions vary, follow up with the Run Coordinator, talk to the operators on previous shifts, check the e-log, etc. to understand the conditions of the beam, and monitor accordingly.
When the Drift HV power supply is OFF, the status goes INVALID/DISCONNECTED (magenta color) in the alarm area panel. Acknowledge all the alarms (after confirming with the on-call Drift HV expert) when the supply is off. When the Drift HV is ON, you will get specific instructions on what to monitor or plot from the Run Coordinator or Drift HV on-call expert. This is a very critical sub-system and no alarms are to be acknowledged without contacting the Drift HV expert. Operators are welcome to contact the on-call slow controls expert if there is a question about understanding the DriftHV panel.
Currently (2015/07/10) MicroBooNE has just finished filling. Keep a close eye on the Cryo alarms. Given the current Cryo status, which variables are most critical to monitor may change. Check with the previous shift, the logbook, and the Run Coordinator.
a. Known important variables: As of 2015/07/10, some important variables are the gas analyzer readings AT608, AT609, AT610, pressures PT165 and PT102, and level probe LT125. Follow the ALARM ALERT PROCEDURE for Cryo given below if these variables go into an alarm state. Remember the important cryo variables may change, so do not rely on the wiki; the previous shift, the logbook, and the Run Coordinator are the best sources of up-to-date info.
b. In the case of OTHER Cryo variables: If you see any alarms (MINOR or MAJOR), just acknowledge them and ignore them. Acknowledging is important because otherwise these unimportant alarms can mask any important alarms that occur.
ALARM ALERT PROCEDURE for Cryo: Cryo experts are always monitoring all the Cryo variables, so when a major alarm happens, it is likely that they already know about it and will e-log it and take the necessary action. When a shifter sees a major alarm in any of the important variables, the shifter should first look at the e-log to see if the Cryo experts have already reported it. If they have, no action is needed. If they have not, the shifter needs to e-log the observation, page the cryo on-call expert (630-255-1324...after hearing the beeps, enter callback number---6308406967---then #), and contact the Run Coordinator immediately, who will then take the issue further.
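The Cryo rules above can be summarized as a small decision function. This is a sketch of the written procedure only; the function name is hypothetical, and the variable list is the one quoted above as of 2015/07/10, which may be out of date. Always defer to the previous shift, the logbook, and the Run Coordinator.

```python
# Important Cryo variables as listed on 2015/07/10; may be out of date.
IMPORTANT_CRYO = {"AT608", "AT609", "AT610", "PT165", "PT102", "LT125"}

def cryo_actions(variable, severity, expert_already_elogged):
    """Return the list of shifter actions for a Cryo alarm."""
    if severity not in ("MINOR", "MAJOR"):
        raise ValueError("this sketch only covers MINOR and MAJOR alarms")
    if variable not in IMPORTANT_CRYO:
        # Item b: acknowledge so it cannot mask important alarms.
        return ["acknowledge and ignore"]
    # Item a: important variable in alarm, follow the alert procedure.
    if expert_already_elogged:
        return ["no action needed"]
    return [
        "e-log the observation",
        "page the Cryo on-call expert",
        "contact the Run Coordinator",
    ]
```

The e-log check comes first in the procedure, which is why it is passed in as an input rather than listed as an output action.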
The TPCPS rack houses the power supply that provides power to the ASICS (LV) and Wire Bias (HV). HV channels that are OFF and/or not exercised (for example, channels that serve the Flasher board for PMTs or Calibration fan out channels) produce alarms, and all of those alarms have been acknowledged, so don't worry about them. What is important when taking data is monitoring the ASICS LV channels and the Wire bias HV channels.
One can reach the WireBias HV and ASICS LV panel from the main overview panel by clicking on the purple icon Open Power Supply Panel that is located on the bottom left of the main overview display. This will take you to the Power supply main panel. Then under "TPCPS" rack, one can click on "ASICS LV" or "WireBias HV" button and reach the desired panel.
If you see any alarms for the OnDetPower_TPCPS sub-system, it is important to see what is wrong. It is important to make sure that no ASICS LV channels on TPCPS have tripped before and during data taking. Scroll down the ASICS LV overview screen to make sure all channel buttons are lit green. If a channel has tripped (red state), contact the on-call DAQ and TPC expert immediately.
In the case of WireBias HV channels, contact the TPC expert immediately. A magenta alarm means we lost communication with the system, so the on-call slow controls expert is the more appropriate person to alert, but notifying the sub-system experts to keep them informed is recommended.
If the alarm is minor, make an e-log entry, then look at the history to study the alarm.
Note from shift 7/31/2015: LV for Calibration Fanout shows red on the non-expert display. The DAQ expert says that this is currently not a problem unless calibration is going on. Similarly, PMT-related Flasher channels are not always on, so you will notice these are already acknowledged.
The sub-system CrateRails powers the rack electronics in the racks (TPC1, TPC2, TPC3, TPC4, TPC5 and partially the TRIG/PMT rack). While taking data it is important to make sure the power supplies are in a good state. This can be checked quickly by looking at the Power Supply Main Panel and, if necessary, clicking on the "Open Panel" button in each rack.
Follow the regular procedure for reporting alarms:
MINOR alarms: e-log and keep an eye.
MAJOR alarms: alert the on-call WARM Read-out electronics expert (main person) keeping the Slow-controls expert in the loop (to troubleshoot monitoring related issues if any)
Note: This is one of the sub-systems that sometimes changes alarm status rapidly. Ignore such glitches and only follow up on "steady" alarms.
4. DAQStatus_DAQX (as of 08/03/2015):¶
When the DAQ is not running (i.e., when you are not taking a run), there's no valid data from the DAQ on these variables, so they WILL ALL TURN MAGENTA. So, it is normal if they all alarm with magenta status some time after the run is stopped. They should clear on their own after some time. When you start taking a run, things should get back to normal (i.e., no magenta status), although it may take a couple of minutes for the status to update. If you see any alarms during data taking, consider them real and follow the alarm alert procedure.
Also, please note that errors get latched: if a run results in an error, i.e., if you see things turn red or magenta during data taking, you will continue to see the latched alarms even after you have terminated the run. In this case, the best approach is to acknowledge those alarms (they will go away once you start a new run) so that the large number of them does not keep operators from finding other important alarms in the alarm table.
(For those who want to understand this behavior: Slow monitoring extracts the DAQ information from Ganglia (the monitoring system run by DAQ folks to specifically monitor DAQ processes). DAQ doesn't send information to Ganglia when it is not running. The Ganglia monitor daemon runs, but it drops the metrics when it sees it hasn't gotten an update in a while. So, when the DAQ is down, there's no valid data from the DAQ on these variables, so they turn magenta.)
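The staleness behavior described in the parenthetical above can be pictured with a short sketch: a metric that has not been updated within some timeout is treated as invalid. The function name and the 120-second timeout are assumptions made for illustration; the real Ganglia and slow monitoring settings are not given on this page.

```python
def daq_metric_status(last_update, now, timeout=120.0):
    """Return "INVALID" when the metric is stale, else "VALID".

    last_update and now are times in seconds. The timeout is an
    assumed value, not the real configuration.
    """
    return "INVALID" if (now - last_update) > timeout else "VALID"
```

While a run is in progress the DAQ keeps updating the metric, so it stays valid; once the run stops, updates cease and the status becomes INVALID (magenta) after the timeout.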
Please note that PMT HV is not ON all the time. When the PMT HV supply is not on, you will see disconnected (INVALID MAGENTA) status on the PMT HV panel.
In some cases, the PMT HV supply will be on, but the individual channels on the HV modules are off or have no voltage. Since an OFF state is considered an error state in steady-state detector operation, you will notice a bunch of MAJOR alarms pop up when the PMT HV channels are off. It is okay to acknowledge them in this scenario if they are not already acknowledged for you. In case of doubt, operators can confirm with the on-call PMT expert before acknowledging the alarms.
When the PMT HV supply is ON and channels are holding non-zero voltage values, consider all alarms real and act accordingly. All MAJOR alarms need to be communicated to the on-call PMT HV expert.
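The three PMT HV situations described above can be written out as a small lookup. This is a sketch of the written guidance only; the function name and the wording of the returned strings are hypothetical.

```python
def pmt_hv_guidance(supply_on, channels_have_voltage):
    """Summarize how to read PMT HV alarms in each supply state."""
    if not supply_on:
        # Supply off: the panel shows disconnected status.
        return "expected: INVALID (magenta) status on the PMT HV panel"
    if not channels_have_voltage:
        # Supply on, channels off: OFF counts as an error state.
        return ("expected: MAJOR alarms, okay to acknowledge "
                "(confirm with the on-call PMT expert if in doubt)")
    # Supply on and channels holding voltage.
    return ("real: act on all alarms and report MAJOR alarms "
            "to the on-call PMT HV expert")
```

Only the last case, with the supply on and channels holding non-zero voltage, calls for treating every alarm as real.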
MONITORING INSTRUCTIONS FOR THE REST OF THE SUB-SYSTEMS:¶
For all the sub-systems below, follow the regular procedure to report alarms:
MINOR alarms: e-log and keep an eye
MAJOR alarms: e-log and alert the corresponding expert with relevant information. The on-call expert for each sub-system is listed below.
1. RackFans, RackTemps, RackProt:
Each rack (all racks on the platform and in the DAQ room) contains temperature probes (2 or 3 probes) that measure the air temperature in the rack at various locations. For racks with TPC crates (for example, TPC racks), there is an associated over-temperature trip in case the temperature near the power supply gets too warm.
There are also 1 to 2 fan packs (each fan pack contains 6 fans) in some of the racks to make sure the temperature in the racks doesn't get too warm. It is important that these fans run at nominal speeds to ensure electronics won't get too warm.
Every MicroBooNE rack houses a rack protection system that will turn off the AC power to the rack in case of unsafe situations. It is important to make sure that the rack protection system is green.
See an example rack display below:
One can reach this display by clicking on any rack icon from the Main Overview display panel.
For major/minor alarms associated with RackFans and RackTemps, contact the slow-monitoring expert.
Only for major alarms associated with RackProtection, contact the Run Coordinator immediately (along with the Slow monitoring expert). When the AC power to a rack turns off, a lot of things in the rack get affected. It wouldn't hurt to keep other experts in the loop while e-logging.
Note: RackFans is one of the sub-systems that sometimes changes alarm status rapidly. Ignore such glitches and only follow up on "steady" alarms.
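The contact rules for the rack sub-systems can be sketched as a small routing function. This only restates the guidance above; the function name is hypothetical, and e-logging applies in every case as per the regular procedure.

```python
def rack_alarm_contacts(subsystem, severity):
    """Return who to contact (in addition to e-logging) for a rack alarm."""
    if severity not in ("MINOR", "MAJOR"):
        raise ValueError("this sketch only covers MINOR and MAJOR alarms")
    if subsystem in ("RackFans", "RackTemps"):
        # Both minor and major alarms go to the slow-monitoring expert.
        return ["slow-monitoring expert"]
    if subsystem == "RackProt":
        if severity == "MAJOR":
            return ["Run Coordinator", "slow-monitoring expert"]
        # MINOR: e-log and keep an eye, no extra contact listed.
        return []
    raise ValueError("unknown rack sub-system: " + subsystem)
```

Remember that RackFans alarms should be steady before anyone is contacted.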
This is the impedance monitor for the detector ground. It should be green (or yellow sometimes) all the time. If it turns RED, alert the Run coordinator.
For this sub-system, the expert to contact is the DAQ on-call expert (along with the slow-controls expert).
PC Status is the hardware monitoring of the DAQ computers.
3. Ar Purity:
You can navigate to the Ar Purity panel from the main overview panel (overview.opi) by clicking on the purple icon (on the bottom left) that says "Ar Purity panel". You will notice from this panel that MicroBooNE has 3 purity monitors: 2 inside the cryostat and one inline. The inline one (purity monitor indexed #3) is currently not exercised, so it will always have the magenta (disconnected) status. The two purity monitors that we monitor (short one indexed #0 and long one indexed #1) have valid alarm ranges set, so any alarms you see for these two purity monitors are real and should be treated as such.
One important variable to keep an eye on is uB_ArPurity_PM01_0/AGE, which tells you how much time elapsed between acquiring the previous value and the current value.
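To illustrate what an "age" quantity of this kind represents, the sketch below computes the time between two acquisitions and flags it when it exceeds a limit. How the real AGE variable is computed is not described on this page, so this is only a conceptual example; the function names and the one-hour limit are assumptions.

```python
from datetime import datetime, timedelta

def reading_age(previous, current):
    """Time elapsed between the previous and current acquisitions."""
    return current - previous

def age_is_suspicious(age, limit=timedelta(hours=1)):
    """True if the age exceeds the limit (assumed value, not a real setting)."""
    return age > limit
```

A steadily growing age would suggest that the purity monitor has stopped acquiring new values, which is worth raising with the on-call purity monitor expert.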
Any questions related to purity monitors and major alarms should be directed to the on-call purity monitor expert.
MAKING SHIFT PLOTS¶
See here for MAKING SHIFT PLOTS