OnMon Shifter Wiki¶
This is the general "how to" wiki for shifters.
- Table of contents
- OnMon Shifter Wiki
- OnMon viewer layout
- How to interpret the layout of OnMon HitMaps
- How to check the general status of the detector
- How to identify specific hardware in trouble
- How to generate comparison plots
- How to add/remove items from the WatchList
- A general note on the differences between Errors and Alerts
- General trouble-shooting
OnMon viewer layout¶
The OnMon viewer looks something like this:
Displayed at the top of the window is "OnMon Viewer." If the viewer is running from a shared memory segment (as opposed to a file) then the shared memory segment is also displayed. For example "OnMon Viewer - FD01.shm." In this example you can tell from "FD01" that this viewer is looking at FD data from partition 01 (FD = FarDet, ND = NearDet, NS = NDOS.)
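If you ever need to read the viewer title programmatically (for example in a shift script), the naming convention above can be decoded along the following lines. This is only a sketch: the function name `parse_viewer_title` and the exact pattern (two-letter detector code, partition digits, then ".shm") are assumptions inferred from the single "FD01.shm" example, not something taken from the OnMon code.

```python
import re

# Detector codes as listed in the text above.
DETECTORS = {"FD": "FarDet", "ND": "NearDet", "NS": "NDOS"}

# Assumed segment-name pattern: detector code, partition digits, ".shm".
_SEGMENT_RE = re.compile(r"^(FD|ND|NS)(\d+)\.shm$")


def parse_viewer_title(title):
    """Return (detector, partition) for a shared-memory title, else None."""
    prefix = "OnMon Viewer - "
    if not title.startswith(prefix):
        # Viewer is running from a file, or the title is not recognized.
        return None
    match = _SEGMENT_RE.match(title[len(prefix):])
    if match is None:
        return None
    return DETECTORS[match.group(1)], int(match.group(2))


print(parse_viewer_title("OnMon Viewer - FD01.shm"))  # ('FarDet', 1)
```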
The labeled sections are:
1. Main histogram canvas: This is a standard root canvas (so you can zoom or set logy etc.)
2. Histogram browser: Histograms are organized into folders here. Click on a folder to open it and click on a histogram to view it.
3. Histogram information box: Extra information about the current histogram is displayed here, such as how it is binned, how often it refreshes, and a 3-4 sentence description about what type of information it is showing you.
4. Message box: Messages, warnings, and errors from the viewer are displayed here.
5. Single-click information box: Single-clicking on a bin in a 1D histogram will display the bin contents. In a 2D histogram, it will display the hardware coordinates (Diblock, block, DCM, FEB, pixel, plane in block, module, cell) that correspond to that bin as well as the bin contents.
6. Button bar:
- back/forward - history navigation buttons (functions just like an internet browser)
- play/pause - starts/stops the canvas from refreshing the current histogram every 30 seconds
- refresh - force the current histogram to refresh (this will also reset the canvas with all of the default drawing options for the histogram, including unzooming)
- 1D bin contents - create a 1D histogram of bin contents from the current histogram
- print - prints the current canvas to [histoname].png in the current working directory
7. Producer status bar: Displays current information from the producer (updated every 2 seconds.) This can be thought of as the "heartbeat" of the producer.
8. Special features tabs: This will take you to the comparison and watchlist tabs (see below for how to use them.)
How to interpret the layout of OnMon HitMaps¶
The OnMon plots are typically drawn in the "DAQ coordinates." This means that in contrast to the Event Display, the OnMon plots reflect what would be seen by looking at actual pieces of hardware. When standing on the catwalk, if you are facing the detector then you are facing East which means that the NuMI beam is entering the detector FROM THE RIGHT. All of the hardware is shown in the same plot, so you are seeing the top and side of the detector at the same time. Below is an example of the hardware layout for the far detector.
Above the red line is the hardware on the top of the detector (accessible from the MAP) and below the red line is the hardware on the side of the detector (accessible from the catwalks.) Imagine that this plot was printed on a GIANT piece of paper. If you folded that paper along the red line, then you could lay it on the FD so that the top half of the paper would cover the top of the detector and the bottom half of the paper would cover the West side.
In this view, planes are numbered from right to left. For vertical planes, the cells are numbered from the top of the plot to the center red line. For horizontal planes, the cells are numbered from the bottom of the plot to the center red line. The thicker solid lines outline DCMs. The thinner solid lines outline FEBs.
When you double click on any of the DCMs in the above plot, you will see the hit map for that individual DCM (such as the one shown below.) In this view, the thicker solid lines outline FEBs and the thinner solid lines outline individual pixels.
The FEBs are labeled in rows starting from 00 in the top left corner and ending with 63 in the bottom right corner. This mirrors what you would see if you picked up a DCM and looked at the ports on the front panel where the FEBs plugged in. Within an individual FEB (each 8x4 rectangle) are the pixels for that FEB. These pixels are numbered to mirror what you would see if you took the FEB off of the top of a snout and looked at the ends of the fibers (that is, what the APD would see.) Note that the pixels are numbered in a bizarre way that does not adhere to a simple pattern! The pixel numbering scheme for an individual FEB is drawn on the plot in the bottom left hand corner.
For any of the HitMap style OnMon plots like the two shown above (displaying the whole detector or displaying an individual DCM) you can single-click in any bin and the DAQ coordinates and plane/module/cell will be displayed in the information box at the top of the viewer.
How to check the general status of the detector¶
The histograms that give you the big picture of the general state of the detector are organized into the folder titled "Shift." Each of these histograms also exists in one of the other folders (for example, FEBHitMap also exists in the "HitMaps" folder); they are merely duplicated in the "Shift" folder for convenience. Open up the "Shift" folder in the histogram browser to display the following histograms:
- FEBRatesVsHourDB - In this folder, you will find one plot for each diblock containing the hit rate for each FEB in that diblock as a function of UTC Hour. The plot is auto zoomed on the Z axis so that (roughly) the correct hit rate for an FEB will be in the middle of the plot. If all is well, then this plot should be almost completely green or greenish yellow. Red or blue stripes indicate FEBs with hit rates outside of the standard range.
- AAVsHour - "All Alerts vs. Hour": This plot displays the number of alerts being reported as a function of UTC time. It always displays the most recent hour of data with "now" being the right most edge of the plot. If all is well, this should be blank. To see more than one hour, pause the viewer (don't forget to un-pause when you are finished) and unzoom the X-axis. For each alert, there is a corresponding hitmap in the "Alerts" folder which you can look at to determine exactly which hardware is in an alert state. See Descriptions of Errors/Alerts in OnMon for more details.
- AEVsHour - "All Errors vs. Hour": This is the same as AAVsHour except of course that it is reporting errors instead of alerts. If all is well, this should be blank.
- AveNmicroSlVsHour - "Average Number of Micro-slices vs. Hour": This plot shows the total number of micro-slices in an event divided by the number of active DCMs for the event. Like all plots that end in "VsHour" it displays the most recent hour of data and can be unzoomed to show more. If all is well, each active DCM should be reporting 11 micro-slices per event so this should be a nice steady line centered on 11.0.
- DCMRatesVsHour - "Hit Rates by DCM Vs. Hour": This plot is just like the FEBRatesVsHourDB plots, but it is done by DCM and shows all DCMs in the whole detector. It should also be mostly green with red or blue stripes indicating DCMs with non-standard hit rates.
- EmptyDCMsVsHour - This plot shows the number of times each DCM in each diblock does NOT report as a function of time. If all is well, then each DCM not currently in the run should show up as a red stripe with everything else being a few purple dots sprinkled on a background of white. If a DCM (one row in this plot) has more purple (or other colored) dots than the other DCMs, then this is a sign of a DCM in trouble.
- FEBHitMap - This plot is shown for the whole detector and shows how many times each FEB has reported a hit. Double-click on a DCM to see the hit map broken down by pixel for that specific DCM. There WILL be hot and cold channels in the detector, so this plot will typically be mostly green with a few red and maybe a few blue FEBs sprinkled here and there.
- FEBHitRateMap - This plot is shown for the whole detector and shows the FEB hit rate (by live time.) Double-click on a DCM to see the hit map broken down by pixel for that specific DCM. There WILL be hot and cold channels in the detector, so this plot will typically be mostly green with a few red and maybe a few blue FEBs sprinkled here and there.
- NdcmsVsHour - "Number of DCMs vs. Hour": Shows the number of DCMs reporting information per event vs. UTC time. The probability that no noise hits will be reported in a DCM for one event is extremely small, so if all is well this should be a steady number and should agree with the number of DCMs you see reporting information in the FEBHitMap. If you see fewer DCMs than expected, then you can look at CountPlots/UTCPlots/EmptyDCMsVsHour to figure out exactly which DCMs aren't reporting.
- NfebsVsHour - "Number of FEBs vs. Hour": Shows the number of FEBs reporting information per event vs. UTC time. This number will fluctuate up and down but should remain steady around one central value. If all is well, this plot will have some width to it and should be steady over periods of time longer than 15 minutes.
- NhitVsHour - "Number of Hits vs. Hour": Just like the previous plot, this shows the number of hits reported per event vs. UTC time. This number will fluctuate up and down but should remain steady around one central value. If all is well, this plot will have some width to it and should be steady over periods of time longer than 15 minutes.
- TotNmicroSlVsHour - "Total Number of Micro-slices vs. Hour": This is a good plot to watch for general DCM health. It is just like the average number of micro-slices plot except that it shows the total number. If all is well, this should be a steady line at 11*(number of DCMs in the run).
- TriggerVsHour - "Number of Triggers Vs. Hour": This plot will show you the number of triggers seen per minute broken down by trigger type. You will see stripes of filled bins for all triggers that are currently active. Drastic changes in color indicate major changes in trigger rates. You can also look at the plot Triggers/TriggerVsHourGeneral for the same plot made with a dramatically expanded Y-axis to include all possible trigger types.
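The micro-slice arithmetic behind AveNmicroSlVsHour and TotNmicroSlVsHour can be written out as a quick sanity check. This is a minimal sketch, not part of OnMon: the function names and the `tolerance` value are made up for illustration, and the only number taken from the text is the expected 11 micro-slices per active DCM per event.

```python
EXPECTED_MICROSLICES_PER_DCM = 11  # value quoted in the plot descriptions above


def expected_total_microslices(n_active_dcms):
    """Where the TotNmicroSlVsHour line should sit for this many DCMs."""
    return EXPECTED_MICROSLICES_PER_DCM * n_active_dcms


def average_microslices(total_microslices, n_active_dcms):
    """Quantity plotted in AveNmicroSlVsHour: total divided by active DCMs."""
    if n_active_dcms <= 0:
        raise ValueError("need at least one active DCM")
    return total_microslices / n_active_dcms


def microslices_look_healthy(total_microslices, n_active_dcms, tolerance=0.05):
    """True if the average is within `tolerance` of 11 (tolerance is hypothetical)."""
    average = average_microslices(total_microslices, n_active_dcms)
    return abs(average - EXPECTED_MICROSLICES_PER_DCM) <= tolerance


# Example with a made-up run of 100 active DCMs.
print(expected_total_microslices(100))       # 1100
print(average_microslices(1100, 100))        # 11.0
print(microslices_look_healthy(1089, 100))   # False
```

In the last call the total is one DCM's worth of micro-slices short, which pulls the average down to 10.89. That is the kind of dip you would look for in the plots.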
How to identify specific hardware in trouble¶
OnMon can easily identify specific hardware that is in trouble, or at least hardware that is acting differently from everything else. This is probably best explained through a couple of examples, which take advantage of the following general OnMon features:
- double-click drill-down - For any 2D histogram displayed in the full detector hardware view (the view with diblock labels across the top and DCM labels along the side) you can double-click on a DCM to display a histogram for that specific DCM. You can then use the "back" button on the button bar to return to the previous histogram. You can also navigate directly to the DCM specific histograms from the appropriate folder in the histogram browser.
- single-click - For any 2D histogram displayed in either the hardware or plane/cell coordinate system, you can single-click inside of a histogram bin to display the coordinates (diblock, DCM, FEB, pixel, plane, and cell) as well as the bin content for that bin.
Example 1: Identifying an Alert / Error¶
Let's say that you are enjoying a relaxing shift so far when by casually flipping through the plots in the "Shift" folder, you notice that the All Alerts plot is no longer blank (shown below.)
The above plot is showing that somewhere in the detector, at least one FEB has had its "Buffer Empty" alert flag flipped from good to bad. So you spring into action, but what do you do? Open the "Alerts" folder in the histogram browser and select the hitmap that corresponds to FEB Buffer Empty ("FEBEbyPixelDCM" - there is one for each alert.) This hitmap will show you which DCM has the FEB that had its buffer empty alert flag tripped (shown below.)
From the above hitmap, you can clearly see the DCM that has the bad FEB. So double-click on that DCM to drill-down into it to find the FEB (shown below.)
Now you can see exactly which FEB has been reporting the problem. You can figure out by using the labels on the side which FEB it is OR you can single-click on the FEB and see the hardware coordinates displayed in the single-click information box. Report it to whomever and/or record it in the log book and then keep your eye on it. Depending on the alert/error, you may want to add that FEB to the watch list (see below for how to do this.)
Example 2: Determining what kind of information a hot pixel is reporting¶
Let's say that you notice a particular pixel is "hot" (i.e. - it is reporting far more information than most of the other pixels) and you want to know if it is reporting high or low ADC and if it is reporting early or late in the trigger window. From the FEBHitMap, you observe that there is a hot FEB in DCM-02-01 so you double-click on that DCM to get the hitmap for that specific DCM (shown below.)
So to determine if this pixel is reporting high or low ADC (or maybe both - yikes...) go to the "HitMaps" folder in the histogram browser and open the "HighADCPixelsDCM" and "LowADCPixelsDCM" folders. In the HighADC folder, click on the hitmap for DCM-02-01 (shown below.)
For the above hitmap, you can see that this pixel is reporting a normal amount of high ADC hits with respect to its neighbors. So in the LowADC folder, select the hitmap for DCM-02-01 (shown below.)
For the above hitmap, you can clearly see that this pixel is "hot" in terms of low ADC hits. So it is constantly talking but it is NOT yelling, it is whispering. To determine if it is reporting its information early or late, you will need a hit time plot for this pixel. This plot is not automatically generated; you will have to make it by adding the pixel to the watchlist (see instructions below.) After you let the watchlist histograms fill for a while, you can look at the WLTPlot for this pixel in the "WatchList" folder. This is the hit time with respect to the trigger time, so it should be a nice trapezoid ramping up linearly from -50 to 0, flat from 0 to 500, and then ramping down linearly from 500 to 550. How it is distorted from this shape will tell you if it is reporting early in the trigger window or late.
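The expected trapezoid can be written down explicitly, which may help when judging how a WLTPlot is distorted. This is only an illustration: the function names are hypothetical, the breakpoints (-50, 0, 500, 550) are the ones quoted above in whatever time units the plot uses, and splitting the window at its midpoint (250) to count "early" and "late" hits is an assumption made for this sketch.

```python
def expected_hit_time_shape(t, plateau=1.0):
    """Ideal relative hit rate at time t (with respect to the trigger time)."""
    if t <= -50 or t >= 550:
        return 0.0
    if t < 0:
        return plateau * (t + 50) / 50    # linear ramp up from -50 to 0
    if t <= 500:
        return plateau                     # flat top from 0 to 500
    return plateau * (550 - t) / 50        # linear ramp down from 500 to 550


def early_late_counts(hit_times, midpoint=250):
    """Count hits before and after the window midpoint (assumed to be 250)."""
    in_window = [t for t in hit_times if -50 <= t <= 550]
    early = sum(1 for t in in_window if t < midpoint)
    return early, len(in_window) - early


print(expected_hit_time_shape(-25))        # 0.5
print(early_late_counts([0, 100, 400]))    # (2, 1)
```

For a healthy pixel the two counts should be roughly equal, since the ideal shape is symmetric about the midpoint. A strong imbalance suggests the pixel is reporting mostly early or mostly late.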
How to generate comparison plots¶
The comparisons tab looks like this:
- After selecting the histogram from the histogram browser for which you would like to generate a comparison plot, choose the source for your reference histogram from the box labeled "Comparison to..." You can choose "Reference" and either type in a file name or click the browse button to find the file you want to open, or you can choose "Recent" and pick one of the recent copies of the current histogram that the OnMon producer keeps. These are the "look back" histograms. Whenever a histogram is reset by the producer, it copies the histogram into a list of look backs for that histogram. For example, if the producer has been running for several hours and the current histogram is reset every 10 minutes, then choosing "look back 1" will compare the current histogram to the one from 10 minutes ago. Choosing "look back 2" will compare it to the one from 20 minutes ago (etc.)
- Next choose how you want the comparison histogram to be calculated by selecting the appropriate option in the "Comparison method" box. Note that one of the options is simply "Show reference histogram."
- Next choose how you want the reference histogram to be normalized. The normalization options scale the reference histogram to the current histogram; choosing "Absolute" instead keeps the reference histogram bins unchanged.
- Lastly, click the "Apply Options" button at the bottom to generate and display your comparison histogram.
- NOTE: If you have a set of comparison options selected, then when you browse around looking at other histograms, a comparison plot will be generated for each histogram that you select. This is by design: it makes a quick comparison check of multiple histograms easier. So don't forget to set the "Comparison to..." option back to "None" when you wish to go back to looking at histograms in the normal way.
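The "look back" bookkeeping described above can be pictured with a small toy model. This is a sketch and not the producer's implementation: the class `LookBackHistory`, its method names, and the cap of 5 stored copies (`max_look_backs`) are all assumptions made for illustration. What it takes from the text is that a copy is saved each time the histogram is reset, and that "look back n" is roughly n reset periods old.

```python
from collections import deque


class LookBackHistory:
    """Toy histogram (a list of bin contents) that keeps copies on reset."""

    def __init__(self, n_bins, reset_period_minutes, max_look_backs=5):
        self.current = [0.0] * n_bins
        self.reset_period_minutes = reset_period_minutes
        # Newest copy first; the oldest copy is dropped once the cap is hit.
        self._look_backs = deque(maxlen=max_look_backs)

    def fill(self, bin_index, weight=1.0):
        self.current[bin_index] += weight

    def reset(self):
        """Save a copy of the current histogram, then clear it."""
        self._look_backs.appendleft(list(self.current))
        self.current = [0.0] * len(self.current)

    def look_back(self, n):
        """Return look back n (1 = the most recently saved copy)."""
        if n < 1 or n > len(self._look_backs):
            raise IndexError("look back %d is not available" % n)
        return self._look_backs[n - 1]

    def look_back_age_minutes(self, n):
        """Approximate age of look back n, e.g. 20 minutes for n = 2."""
        return n * self.reset_period_minutes


history = LookBackHistory(n_bins=3, reset_period_minutes=10)
history.fill(0)
history.reset()
history.fill(1)
history.fill(1)
history.reset()
print(history.look_back(1))               # [0.0, 2.0, 0.0]
print(history.look_back_age_minutes(2))   # 20
```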
How to add/remove items from the WatchList¶
Almost all available information about a specific pixel can be found by browsing the existing hitmaps. However, some information (hit time and charge) is only accumulated by DCM (thereby losing information about a specific pixel.) Normally you don't need to see this information for a specific pixel or FEB, but sometimes you might want to look at it if the hardware is new or if you suspect it is in trouble. For this, there is the WatchList. When you add a specific piece of hardware to the WatchList, the producer will begin making three histograms for that specific set of hardware: a 1D plot of hit time with respect to trigger time for each hit, a 1D plot of charge for each hit, and a 2D plot of those two quantities together. The WatchList tab looks like this:
To add / remove items do the following:
- Select the hardware you want to add from the drop-down boxes in the "Choose Hardware" box. Note that you can select "ALL" as a wild card.
- Click "ADD" to add your selection to the list. This will send a message to the producer and it will immediately begin to fill the three histograms mentioned above for the selected hardware. You can now return to the histogram browser and open the "WatchList" folder to see these histograms. These new WatchList histograms will continue to be filled until either they are removed from the WatchList (by the user) or the producer stops running.
- To remove items from the WatchList, simply select the hardware you want to remove from the "Current WatchList" box (you can select multiple items) and click "Remove from WatchList." Note that when you remove an item, the producer will immediately stop filling the histograms BUT the histograms won't disappear from the histogram browser until the end of the next subrun. This is done so that they continue to exist until the next time that the producer writes its histograms to disk (thus preserving a permanent record of them.)
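The "ALL" wild card can be understood as a simple matching rule. The sketch below is a toy model, not the OnMon WatchList code: the class and method names are hypothetical, and it assumes that a WatchList entry is described by (diblock, DCM, FEB, pixel), which may not be exactly the set of drop-down boxes in the tab.

```python
ALL = "ALL"  # wild card offered in the "Choose Hardware" drop-down boxes


class WatchList:
    """Toy WatchList keyed on (diblock, dcm, feb, pixel); fields are assumed."""

    def __init__(self):
        self._entries = []

    def add(self, diblock=ALL, dcm=ALL, feb=ALL, pixel=ALL):
        entry = (diblock, dcm, feb, pixel)
        if entry not in self._entries:
            self._entries.append(entry)

    def remove(self, diblock=ALL, dcm=ALL, feb=ALL, pixel=ALL):
        entry = (diblock, dcm, feb, pixel)
        if entry in self._entries:
            self._entries.remove(entry)

    def is_watched(self, diblock, dcm, feb, pixel):
        """True if a hit on this hardware matches any entry, honoring ALL."""
        hit = (diblock, dcm, feb, pixel)
        return any(
            all(want == ALL or want == got for want, got in zip(entry, hit))
            for entry in self._entries
        )


watch_list = WatchList()
watch_list.add(diblock=2, dcm=1)           # every FEB and pixel in DCM-02-01
print(watch_list.is_watched(2, 1, 17, 5))  # True
print(watch_list.is_watched(2, 2, 17, 5))  # False
```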
A general note on the differences between Errors and Alerts¶
A non-DAQ expert might find themselves confused about just what is meant by an error vs. an alert (I myself am finding it difficult to find the right words to describe the differences between the two.) Here is a brief explanation of the two categories:
- An error is something that is recorded like a hit. For example if an event is "bad" then the meta-data reported for that event might contain the "Data Missing" or "Event Incomplete" error. Errors are recorded like hits in the sense that they might be reported only periodically, so we keep track of how often they are reported.
- Some of the errors are reported at the event or data-block level (not hardware specific) so for those errors we just keep track of the number of times they are reported. These errors include Milli-slice Incomplete, Data Missing, and Event Incomplete.
- All of the other errors are reported by specific hardware (DCM or FEB or pixel.) For each of these, there is a hitmap in the "Errors" folder that can show you which hardware is reporting the errors.
- There is one additional hitmap in the "Errors" folder that does not correspond to the errors shown in the All Errors plot. This is the Byte Count plot. It is displayed as a hitmap showing the reported byte count for that DCM. This information is reported by each DCM so if you double-click on a specific DCM, it will give you a 1D histogram of the byte count for that DCM.
- In contrast to an error, an alert is a "status" bit. For each alert, the status bit will stay "0" if everything is fine and flip to "1" indicating that a specific piece of hardware is in a state of alarm. The alert histograms are NOT filled cumulatively; they are set with histo->SetBinContent(status) so that they do not accumulate the number of times that an alert is reported, but instead reflect the fact that the hardware is in a state of alarm.
- Like the error plots, for each alert there is a corresponding hitmap in the "Alerts" folder that can be used to identify hardware in trouble.
- Once a piece of hardware has had an alarm status bit set, it will remain in this state until it is reset. This is not something that can be done from OnMon.
- If an alert status bit has been set to "1" and then it is reset, it will stay red in the OnMon hitmaps until the histogram is reset (which happens every 5 minutes.)
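The difference between counting an error and recording an alert status can be sketched with a toy hit map. This is not the producer code: `ToyHitMap` and its methods are hypothetical stand-ins for the real histograms, and the "stays set until the histogram is reset" behavior of `record_alert` is one reading of the notes above.

```python
class ToyHitMap:
    """Toy 1D hit map: one bin per piece of hardware."""

    def __init__(self, n_bins):
        self.bins = [0] * n_bins

    def record_error(self, bin_index):
        # Errors are counted like hits: each report adds one to the bin.
        self.bins[bin_index] += 1

    def record_alert(self, bin_index, status):
        # Alerts are a status: the bin is set, not incremented. A status of 0
        # does not clear the bin, so it stays set until the histogram resets.
        if status == 1:
            self.bins[bin_index] = 1

    def reset(self):
        self.bins = [0] * len(self.bins)


errors = ToyHitMap(4)
alerts = ToyHitMap(4)
for _ in range(3):
    errors.record_error(2)
    alerts.record_alert(2, 1)
print(errors.bins)  # [0, 0, 3, 0]
print(alerts.bins)  # [0, 0, 1, 0]

alerts.record_alert(2, 0)   # hardware bit cleared, histogram not yet reset
print(alerts.bins)  # [0, 0, 1, 0]
alerts.reset()
print(alerts.bins)  # [0, 0, 0, 0]
```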
General trouble-shooting¶
The following is a list of some general notes, in no particular order, about things that could lead to or are related to trouble running OnMon:
- The producer MUST be started before the viewer. They communicate through a shared memory segment that the producer sets up, and if this doesn't exist when the viewer starts up, it will time out trying to look for it and quit.
- If you can't get the producer or the viewer to start, first try using the stop/kill script on the CR-02 desktop.
- Don't forget to renew your kerberos ticket! Otherwise neither the producer nor the viewer will start!
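The start-up ordering in the first note boils down to a "wait for the segment, then give up" loop. The sketch below only illustrates that idea and is not how the viewer is implemented: `wait_for_segment`, its default timeout and polling interval, and the idea that the segment shows up as a path on disk are all assumptions, and the example path is hypothetical.

```python
import os
import time


def wait_for_segment(path, timeout_s=10.0, poll_s=0.5):
    """Poll until `path` exists; return False if it never appears in time."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True    # producer has set up the segment
        if time.monotonic() >= deadline:
            return False   # timed out: start the producer first
        time.sleep(poll_s)


# Hypothetical segment path; with no producer running this times out.
if not wait_for_segment("/nonexistent/FD01.shm", timeout_s=0.1, poll_s=0.02):
    print("No shared memory segment found. Is the producer running?")
```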