Louise Suter, 02/25/2016 12:18 PM


How to Interpret the Nearline Front Page Plots Far Detector

This page explains what the nearline front page plots should look like, the main failure modes you are likely to see in each plot, and how to recover from them. Issues with this page email both and .

General Information

  • First, note the timestamp in the bottom left corner of the plot. It should be within the last 10 minutes. If it is not, the web-plot making scripts aren't running.
  • Next, note that new data appears at the right edge of the plot. There might be a small amount (about 20 minutes' worth) of white space between the most recent data and the right edge of the plot, as it takes a while to update the plots. If there is a lot of white space on the right side of the plot while the detector is on and a run is going, the nearline processing has probably stopped.
  • In case of either of these issues email .
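
The two checks above can be sketched in a few lines of Python. This is only an illustration of the logic, not a tool that exists in the nearline system: the function name, its arguments and the returned messages are all hypothetical, and only the 10 minute and 20 minute thresholds come from the text.

```python
from datetime import datetime, timedelta

# Thresholds quoted in the text above; the names are illustrative only.
MAX_PLOT_AGE = timedelta(minutes=10)   # timestamp in the bottom left corner
MAX_DATA_GAP = timedelta(minutes=20)   # white space at the right edge


def check_freshness(now, plot_timestamp, latest_data_time, run_in_progress):
    """Return a list of suspected problems (empty if the plot looks current)."""
    problems = []
    # A stale plot timestamp points at the web-plot making scripts.
    if now - plot_timestamp > MAX_PLOT_AGE:
        problems.append("web-plot scripts may not be running")
    # A large gap before the right edge only matters while a run is going.
    if run_in_progress and now - latest_data_time > MAX_DATA_GAP:
        problems.append("nearline processing may have stopped")
    return problems
```

For example, a plot stamped 5 minutes ago whose newest data point is 2 hours old, with a run in progress, would return only the nearline processing message.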

There is a known failure mode at 8.30am every day caused by the SNEWS trigger test. It shows up in the plots as a blue hit rate failure.

Far Detector Nearline Front Page Checklist Plots

The following are the plots used for the Far Detector nearline checklist. For normal running these plots should show behavior which is constant over time; any deviation from that could indicate an issue.

The four plots are the Number of Active FEBs per Subrun, the Average Trigger and Spill Rates, the Good Subruns, and the OnMon FEB Hit Rate Spectrum vs. Time. Click on the links to learn more about each one.

The sections below describe the plots and what most failures mean. If you cannot find the solution here, call your Data Quality expert.



Number of Active FEBs per Subrun

This plot shows the number of FEBs (or APDs) that report any hits in a subrun.

It should look like this:

These are GOOD examples.

The x-axis zooms out automatically, so if you were running with some of the detector missing over the last day the plot may look more like this:
This is a GOOD example.

Over the course of a subrun we expect that a few (0-2) FEBs will stop reporting (drop out) because they are too noisy. Once an HOUR a recovery message is sent to all channels, which recovers these dropped-out channels.
If you do not see the number of channels increase once an hour, then AutoStartDAQ might not be running. Follow the instructions in the online manual on how to recover them.

Rarely, something can cause a large number of channels to drop out in one go (for example lightning, cell phones, lights).
If you see a large drop (more than 5 channels) in one subrun, you can manually recover these channels by issuing an "Enable FEB Data Flow" (green button) from the TDU Control Interface.

The total number of channels for the whole Far Detector is 10752, and we generally have some channels which are not reporting at all. For normal running we expect about 10745 channels on the Far Detector to be reporting, dropping to about 10730 over the course of an hour. (These numbers are as of Sept 28th 2015; the number of channels will fluctuate somewhat but should be around these values.)
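
As a rough illustration of the "large drop" rule, the sketch below scans a list of active-FEB counts, one per subrun, and flags any subrun that lost more than 5 FEBs relative to the previous one. The function and constant names are hypothetical, and the input list is assumed to have been read off the plot by hand; only the numbers 10752 and 5 come from the text.

```python
TOTAL_FEBS = 10752   # total Far Detector channels quoted in the text
LARGE_DROP = 5       # a loss of more than this in one subrun is a "large drop"


def find_large_drops(active_febs_per_subrun):
    """Return the indices of subruns that lost more than LARGE_DROP FEBs
    compared with the subrun immediately before them."""
    flagged = []
    for i in range(1, len(active_febs_per_subrun)):
        lost = active_febs_per_subrun[i - 1] - active_febs_per_subrun[i]
        if lost > LARGE_DROP:
            flagged.append(i)
    return flagged
```

A slow decline of one or two FEBs per subrun is not flagged, which matches the expected noisy-channel dropout between hourly recoveries.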

This plot knows how many FEBs should be in the run. If the number falls significantly below that, the background will automatically turn RED, with white data points and a warning message.
If you see that the plot is RED, do the following.
  • Make sure AutoStartDAQ is on (check for the green indicator on the TDU Control Interface). If it is not, follow the instructions in the How-to guide in the Online Shift Manual to turn it on.
  • Issue an "Enable FEB Data Flow" (green button) from the TDU Control Interface.

This plot (and all the nearline plots) will take about 10 mins to update.
If after 10 mins this plot still shows that many channels have dropped out, issue a "SYNC" (red button) from the TDU Control Interface.

If you have issued both an "Enable FEB Data Flow" and a "SYNC" and after 10 mins still see no improvement, call your DAQ on-call expert!

Note: If maintenance is being done on the detector and not all DCMs are in the run, this plot will be red! Wait until a full detector run is started and check this plot again.
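
The recovery sequence for a RED plot can be summarised as a small decision function. This is a sketch of the order of operations described above, written for illustration only: the function, its arguments and the returned strings are hypothetical, and it assumes the shifter keeps track of which buttons have already been pressed and how long ago.

```python
UPDATE_MINUTES = 10  # approximate time the nearline plots take to update


def next_action(plot_is_red, full_detector_run, autostart_on,
                enable_issued, sync_issued, minutes_since_last_action):
    """Return the next step for the Number of Active FEBs plot."""
    if not plot_is_red:
        return "nothing to do"
    # During maintenance with DCMs out of the run, red is expected.
    if not full_detector_run:
        return "wait for a full detector run, then check again"
    if not autostart_on:
        return "turn on AutoStartDAQ"
    if not enable_issued:
        return "issue Enable FEB Data Flow"
    # Give the plot time to reflect the last action before escalating.
    if minutes_since_last_action < UPDATE_MINUTES:
        return "wait for the plot to update"
    if not sync_issued:
        return "issue SYNC"
    return "call DAQ on-call expert"
```

The ordering is the point of the sketch: each step is only reached once the cheaper step before it has been tried and given about 10 minutes to show up in the plot.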

Example plots of AutoStartDAQ not running and Example of Red plot for many missing channels

This is BAD data



Average Trigger and Spill Rates

This plot shows the rate of the cosmic and NuMI triggers per subrun over the last day. This is very important. Even if the run is going, if the NuMI trigger is not firing then we will not record any neutrinos!

Examples shown: good BEAM ON data and good BEAM OFF data.

This is GOOD data.

The BLUE line in this plot shows the cosmic trigger. This is a periodic trigger that ALWAYS has a rate of 10Hz. You should only ever see an interruption to this if the detector is not taking data. If this rate is not 10Hz, call your DAQ expert.

The two RED lines are the NuMI trigger. We expect this to run at ~0.7Hz if the beam is up. The bright red crosses show the rate at which the accelerator division is DELIVERING beam to US, the Delivered Beam Spill Rate.
The thick darker red line is the NuMI trigger WE ARE RECORDING. These two lines should always match up unless the detector is off. If they do not, call your DAQ expert and tell them that we are not recording NuMI events.

If the bright red crosses are there and the thick darker red line is not, we might be losing data! Confirm that the detector is taking data and, if it is, call your DAQ expert and tell them that we are not recording NuMI events.

Gaps in the Delivered Beam Spill Rate are expected only if the beam database is down or there is no beam. If you think the beam is up and this line is not there, email .
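
The comparisons described for this plot can be written down as a short check. The sketch below is illustrative only: the function name and arguments are hypothetical, and the 10% tolerance is an assumption made for the example, since the text only says the cosmic rate should be 10Hz and that the delivered and recorded NuMI rates should match.

```python
COSMIC_HZ = 10.0   # periodic cosmic trigger rate from the text
REL_TOL = 0.1      # ASSUMPTION: 10% tolerance, not specified in the source


def check_trigger_rates(cosmic_hz, delivered_numi_hz, recorded_numi_hz,
                        detector_taking_data):
    """Return a list of suspected problems for one subrun."""
    problems = []
    # Interruptions are expected when the detector is not taking data.
    if not detector_taking_data:
        return problems
    if abs(cosmic_hz - COSMIC_HZ) > REL_TOL * COSMIC_HZ:
        problems.append("cosmic trigger not at 10 Hz")
    # Only compare the two red lines when beam is actually being delivered.
    if delivered_numi_hz > 0:
        mismatch = abs(recorded_numi_hz - delivered_numi_hz)
        if mismatch > REL_TOL * delivered_numi_hz:
            problems.append("recorded NuMI rate does not match delivered rate")
    return problems
```

With beam delivered at 0.7Hz and nothing recorded while the detector is taking data, the function reports the NuMI mismatch, which is the case where data may be being lost.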



Good Subruns

This plot runs many data quality checks over the data and shows different data quality failures in different colors.
For this plot GOOD DATA should look like this:
This is GOOD data
The plot should be white, with a flat rate. The first hour's worth of data will show up as gray while we wait for reconstruction to be run over the data. Because the reconstruction takes a while to run, a preliminary good or bad classification is made from low-level quantities; it shows up in a lighter shade of the same colour.
NOTE: GOOD data could be reclassified as BAD with the extra reconstruction information, but BAD data will never be reclassified as GOOD.

Failure modes are

RED: Partial Detector or GREEN: Failed DiBlock
Some part of the detector is either missing or has the wrong rate. Are channels also missing in the ‘Number of Active FEBs per Subrun’ plot? If so, follow the instructions under that section.
If not, follow the tests below in ORDER!
  1. Look at the detector configuration plot http://nusoft.fnal.gov/nova/datacheck/nearline/plots/FarDet-t02-P1GoodDataSelDetConfigDay.png to determine which region of the detector the problem is in.
  2. Are any DCMs warm? Look at the CSS GUI (APD temperature monitor on DAQ-CR-02) overview page. Are any of the DCMs (boxes) dark green instead of light green? For normal running all of the detector should be cooled (LIGHT green). If any DCMs are not cooled, either cool them using the ‘Configure cold APD’ button or call an APD cooling expert straight away.
  3. Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Do any DCMs in the event display have no hits in them? If so, try issuing a “SYNC” from the TDU Control Interface. If this has fixed the issue, you will see the effect almost immediately in OnMon and after ~10 mins in the Nearline.
  4. Look at the OnMon FEBHitRateMapMipADC plot (in the RatePlot folder). Do any DCMs have a high, low or zero value? If so, a DCM might be running at the wrong gain. Call Leon Mualem.
  5. If none of the above applies, call your Data Quality on-call expert.

BLUE: Hit Rate
This implies that the median hit rate in the MIP region was too high or too low in the detector.
There is a known failure mode at 8.30am every day caused by the SNEWS trigger test.
If this is seen for only 1 subrun at the start or end of a run, it might just be statistics.

Are any DCMs out of sync?
  • Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Or do you see any DCMs on the event display that are not getting hits? If so, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect almost immediately in OnMon/Event display and after ~10 mins in the Nearline.
This could be an indication that the trigger rates are off.
  • Check whether the Average Trigger and Spill Rates look ok. If they do not, call your DAQ on-call expert.
  • Check the Trigger scalers on DAQ-CR-01. Are any trigger rates in alarm?
  • We expect the cosmic trigger = 10Hz, 1 Hz accelerator trigger = 1Hz, NuMI = 0.7Hz, DDT FastMono ~ 25Hz, DDContained ~ 8Hz, DDslow ~ 1Hz, DDnumu ~ 8Hz, DDupmu ~ 1Hz, DDcalmu ~ 26Hz, Energy ~ 7Hz. If they are much higher or lower, call your DAQ on-call expert.
This could be an indication that we are running at the wrong gain.
  • Look at the OnMon FEBHitRateMapMipADC plot (in the RatePlot folder). Do any DCMs have a high, low or zero value? If so, a DCM might be running at the wrong gain. Call Leon Mualem.
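
The expected trigger scaler rates listed above lend themselves to a simple lookup table. The sketch below is a hypothetical helper, not part of the DAQ software: the dictionary keys are just labels for the triggers named in the text, and treating "much higher or lower" as a factor of 2 is an assumption made purely for illustration.

```python
# Expected rates in Hz, as listed in the text above.
EXPECTED_HZ = {
    "cosmic": 10.0,
    "1 Hz accelerator": 1.0,
    "NuMI": 0.7,
    "DDT FastMono": 25.0,
    "DDContained": 8.0,
    "DDslow": 1.0,
    "DDnumu": 8.0,
    "DDupmu": 1.0,
    "DDcalmu": 26.0,
    "Energy": 7.0,
}
FACTOR = 2.0  # ASSUMPTION: "much higher or lower" taken as a factor of 2


def out_of_range(measured_hz):
    """Return the trigger names whose measured rate is far from expected.

    measured_hz maps trigger name to rate in Hz; missing triggers are skipped.
    """
    flagged = []
    for name, expected in EXPECTED_HZ.items():
        if name not in measured_hz:
            continue
        rate = measured_hz[name]
        if rate > expected * FACTOR or rate < expected / FACTOR:
            flagged.append(name)
    return flagged
```

Any trigger returned by this check would be a reason to call the DAQ on-call expert, as described above. Note that the NuMI entry only applies while the beam is up.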

PURPLE: Live time
Subruns turn purple if the whole subrun had less than 1000 triggers' worth of live time.

You may see this for isolated subruns at the start or end of a run. If you see this for longer periods, call your DAQ on-call expert and email as it may also be an issue with file processing.

ORANGE: Reconstruction
There were too many 2D tracks and/or there were too many or too few slices per trigger.

Are any DCMs out of sync?
  • Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Do you see any DCMs on the event display that are not getting hits, or many short tracks ending at DCM boundaries? If so, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect almost immediately in OnMon/Event display and after ~10 mins in the Nearline.
This could be an indication that the trigger rates are off.
  • Check whether the Average Trigger and Spill Rates look ok. If they do not, call your DAQ on-call expert.
  • Check the Trigger scalers on DAQ-CR-01. Are any trigger rates in alarm?
  • We expect the cosmic trigger = 10Hz, 1 Hz accelerator trigger = 1Hz, NuMI = 0.7Hz, DDT FastMono ~ 25Hz, DDContained ~ 8Hz, DDslow ~ 1Hz, DDnumu ~ 8Hz, DDupmu ~ 1Hz, DDcalmu ~ 26Hz, Energy ~ 7Hz. If they are much higher or lower, call your DAQ on-call expert.

This could also be an issue with file processing; email .

BLACK: Other
The subrun had no activity or there were bad timestamps in the subrun.

This could be an indication that we are recording empty events.
Are any DCMs out of sync?
  • Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Do you see any DCMs on the event display that are not getting hits, or many short tracks ending at DCM boundaries? If so, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect almost immediately in OnMon/Event display and after ~10 mins in the Nearline.

This could also be an issue with file processing; email .
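
For quick reference, the colour coding of the Good Subruns plot can be collected into one lookup table. The sketch below simply restates the meanings given in this section; the names `SUBRUN_COLOURS`, `describe` and `fails_live_time` are hypothetical and do not correspond to anything in the nearline code.

```python
# Meaning of each colour in the Good Subruns plot, as described in this section.
SUBRUN_COLOURS = {
    "WHITE": "good data",
    "GRAY": "waiting for reconstruction to run",
    "RED": "partial detector",
    "GREEN": "failed DiBlock",
    "BLUE": "hit rate too high or too low in the MIP region",
    "PURPLE": "live time below 1000 triggers",
    "ORANGE": "reconstruction: too many 2D tracks or bad slices per trigger",
    "BLACK": "other: no activity or bad timestamps",
}
MIN_LIVE_TRIGGERS = 1000  # threshold for the PURPLE live time failure


def describe(colour):
    """Return the meaning of a plot colour, ignoring case."""
    return SUBRUN_COLOURS.get(colour.upper(), "unknown colour")


def fails_live_time(n_triggers):
    """True if a subrun has too little live time and would turn purple."""
    return n_triggers < MIN_LIVE_TRIGGERS
```

For instance, `describe("purple")` returns the live time entry, and a subrun with 999 triggers fails the live time check while one with 1000 does not.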



OnMon FEB Hit Rate Spectrum vs. Time

This plot shows the rate of hits in the detector per subrun. The rate should remain constant unless the detector configuration changes.

This is GOOD data

Sudden drops or increases in rate indicate an issue.
For the first and last subrun in a run, low statistics could result in a drop for that one subrun.

For the first 5 mins or 1-2 subruns of a new run, a low rate is often seen due to DAQ issues. If this continues after 5 mins, issue a "SYNC" from the TDUControlInterface. If the issue is still seen 10 mins after the SYNC, call the DAQ on-call expert.

  1. Are channels missing from the detector? Check the Number of Active FEBs per Subrun plot and follow the instructions there.
  2. Is the detector out of sync? In the Event Display, can you see many short tracks ending on DCM boundaries? Is there a DCM with no hits? Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? If any of the above is true, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect almost immediately in OnMon.
  3. Are any DCMs warm? Look at the CSS GUI (APD temperature monitor) overview page. Are any of the DCMs (boxes) dark green instead of light green? For normal running all of the detector should be cool (LIGHT green). If any DCMs are not cool, either cool the detector (using the ‘configure cold APDs’ button) or call an APD cooling expert straight away.
  4. Look at the OnMon FEBHitRateMapMipADC plot (in the RatePlot folder). Do any DCMs have a high, low or zero value? If so, a DCM might be running at the wrong gain. Call Leon Mualem.

If all of the above does not fix the issue after 10 mins, call your DAQ on-call expert.

These are examples of BAD data.

This is BAD data.