How to Interpret the Nearline Frontage Plots Near Detector » History » Version 20

Louise Suter, 12/08/2015 03:40 PM


How to Interpret the Nearline Frontage Plots Near Detector

This page explains what the nearline front page plots should look like, covers the main failure modes you are likely to see in each plot, and describes how to recover from them. For issues with this page, email both and .

General Information

  • First note the timestamp in the bottom left corner of the plot. This should be within the last 15 minutes. If it is not, the scripts that make the web plots are not running.
  • Note that Near Detector subruns are ONE HOUR LONG. This plot will only update when a subrun is finished, so it will take ~1.5 hrs to update.
  • The new data goes at the right edge of the plot. If there is a lot of white space on the right side of the plot while the detector is on and a run is going, the nearline processing has stopped.
  • In case of either of these issues email .
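
The timestamp check in the first bullet can be sketched as follows. This is only an illustration: the function name and the example times are hypothetical, and only the 15-minute window comes from the text above.

```python
from datetime import datetime, timedelta

# Freshness window from the text: the plot timestamp should be
# within the last 15 minutes.
MAX_PLOT_AGE = timedelta(minutes=15)


def plot_is_stale(plot_timestamp, now):
    """Return True if the plot timestamp is older than MAX_PLOT_AGE.

    Hypothetical helper; both arguments are datetime objects.
    """
    return now - plot_timestamp > MAX_PLOT_AGE


# Example with made-up times.
now = datetime(2015, 12, 8, 15, 40)
fresh = plot_is_stale(datetime(2015, 12, 8, 15, 30), now)  # 10 minutes old
stale = plot_is_stale(datetime(2015, 12, 8, 14, 40), now)  # 1 hour old
print(fresh, stale)
```

A stale timestamp points at the web-plot scripts, while a fresh timestamp with white space at the right edge of the plot points at the nearline processing.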

Near Detector Nearline Front Page Checklist Plots

The following are the plots used for the Near Detector nearline checklist. For normal running these plots should show behavior that is constant over time; any deviation from that could indicate an issue.

The four plots are Number of Active FEBs per Subrun, the Timing Peak, the Good Subruns and OnMon FEB Hit Rate Spectrum vs. Time. Click on the links to learn more about each one.

The sections below describe the plots and what most failures mean. If you cannot find the solution here, call your Data Quality expert.



Number of Active FEBs per Subrun

This plot shows the number of FEBs (or APDs) that report any hits in a subrun.

It should look like

These are GOOD examples.

The x-axis will auto zoom out, so if you were running with some of the detector missing over the last day the x-axis range could be much larger. The total number of channels for the whole Near Detector is 631, and we generally have a couple of channels that do not report at all.

Over the course of a subrun we expect that a few (0-2) FEBs will stop reporting (drop out) because they are too noisy. Once EVERY 10 MINS a recovery message is sent to all channels, which recovers these dropped-out channels.
If you see the number of channels decrease over several subruns, then AutoStartDAQ might not be running. Follow the instructions in the online manual on how to recover them.

Rarely, something can cause a large number of channels to drop out in one go (for example lightning, cell phones, lights).
If you see a large drop (more than 2 channels) in one subrun, you can manually recover these channels by issuing an "Enable FEB Data Flow" (green button) from the TDU Control Interface.
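
The rules above can be sketched as a small check on the per-subrun FEB counts. This is a minimal sketch, not part of the nearline code: the function name is hypothetical, and treating three consecutive decreases as "several subruns" is an assumption made only for illustration. The 631 total and the 0-2 expected dropouts come from the text.

```python
TOTAL_FEBS = 631       # total Near Detector channels (from the text)
MAX_NORMAL_DROP = 2    # 0-2 FEBs dropping out in a subrun is expected


def assess_feb_counts(counts):
    """Classify per-subrun active-FEB counts, given oldest first.

    Hypothetical helper that returns a short string naming the suggested action.
    """
    if len(counts) < 2:
        return "not enough subruns"
    last_drop = counts[-2] - counts[-1]
    if last_drop > MAX_NORMAL_DROP:
        return "large drop: issue Enable FEB Data Flow"
    # Assumption: three consecutive decreases count as "several subruns".
    recent = counts[-4:]
    if len(recent) == 4 and all(a > b for a, b in zip(recent, recent[1:])):
        return "steady decrease: check AutoStartDAQ"
    return "ok"


print(assess_feb_counts([629, 629, 628]))       # small, expected dropout
print(assess_feb_counts([629, 620]))            # 9 channels lost in one subrun
print(assess_feb_counts([629, 628, 627, 626]))  # slow decline over subruns
```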

This plot (and all the nearline plots) will take about 1.5 hrs to update. Therefore, look in OnMon and the Event Display to see whether the missing channels have come back.

If you see a whole DCM not reporting in OnMon or the Event Display, you may need to issue a “SYNC” (red button) from the TDU Control Interface.

If you have issued both an "Enable FEB Data Flow" and a “SYNC” because many channels are missing in the Event Display/OnMon, and after 5 mins no improvement is seen in the Event Display or OnMon (remember nearline is more than an hour behind), call your DAQ on-call expert!

Note: If maintenance is being done on the detector and not all DCMs are in the run, this plot will show fewer than the 631 channels! Wait until a full detector run is started and check this plot again.

Example plots of AutoStartDAQ not running and many missing channels

This is BAD data



Good Subruns

This plot runs many data quality checks over the data and shows different data quality failures in different colors.
For this plot GOOD DATA should look like

This is GOOD data

The plot should be white with a flat rate. The first hour's worth of data will show up as gray while we wait for reconstruction to be run over the data. As the reconstruction takes a while to run, a preliminary state of good or bad is shown based on low-level quantities, which shows up in a lighter shade of the same colour.
NOTE : GOOD data could be reclassified as bad with the extra reconstruction information but BAD data will never be reclassified as GOOD.

IMPORTANT: This plot is made using the beam data and therefore will be blank when there is no beam and will show a lower rate when the beam intensity is lower.
If we are running in an abnormal beam configuration, i.e. horn-off, off-target or low-intensity running, these plots will reflect that and show the data to be bad.

Failure modes are

Red: Failed Timing Peak
The timing peak is not in the right location. Phone your DAQ on-call expert straight away.
Look at the nearline timing peak plot and the OnMon timing plots. Click on the TQPlots folder and look at the TPlotALL and TPlotZoom of the Near Detector timing peak. White/red lines indicate where the timing peak should be located. If the timing peak is not visible (and there is beam) or is shifted compared to the white/red lines, call your DAQ on-call expert.

GREEN: Failed DiBlock
Some part of the detector is either missing or has the wrong rate. Are channels also missing in the ‘Number of Active FEBs per Subrun’ plot? If so follow the instructions under that section.
If not, follow the tests below in ORDER!
  1. Look at the detector configuration plot http://nusoft.fnal.gov/nova/datacheck/nearline/plots/FarDet-t02-P1GoodDataSelDetConfigDay.png to determine what region of the detector the problem is in.
  2. Are any DCMs warm? Look at the CSS GUI (APD temperature monitor on DAQ-CR-02) overview page. Are any of the DCMs (boxes) dark green instead of light green? For all normal running all of the detector should be cooled (LIGHT green). If they are not cooled, either cool that DCM using the ‘Configure cold APD’ button or call an APD cooling expert straight away.
  3. Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Do any DCMs in the Event Display have no hits in them? If so, try issuing a “SYNC” from the TDU Control Interface. If this has fixed the issue, you will see the effect pretty much straight away in OnMon/Event Display, but remember that nearline takes ~1.5 hrs to update.
  4. Look at the OnMon FEBHitRateMapMipADC plot (in the RatePlot folder). Do any DCMs have a high, low or zero rate? If so, a DCM might be running at the wrong gain. Call Leon Mualem.
  5. If none of the above helps, call your Data Quality on-call expert.

Light Purple: Failed Empty Spill

This failure is raised if more than 30% of all beam spills have no events in them. This could be OK if we are running with a very low intensity beam; check the POT beam plot in the nearline to see what the intensity is.
Often this is seen for just one subrun, when beam is only up for part of the hour or the beam intensity drops for some period, and then it is OK.
Check the nearline beam quality plots to see if this is the case. If so, make a comment in the nearline form, but no other action is needed.

If we are not running at low intensity and beam is up, this means we are seeing fewer events than we should. This could be an issue with the thresholds or gains. Call a data quality expert.
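
To make the 30% criterion concrete, here is a minimal sketch. The function name and the input format (a list of event counts, one per beam spill) are assumptions for illustration; the actual nearline check may be implemented differently.

```python
EMPTY_SPILL_LIMIT = 0.30  # from the text: more than 30% empty spills fails


def fails_empty_spill(events_per_spill):
    """Return True if more than 30% of the spills contain no events.

    Hypothetical helper; events_per_spill holds one event count per beam spill.
    A subrun with no spills at all is treated as not failing (an assumption).
    """
    if not events_per_spill:
        return False
    empty = sum(1 for n in events_per_spill if n == 0)
    return empty / len(events_per_spill) > EMPTY_SPILL_LIMIT


print(fails_empty_spill([0, 0, 1, 2]))                    # half of the spills empty
print(fails_empty_spill([0, 1, 1, 1, 1, 1, 1, 1, 1, 1]))  # one spill in ten empty
```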

BLUE: Hit Rate
This implies that the median hit rate in the MIP region was too high/low in the detector.

It can be caused by running in an odd beam configuration, such as without a horn or target. Check the nearline beam quality plots to see if we are. If we are, make a comment in the nearline form, but no other action is needed.

Are any DCMs out of sync?
  • Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Do you see any DCMs in the Event Display that are not getting hits, or many short tracks that end on DCM boundaries? If so, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect pretty much straight away in OnMon/Event Display, but remember that nearline will take 1.5 hrs to update.
This could be an indication that the trigger rates are off.
  • Check if the Average Trigger and Spill Rates look OK. If they do not, call your DAQ on-call expert.
  • Check the Trigger scalers on DAQ-CR-05. Are any trigger rates in alarm?
  • We expect the cosmic trigger = 1 Hz, 1 Hz accelerator trigger = 1 Hz, NuMI = 0.7 Hz,
    DDT Activity 1 ~34 Hz, DDT Cal Mu ~5 Hz, All Triggers 42 Hz. If they are much higher or lower, call your DAQ on-call expert.
This could be an indication that we are running at the wrong gain.
  • Look at the OnMon FEBHitRateMapMipADC plot (in the RatePlot folder). Do any DCMs have a high, low or zero rate? If so, a DCM might be running at the wrong gain. Call Leon Mualem.
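
The trigger-rate comparison above can be sketched as a simple table lookup. The expected rates are taken from the list above. The dictionary keys, the function name and the 50% tolerance are hypothetical: the text only says "much higher or lower", so the alarm states on the Trigger scalers remain the real reference.

```python
# Expected trigger rates in Hz, as listed in the text above.
EXPECTED_HZ = {
    "cosmic": 1.0,
    "accelerator": 1.0,
    "NuMI": 0.7,
    "DDT Activity 1": 34.0,
    "DDT Cal Mu": 5.0,
    "All Triggers": 42.0,
}

# Assumption for illustration only: "much higher or lower" is taken to mean
# more than 50% away from the expected rate.
TOLERANCE = 0.5


def triggers_off(measured_hz, tolerance=TOLERANCE):
    """Return the names of triggers whose measured rate is far from expected.

    Hypothetical helper; measured_hz maps trigger name to rate in Hz.
    Triggers missing from measured_hz are skipped.
    """
    off = []
    for name, expected in EXPECTED_HZ.items():
        if name not in measured_hz:
            continue
        if abs(measured_hz[name] - expected) > tolerance * expected:
            off.append(name)
    return off


print(triggers_off({"cosmic": 1.0, "NuMI": 0.7, "All Triggers": 42.0}))
print(triggers_off({"DDT Activity 1": 80.0, "NuMI": 0.7}))
```

As a cross-check, the individual rates listed in the text add up to about 41.7 Hz, consistent with the quoted 42 Hz for All Triggers.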

PURPLE: NuMI Live Time
Subruns turn purple if the whole subrun had less than 1000 NuMI triggers' worth of live time.

You may see this for isolated subruns at the start or end of a run. If you see this for longer periods, call your DAQ on-call expert and email , as it may also be an issue with file processing.

ORANGE: Reconstruction/Slice rate
There were too many 2D tracks and/or there were too many/few slices per trigger.

Are any DCMs out of sync?
  • Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? Do you see any DCMs in the Event Display that are not getting hits, or many short tracks ending at DCM boundaries? If so, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect pretty much straight away in OnMon/Event Display, but remember that nearline will take ~1.5 hrs to update.
This could be an indication that the trigger rates are off.
  • Check if the Average Trigger and Spill Rates look OK. If they do not, call your DAQ on-call expert.
  • Check the Trigger scalers on DAQ-CR-05. Are any trigger rates in alarm?
  • We expect the cosmic trigger = 1 Hz, 1 Hz accelerator trigger = 1 Hz, NuMI = 0.7 Hz,
    DDT Activity 1 ~34 Hz, DDT Cal Mu ~5 Hz, All Triggers 42 Hz. If they are much higher or lower, call your DAQ on-call expert.

This is POT weighted so if we are running without a horn or off target then expect this to be in alarm.

This could also be an issue with file processing; email .

Black: Other
There were bad timestamps in the subrun.

This is an issue with file processing or corrupt files; email .
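
As a quick reference, the colour codes described in this section can be collected into a lookup table. The sketch below is only a summary aid: the names FAILURE_MODES and lookup are hypothetical, and the action strings are condensed from the text above, which remains the full instruction.

```python
# Colour -> (failure mode, condensed first action), summarised from this section.
FAILURE_MODES = {
    "red": ("Failed Timing Peak",
            "phone your DAQ on-call expert straight away"),
    "green": ("Failed DiBlock",
              "check the Active FEBs plot, then follow the ordered tests"),
    "light purple": ("Failed Empty Spill",
                     "check beam intensity in the nearline beam plots"),
    "blue": ("Hit Rate",
             "check beam configuration, DCM sync, trigger rates and gain"),
    "purple": ("NuMI Live Time",
               "ignore isolated subruns at run start/end, otherwise call DAQ on-call"),
    "orange": ("Reconstruction/Slice rate",
               "check DCM sync and trigger rates"),
    "black": ("Other",
              "bad timestamps: report as a file processing issue"),
}


def lookup(colour):
    """Return (failure mode, first action) for a plot colour, or None."""
    key = " ".join(colour.lower().split())  # normalise case and spacing
    return FAILURE_MODES.get(key)


print(lookup("RED"))
print(lookup("Light Purple"))
print(lookup("pink"))  # not a colour used by the plot
```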



OnMon FEB Hit Rate Spectrum vs. Time

This plot shows the rate of hits in the detector per subrun. The rate should remain constant unless the detector configuration or the beam intensity changes.
Below are examples with the beam off and with the beam on, so you can see how much to expect the rate to change.

This is GOOD BEAM ON data

This is GOOD BEAM OFF data

Sudden drops or increases in rate which do not coincide with a change in beam conditions or that are of a high magnitude indicate an issue.
For the first and last subrun in a run, low statistics could result in a drop for that one subrun.

  1. Are channels missing from the detector? Check the Number of Active FEBs per Subrun plot and follow the instructions there.
  2. Is the detector out of sync? In the Event Display, can you see many short tracks ending on DCM boundaries? Is there a DCM with no hits? Look at the OnMon FEBHitRate and FEBHitRateMap plots (in the shifter folder). Do any DCMs have a high, low or zero rate? If any of the above is true, try issuing a “SYNC” from the TDUControlInterface. If this has fixed the issue, you will see the effect pretty much straight away in OnMon, but remember that ND nearline will take ~1.5 hrs to update.
  3. Are any DCMs warm? Look at the CSS GUI (APD temperature monitor) overview page. Are any of the DCMs (boxes) dark green instead of light green? For all normal running all of the detector should be cool (LIGHT green). If they are not, cool the detector (using the ‘configure cold APDs’ button) or call an APD cooling expert straight away.
  4. Look at the OnMon FEBHitRateMapMipADC plot (in the RatePlot folder). Do any DCMs have a high, low or zero rate? If so, a DCM might be running at the wrong gain. Call Leon Mualem.

If none of the above fixes the issue as seen in the Event Display or OnMon, call your DAQ on-call expert. Remember that ND nearline will take ~1.5 hrs to update.

These are examples of BAD data.

This is BAD data.



Timing Peak

This plot shows the location of the timing peak over time.

For good BEAM ON data it should look like:
This is GOOD data

This plot is made using NuMI triggered files, so it will be empty if there is no beam. This plot is for 6+2 slip stacking; when we start doing 6+4, the first 4 out of 6 bunches will have a higher intensity.

If the timing peak has shifted, phone your DAQ on-call expert straight away.
Look at the nearline timing peak plot and the OnMon timing plots. In OnMon click on the TQPlots folder and look at the TPlotALL and TPlotZoom of the Near Detector timing peak. White/red lines indicate where the timing peak should be located. If the timing peak is not visible (and there is beam) or is shifted compared to the white/red lines, call your DAQ on-call expert.