Beamline DQM

The beamline DQM runs inside a DAQ process and analyzes raw data in memory as they pass through the DAQ. This gives real-time feedback on the quality and properties of the data as they are collected, and is critical both for shifters, to ensure good data are collected, and for experts, to monitor the systems. It is described in detail on this page.

Overview

When running, the monitoring constantly analyzes the data from each spill and produces summary plots. These are saved in a ROOT file (work in progress) and images are saved to a web server. They are accessible via the web at:
http://nusoft.fnal.gov/nova/datacheck/beamline/

In addition to these plots, certain pieces of information are sent to the terminal which is running the DQM.

During running, one may expect a latency of around 5 seconds following the end of the spill as the data are analyzed, plots made and images saved.

The next section explains how to run the DQM, and the section after that describes the plots which are produced. Technical and expert details are at the end of the page.

Running the DQM

The DQM may be started when the DAQ is running. Note that, since it runs inside a DAQ process, that process must exist in order for the art job to run: you cannot start the DQM before the DAQ has booted up.

The DQM is started by running the script

start_monitoring.sh

in /home/nfs/novadaq/bin/ on novabeamlinedaq00.
The script
stop_monitoring.sh

in the same directory stops the monitoring process.

Configurable parameters (mostly for expert use) are contained in the art configuration file

beamline_dqm.fcl

in the bin directory. These parameters may be altered at any point and will take effect whenever the DQM is restarted.

When the DQM is started, all the plots are cleared. This is currently the only way to clear all the plots and remove errors and other old data. Most plots then fill up until the monitoring is stopped, apart from those which are reset every spill.

For the convenience of shifters, a desktop icon on VNC4 is provided for each purpose (starting and stopping the monitoring).

Plots Produced

These plots are described with a simple caption on the web but more details are provided here. The next subsections refer to the web pages which contain the plots (which reflects the parallelization of the data).

Some plots have parameters which may be configured by the user at run-time, as mentioned above. In this case the parameters are listed in italics following the description.

Front Page

  • Links to all other pages
  • The front page has the total errors for the beamline systems. This plot is described in the Data Quality section but is also placed here due to its importance.

Data Quality

Contains basic data quality plots, nothing specific to any subsystem.
http://nusoft.fnal.gov/nova/datacheck/beamline/dq/

  • Total Beamline Errors: The most important of the DQM plots. It contains a summary of all errors which are relevant for the beamline data. The individual errors are described below in their individual plots. If any single error is present in three consecutive spills, the run is considered bad (no data collected from this point on are analyzable) and the plot will turn red. Depending on the error, the issue may then be addressed. Restarting the DAQ will often reset things and resolve the issue.
  • Good Triggers: Shows the number of triggers received during the entirety of the run, separated into 'Good' and 'Bad'. More for expert use. The specific definitions make some assumptions about the physics we may do with these data, and are described in the Physics section below.
  • DQ Check; Consistent Data Fragment: When a trigger is formed by the trigger board (regardless of the logic), it is sent to each of the front-end components to initiate a data readout (and the trigger board itself saves its data). This results in individual data fragments recorded by each of the components: trigger board, digitizer, wire chambers, TDU. Each fragment should correspond to one trigger. Offline, we match up the data belonging to each trigger by taking the fragments in order for each component. It is therefore imperative that each of the front-end readouts saves the same number of fragments during each spill. If this is not the case, the data from the entire spill are not usable and are thrown away offline when unpacking. This plot shows the number of triggers for which this is and is not the case. Note that, for various resetting reasons, the first spill of a new DAQ run will often have stale data being read out of buffers and so will very often not align. This is normal as long as it is only the first spill, so it is common to see a single entry here corresponding to that first spill (if the monitoring was started before the first spill was taken). These plot entries also go into the Total Beamline Errors plot and, if the data remain bad for three or more consecutive spills, will turn that plot red. This is often due to a readout problem with the wire chamber controller, or the TDU BoardReader dying. Restarting the run will most often fix these problems (if the wire chamber problems persist, it is recommended to power cycle the MWPC Controller).
  • DQ Check; Consistent MWPC Fragment Timestamps and Consistent TDC Fragment Timestamps (2): The TDC data as saved by the MWPC Controller consist of a single TDC fragment for each TDC. Each fragment contains both a controller timestamp and a TDC timestamp. These two plots check for consistency across all 16 TDCs; if any of the timestamps differ then the controller has lost time synchronization and may not be saving data correctly. These bad data are thrown away offline when unpacking. This is also reported in the Total Beamline Errors plot and, if it is bad for three consecutive spills, will turn that plot red. Restarting the run should fix the problem; if not, it is recommended to power cycle the MWPC Controller.
  • DQ Check; MWPC Controller/TDC Sync: The MWPC Controller and TDCs run on different clocks, and in order to get the time for a trigger the individual timestamps must be concatenated. 11 bits overlap to provide a check for consistency. If they are not consistent, the data are considered bad and are thrown away offline when unpacking. This is also reported in the Total Beamline Errors plot and, if it is bad for three consecutive spills, will turn that plot red. Restarting the run should fix the problem; if not, it is recommended to power cycle the MWPC Controller.
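
The fragment-count check and the "three consecutive spills" rule described above can be sketched roughly as follows. This is only an illustration and not the DQM code itself: the component names, the error names and the ErrorTracker class are hypothetical, and whether the real flag stays set once the run has gone bad is not specified here.

```python
from dataclasses import dataclass, field

# Hypothetical component names; the real fragment identifiers may differ.
COMPONENTS = ("trigger_board", "digitizer", "mwpc_controller", "tdu")
BAD_SPILL_THRESHOLD = 3  # consecutive spills with the same error (from the text)


def fragments_consistent(counts):
    """True if every front-end component saved the same number of fragments."""
    return len({counts[name] for name in COMPONENTS}) == 1


@dataclass
class ErrorTracker:
    """Counts, per error type, how many spills in a row showed that error."""
    consecutive: dict = field(default_factory=dict)

    def update(self, errors_this_spill):
        """Record one spill; return True if any error has now been present
        in BAD_SPILL_THRESHOLD consecutive spills."""
        # An error that is absent in this spill breaks its streak.
        for name in list(self.consecutive):
            if name not in errors_this_spill:
                self.consecutive[name] = 0
        for name in errors_this_spill:
            self.consecutive[name] = self.consecutive.get(name, 0) + 1
        return any(n >= BAD_SPILL_THRESHOLD for n in self.consecutive.values())
```

In this sketch, update() would be called once per spill with the set of errors seen in that spill, for example {"fragment_mismatch"} whenever fragments_consistent() returns False.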

Spill

Contains plots which reset every spill and show a summary of the data collected in the previous beam spill.
http://nusoft.fnal.gov/nova/datacheck/beamline/spill/

  • Number of Data Fragments: Number of fragments collected by each of the front-end components during the spill. See description above. The legend shows the number reported by each, and a single line for each is drawn. If the numbers of fragments align, this is also the number of triggers received in the spill. For comparison, the gray histogram shows the total number of triggers received in the run since the monitoring was started. If the spill was bad, this plot will turn red (and an error will be added to the Total Beamline Errors plot described above).
    • Configurable axis range (NumTrig)
  • Beamline Trigger Time: Shows the time of each of the beamline triggers during this spill. The time is shown in seconds since the $00 supercycle reset. The red lines roughly indicate where the switchyard slow beam extraction occurs on this timeline. Note that this is not fixed and may change over time (but the lines may be configured at runtime).
    • Configurable switchyard start and end times represented by the dashed lines (SwydStart and SwydEnd)
  • Trigger Bit Set: The number of times each of the 16 trigger bits saved by the trigger board is set. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience.
    • Configurable channel description list (TriggerBoardChannels)
  • Reconstructed Digitizer Hits: The number of reconstructed hits on each of the digitizer channels during the spill. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience.
    • Configurable number of digitizer channels to show (DigitChan)
    • Configurable channel description list (DigitizerLabels)
  • TDC Hits: The number of raw hits on each of the TDCs during the spill.
  • Matched TDC Hits: The number of matched raw hits between the wire chamber planes during the spill, for each wire chamber. Hits are considered matched when there are hits on each plane very close in time. The matching window is currently set to 10 ticks (11.8 ns). Note this is not very sophisticated reconstruction and is provided as a guide only.
  • Digitizer Waveforms (N, N=5 currently): Raw waveforms for digitizer channels for the first N triggers (one plot per trigger). For debugging purposes and to allow us to see the raw traces as we collect them. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience.
    • Configurable number of digitizer channels (DigitChan)
    • Configurable channel description list (DigitizerLabels)
    • Configurable number of sample triggers to make plots for (SampleTrig) (NOTE: for more than 5 to show up on the web, some HTML editing would be required!)
  • TDC Hit Profiles (16): The hit profiles showing the number of hits on each wire for each TDC during the last spill. One plot per TDC. Mostly for expert use.
  • TDC Time Profiles (16): The time profiles showing the hit times for all wires for each TDC during the last spill. One plot per TDC. Mostly for expert use.
    • Configurable TDC upper time (TDCMaxTime)
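
As a rough illustration of the simple time matching used for the Matched TDC Hits plot, the sketch below counts hits on one plane that have a partner on the other plane within the 10-tick window. The function name and the one-to-one pairing are assumptions made for this example; the text only states that hits on each plane must be very close in time. The tick length is the one implied by "10 ticks (11.8 ns)".

```python
from bisect import bisect_left

MATCH_WINDOW_TICKS = 10   # matching window from the text
NS_PER_TICK = 11.8 / 10   # implied by "10 ticks (11.8 ns)"


def count_matched_hits(x_times, y_times, window=MATCH_WINDOW_TICKS):
    """Count hits on one plane with a partner on the other plane within
    `window` ticks. Times are TDC ticks; each hit on the second plane is
    paired at most once (an assumption of this sketch)."""
    y_sorted = sorted(y_times)
    used = [False] * len(y_sorted)
    matched = 0
    for t in sorted(x_times):
        # Index of the first hit on the other plane not earlier than t - window
        i = bisect_left(y_sorted, t - window)
        while i < len(y_sorted) and y_sorted[i] <= t + window:
            if not used[i]:
                used[i] = True
                matched += 1
                break
            i += 1
    return matched
```

For example, count_matched_hits([100, 500], [105, 300]) pairs only the hits at 100 and 105 ticks, since the others have no partner within the window.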

Event Displays

Contains crude and basic event displays.
http://nusoft.fnal.gov/nova/datacheck/beamline/evd/

  • Event display has wire chamber wires in gray, those which were hit in red and matched hits between the views as red dots. The ToF and Cherenkov detectors are shown as gray rectangles which turn red when hit.
    • Configurable number of event displays to make (SampleTrig) (NOTE: for more than 5 to show up on the web, some HTML editing would be required!)

Physics

Contains plots which try to give an idea of the physics/analyzable quality of the data.
http://nusoft.fnal.gov/nova/datacheck/beamline/physics/

  • Good Quality Detector Triggers: One bin for each detector. The ToF data are considered 'good' if there are hits on 3 out of the 4 light detectors on each arm. The wire chamber data are considered 'good' if there is at least one matched hit on each chamber.
  • Good Triggers: Categorizes each trigger as 'good' or 'bad'. A trigger is considered good if the US and one of the DS ToF arms have good data and the wire chambers have good data (using the conditions described above). The fraction and percentage are also shown.
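
The 'good' definitions above can be written out as a short sketch. The function names and input shapes are hypothetical, and "3 out of the 4" is read here as "at least 3" and "one of the DS ToF arms" as "at least one"; the DQM itself may apply these conditions differently.

```python
def tof_arm_good(light_detector_hits):
    """light_detector_hits: four booleans, one per light detector on the arm.
    The arm is good if at least 3 of the 4 detectors were hit."""
    return sum(bool(h) for h in light_detector_hits) >= 3


def wire_chambers_good(matched_hits_per_chamber):
    """Good if every chamber has at least one matched hit."""
    return (len(matched_hits_per_chamber) > 0
            and all(n >= 1 for n in matched_hits_per_chamber))


def trigger_good(us_arm, ds_arms, matched_hits_per_chamber):
    """Good if the US arm, at least one DS arm and the wire chambers are good."""
    return (tof_arm_good(us_arm)
            and any(tof_arm_good(arm) for arm in ds_arms)
            and wire_chambers_good(matched_hits_per_chamber))
```

The number of DS arms and wire chambers is left to the caller, since neither is fixed by the description above.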

Trigger

Contains plots related to the triggers which are made by the beamline systems.
http://nusoft.fnal.gov/nova/datacheck/beamline/trigger/

  • Beamline Trigger Times: Shows the time of each of the beamline triggers. The time is shown in seconds since the $00 supercycle reset. The red lines roughly indicate where the switchyard slow beam extraction occurs on this timeline. Note that this is not fixed and may change over time (but the lines may be configured at runtime). Same as the Beamline Trigger Times plot on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
    • Configurable switchyard start and end times represented by the dashed lines (SwydStart and SwydEnd)
  • Trigger Bit Set: The number of times each of the 16 trigger bits saved by the trigger board is set. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience. Same as the Trigger Bit Set plot on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
    • Configurable channel description list (TriggerBoardChannels)
  • Number of Data Fragments (4 plots): Number of data fragments collected by each of the subsystems: Trigger Board, Digitizer, MWPC Controller, TDU. One plot for each. See descriptions above. Same as the Number of Data Fragments plot on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
    • Configurable axis range (NumTrig)

Digitizer

Contains plots related to the digitizer data.
http://nusoft.fnal.gov/nova/datacheck/beamline/digit/

  • Mean of ADC Values: The mean ADC for each digitizer channel. Filled once per trigger. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience.
    • Configurable number of digitizer channels (DigitChan)
    • Configurable channel description list (DigitizerLabels)
  • RMS of ADC Values: The RMS of ADCs for each digitizer channel. Filled once per trigger. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience.
    • Configurable number of digitizer channels (DigitChan)
    • Configurable channel description list (DigitizerLabels)
  • Raw ADCs: The raw ADC values saved by the board. Filled once per sample per trigger. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience.
    • Configurable number of digitizer channels (DigitChan)
    • Configurable channel description list (DigitizerLabels)
  • Reconstructed Digitizer Hits: The number of reconstructed hits on each of the digitizer channels during the spill. The legend on the right-hand side shows a brief description of the channels currently connected to the board, for convenience. Same as the Reconstructed Digitizer Hits plot on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
    • Configurable number of digitizer channels to show (DigitChan)
    • Configurable channel description list (DigitizerLabels)
  • Number of Reconstructed Hits (16 plots): Number of reconstructed hits for each channel. One plot per channel (16 plots, non-configurable).
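
For reference, the per-trigger quantities filled into the Mean and RMS plots could be computed along these lines. This is a sketch only: the function name and input layout are hypothetical, and "RMS" is taken here to mean the spread about the mean (the population standard deviation), which may not be the definition the DQM uses.

```python
from statistics import fmean, pstdev


def channel_summary(waveforms):
    """waveforms: dict mapping channel number -> list of ADC samples
    recorded for one trigger.

    Returns {channel: (mean ADC, spread about the mean)}; channels with
    no samples are skipped.
    """
    return {
        channel: (fmean(samples), pstdev(samples))
        for channel, samples in waveforms.items()
        if samples
    }
```

Each (mean, spread) pair would then give one entry per channel per trigger, matching the "filled once per trigger" behavior described above.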

Wire Chambers

Contains plots related to the wire chamber data.
http://nusoft.fnal.gov/nova/datacheck/beamline/mwpc/

  • TDC Hits: The number of raw hits on each of the TDCs during the spill. Same as the TDC Hits plot on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
  • Matched TDC Hits: The number of matched raw hits between the wire chamber planes, for each wire chamber. Hits are considered matched when there are hits on each plane very close in time. The matching window is currently set to 10 ticks (11.8 ns). Note this is not very sophisticated reconstruction and is provided as a guide only. Same as the Matched TDC Hits plot on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
  • Fraction of Beamline Triggers with TDC Hits: The fraction of all triggers which have hits on each of the TDCs. Note we don't necessarily expect a particle to go through two TDCs on the same wire chamber plane, so this can be a little misleading.
  • TDC Hit Profiles (16): The hit profiles showing the number of hits on each wire for each TDC during the last spill. One plot per TDC. Mostly for expert use. Same as the profiles on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
  • TDC Time Profiles (16): The time profiles showing the hit times for all wires for each TDC during the last spill. One plot per TDC. Mostly for expert use. Same as the profiles on the Spill page but shown for all spills (i.e. not refreshed for each new spill).
    • Configurable TDC upper time (TDCMaxTime)

TDU

Contains plots related to the TDU data.
http://nusoft.fnal.gov/nova/datacheck/beamline/tdu/

  • TDU Time Difference: Time difference, reported by the TDU timestamps, between the first and last trigger in a given spill. We don't expect this to be greater than the length of the spill (4.2 s). If it is, this is considered an error and is reported on the Total Beamline Errors plot. If this were to happen in three consecutive spills, the run would be considered bad and that plot would turn red, as described above. I don't believe we have ever seen this. If we were to see it, restarting the DAQ (and therefore the TDU BoardReader) would be a good start, along with restarting the SpillServer and power cycling the TDUs.
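
The check behind this plot amounts to a single comparison, sketched below. The function names are hypothetical, and the timestamps are assumed to have already been converted to seconds; the raw TDU timestamp format is not covered here.

```python
SPILL_LENGTH_S = 4.2  # spill length from the text


def tdu_time_difference(trigger_times_s):
    """Time between the first and last trigger of a spill, in seconds.
    trigger_times_s: TDU timestamps in readout order, already in seconds."""
    if len(trigger_times_s) < 2:
        return 0.0
    return trigger_times_s[-1] - trigger_times_s[0]


def tdu_time_error(trigger_times_s, spill_length_s=SPILL_LENGTH_S):
    """True if the triggers span more time than the spill itself."""
    return tdu_time_difference(trigger_times_s) > spill_length_s
```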

Technical Details

The framework lives in

lariat-online/daq/lariat-artdaq/lariat-artdaq/NovaOnlineDQM

For development and testing purposes, the configuration
OnlineDQM.fcl
may be used (this has parameter OnlineMode set to false by default).

As described above, there are some run-time parameters which may be set. The configuration is set up such that, when running the monitoring in the normal way (i.e. online), the parameters in the file

/home/nfs/novadaq/bin/beamline_dqm.fcl

may be edited. If running offline, one would have to write the changes into the relevant configuration.

The webpage is hosted at:

/nusoft/app/web/htdoc/nova/datacheck/beamline/

and all images are placed here by the DQM. The output file path is a fhicl-configurable parameter.

An output ROOT file is created and placed in

/monitoring/

to keep copies of the DQM plots. However, the saving of the data products into this file is currently unimplemented.

The script mentioned above to start the DQM just begins an art process using the configuration

art -c online_monitoring.fcl

When the 'start' script is run, the process ID is saved in the file
/monitoring/.current_pid

The 'stop' script just kills this process. This shouldn't affect the DAQ (error messages will be sent to the messageViewer, but they aren't really errors), and a new process can be started in the Dispatcher job if required (i.e. the DQM can be started again). However, two processes can't run in the same Dispatcher job at the same time. The current way of managing these processes is a bit basic, and if things crash or end in a bad state, things could get messed up. If there are problems, ensure all online_monitoring.fcl processes are killed before trying to start a new one.
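
To make the PID-file handling concrete, a slightly more defensive version of the stop logic might look like the sketch below. This is not the actual stop_monitoring.sh: the function names are hypothetical, and using SIGTERM and removing a stale PID file are assumptions of this example. Only the PID file path comes from the description above.

```python
import os
import signal
from pathlib import Path

PID_FILE = Path("/monitoring/.current_pid")  # path from the text


def read_pid(pid_file=PID_FILE):
    """Return the PID stored in the file, or None if missing or unreadable."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def is_running(pid):
    """True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_monitoring(pid_file=PID_FILE):
    """Terminate the recorded process if it is alive, then remove the PID
    file so a stale PID cannot be reused. Returns True if a signal was sent."""
    pid = read_pid(pid_file)
    stopped = False
    if pid is not None and is_running(pid):
        os.kill(pid, signal.SIGTERM)
        stopped = True
    Path(pid_file).unlink(missing_ok=True)
    return stopped
```

Checking that the process exists before signalling, and clearing the file afterwards, is one way to avoid the stale-state problems mentioned above; it does not find monitoring processes that were started without being recorded in the PID file.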