Online Monitoring

DQM web:
http://lbne-dqm.fnal.gov/

DAQ instructions:
https://cdcvs.fnal.gov/redmine/projects/lbne-daq/wiki/Running_DAQ_Interface

Tom's email on new server

Using the Online Monitoring

The monitoring histograms can be created either after a run has finished and has made an output file (offline) or during the running of the DAQ (online). Both modes are described briefly here.

Overview

The Online Monitoring uses an art module, OnlineMonitoring_module.cc, within lbne-artdaq. This takes the data from the DAQ in its online format and creates monitoring histograms; these are saved in a root file (monitoringRunXSubrunY.root) and, for most histograms, as images (pngs).

These histograms are saved in the directory

/data/lbnedaq/monitoring

on the lbne35t-gateway02 node, and synced to the web server. The plots saved as images are viewable on the web server (refreshed every 30s). All runs are archived in the above directory, with regular tarballs made of the directory for back-up.
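To check the most recent output by hand, a listing sorted by modification time is usually enough. The following is a minimal sketch, assuming the root files follow the monitoringRunXSubrunY.root pattern given above with X and Y replaced by numbers. The name latest_monitoring_file is a hypothetical helper and not part of lbne-artdaq.

```shell
# Hypothetical helper: print the most recently modified monitoring root file.
# The default directory is the one quoted above; pass another one as $1.
latest_monitoring_file() {
  local dir="${1:-/data/lbnedaq/monitoring}"
  # ls -t sorts newest first; errors (e.g. no matching files) are silenced.
  ls -t "$dir"/monitoringRun*Subrun*.root 2>/dev/null | head -n 1
}
```

Calling latest_monitoring_file with no argument looks in the default directory; nothing is printed if no matching file exists.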

Instructions for running follow:

Monitoring Online

This is the preferred way of monitoring during the running of the DAQ. The artdaq interface uses fcl files to run all the processes. Adding an art analyser module to the Aggregator2.fcl file, which controls the second AggregatorMain process on the aggregator machine (see here for a description of the system), allows the module to be run online.

This makes use of the art event loop: the data taking acts as a live event loop and the histograms are filled for each event. At the end of a subrun, images are saved in the directories described above.

See here for instructions on running the DAQ. There are a large number of DAQ configurations; most of them have a version including the monitoring and one without it. For example, the most common configurations (using all the components) are:

rces_and_ssps_and_ptb
rces_and_ssps_and_ptb_nomonitoring
rces_and_ssps_and_ptb_nodiskwrite

When a configuration using the monitoring is used, the data is analysed during the running of the DAQ and output is produced in the areas discussed above. The last configuration listed (rces_and_ssps_and_ptb_nodiskwrite) runs the monitoring but does not save an output root file containing the data from the run.

Monitoring Offline

The same module can be used offline to make the histograms for older runs (using the output file created by the DAQ). This is achieved by executing the following:

Set up lbne-artdaq. On the gateway node, it is best to use the version of artdaq being used by the DAQ:

source /data/lbnedaq/lbne-artdaq-standard/setupLBNEARTDAQ

or by using your own release of lbne-artdaq.

Use the art executable with the onlineMonitoring.fcl configuration file:

art -c onlineMonitoring.fcl -s lbne_r*_sr*.root

where the source file can be specified from the data file directory. The data files are written by the aggregators and saved on either lbnedaq6 or lbnedaq7, depending on which is running the processes at the time. They are copied to tape via the lbne-gateway02 node by Tom Junk's scripts and so it is possible to catch them in the /data/lbnedaq/data area on gateway02 if needed.

This process will make exactly the same histograms and save them in the same location.
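When several subrun files need reprocessing, the art command above can be generated once per file. The following is a minimal sketch, assuming the data files match the lbne_r*_sr*.root pattern used above. The name offline_monitoring_commands is a hypothetical helper, and the default directory is the gateway02 data area mentioned above.

```shell
# Hypothetical helper: print one art command per DAQ output file in a directory.
# Commands are printed rather than executed so they can be reviewed first.
offline_monitoring_commands() {
  local data_dir="${1:-/data/lbnedaq/data}"
  local f
  for f in "$data_dir"/lbne_r*_sr*.root; do
    # Skip the unexpanded pattern when no files match.
    [ -e "$f" ] || continue
    printf 'art -c onlineMonitoring.fcl -s %s\n' "$f"
  done
}
```

Once lbne-artdaq has been set up, the printed commands can be run one at a time or, if the paths contain no spaces, piped to sh.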

Web Interface

There is a DQM web page for the 35t run at lbne-dqm.fnal.gov.

A cron job, running on lbne-gateway02, syncs the images (but not the root files) to the server (mounted in the /web area on the same node), which allows the images to be viewed on the web. The sync runs between the directory /data/lbnedaq/monitoring and the web area, so if running offline for personal use, please change the output file path for the monitoring plots (this is a fhicl parameter).

There are three times during the course of a run that the images will be updated:
- an initial update 30s after the start of a subrun (fhicl parameter)
- during a run, every 500s (fhicl parameter)
- at the end of a subrun

The flow of the system is demonstrated below...

fhicl parameters

There are many fhicl parameters which may be used to control the system. The standard configuration is defined in /data/lbnedaq/config/common_code/monitoring_standard.fcl on lbne35t-gateway01 (where all the DAQ configs are stored). An exhaustive list follows:

DataDirPath: "/storage/data/" # Where the data is written by the aggregator
MonitorSavePath: "/data2/lbnedaq/monitoring/" # Directory in which to save the monitoring data (looked at by cron job)
EVDSavePath: "/data2/lbnedaq/eventDisplay/" # Directory in which to save online event display (looked at by cron job)
ChannelMapFile: "/data/lbnedaq/lbne-artdaq-standard/lbne-artdaq/lbne-artdaq/OnlineMonitoring/detailedMap.txt" # The channel map file location
ImageType: ".png" # Format to save the images in
InitialMonitoringUpdate: 30 # Time (in s) after the start of a run to save the first set of monitoring plots
MonitoringRefreshRate: 500 # Time (in s) between each refresh of the monitoring plots
EventDisplayRefreshRate: 100000 # Time (in s) between each refresh of the online event display
DetailedMonitoring: false # Switch to turn on and off 'detailed monitoring', which saves much more information (but is much slower)
ScopeMonitoring: false # Switch to set whether or not the DAQ is being run in 'scope mode'
DriftVelocity: 0.9 #mm/us # The electron drift velocity (used for scaling EVD axes)
MicroslicePreBuffer: 5 # The number of microslices buffered and saved before a trigger
MicrosliceTriggerLength: 5 # The number of microslices saved after a trigger has occurred
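For personal offline use, the save paths can be redirected without editing the standard configuration by writing a small FHiCL file that includes it and overrides individual parameters. The following is a sketch only. It assumes the module is configured under physics.analyzers with the label onlinemonitoring, which is a guess and should be checked against onlineMonitoring.fcl. The helper name write_personal_fcl and the default file name myMonitoring.fcl are also hypothetical.

```shell
# Hypothetical helper: write a personal FHiCL file that reuses the standard
# offline configuration but redirects the output away from the synced area.
# ASSUMPTION: the analyser label is "onlinemonitoring" under physics.analyzers;
# check onlineMonitoring.fcl for the real label before using this.
write_personal_fcl() {
  local out_dir="$1" fcl="${2:-myMonitoring.fcl}"
  cat > "$fcl" <<EOF
#include "onlineMonitoring.fcl"

physics.analyzers.onlinemonitoring.MonitorSavePath: "$out_dir/"
physics.analyzers.onlinemonitoring.EVDSavePath: "$out_dir/eventDisplay/"
EOF
}
```

The generated file would then be passed to art in place of onlineMonitoring.fcl, for example art -c myMonitoring.fcl -s lbne_r*_sr*.root.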

Other Features

Detailed Monitoring

As seen in the above list of parameters, there is a special configuration of the monitoring which performs a more detailed analysis of the data. It is enabled by setting 'DetailedMonitoring' to true; note that this is much slower.

Scope Monitoring

The monitoring has support for scope mode; turn on 'ScopeMonitoring' to use it. The special configurations for scope mode in the DAQ already have this version of the monitoring included as standard.

Zero Suppression?

Coming!

Event Display

A crude online event display is also made by the monitoring. This is viewable at http://lbne-dqm.fnal.gov/EventDisplay and is updated at least once a subrun.

Troubleshooting

Stale memory segments

Occasionally the monitoring module leaves stale shared memory segments on the machine on which it runs (lbnedaq2) [the reasons for this are being investigated]. This will be evidenced by errors in the log file:

%MSG-e Aggregator:  Aggregator-lbnedaq2-5266 MF-online
Failed to connect to shared memory segment, errno = 22.  Please check if a stale shared memory segment needs to be cleaned up. (ipcs, ipcrm -m <segId>)

To fix this, log on to the machine and run

ipcs

and look for the segments listed under the header ------ Shared Memory Segments --------. Clean up any stale segments by using

ipcrm -m <segid>
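Picking out candidate segments by eye can be error-prone when many are listed. The following is a minimal sketch, assuming the Linux ipcs -m column order (key, shmid, owner, perms, bytes, nattch) and treating a segment with no attached processes as a hint that it may be stale. The name stale_segments is a hypothetical helper.

```shell
# Hypothetical helper: read `ipcs -m` output on stdin and print the ids of
# segments owned by the given user that have no attached processes.
# ASSUMPTION: Linux column order: key shmid owner perms bytes nattch [status].
stale_segments() {
  awk -v user="$1" 'NF >= 6 && $3 == user && $6 == "0" { print $2 }'
}
```

Running ipcs -m | stale_segments "$USER" lists the candidate ids; each one should be checked before it is removed with ipcrm -m <segid>.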

Host key verification failed

At the end of the log file there may be some lines stating something like

(gnome-ssh-askpass:35232): Gtk-WARNING **: cannot open display: localhost:10.0
Host key verification failed.
lost connection
(gnome-ssh-askpass:35239): Gtk-WARNING **: cannot open display: localhost:10.0
Host key verification failed.

This occurs when the monitoring tries to copy the images and root files to the web server but can't connect. The cause is that the aggregator processes run on the lbnedaq2 machine, which can only communicate through the gateway node.

As of 16 June 2015, this issue occurs every time the DAQ is run. Ways around the problem are being looked into.