artdaq Work for Summer 2016

Initial version: circa 01-Jun-2016, KAB

There are a number of artdaq investigations and possible changes that were identified during 35-ton DAQ operation and in various discussions about needs and preparations for protoDUNE DAQ. This page is intended to be a working list of topics, tasks, and ideas related to those. In addition, it seems convenient to list related issues, requirements, etc. on this page.

Issues noticed during 35-ton DAQ running

  1. Memory growth on lbnedaq6 (Resolved)
    • lbnedaq6 was the computer that had the EventBuilders and Aggregators running on it.
    • The most noticeable symptom of the memory growth was that, once free memory became exhausted, the rate at which we could write data to disk would drop, and we would see complaints from the FragmentGenerators (in the BoardReaders) about problems finding free buffers (because of back-pressure exerted by the Aggregator through the system).
    • John and Eric have found and fixed memory leaks in the NetMonInput code and the Ganglia metrics reporting code, so we believe that this issue is resolved. John has successfully run the 35-ton DAQ with the fixes included and did not see memory growth for runs of length 17-20 hours.
  2. Reliable closing of data files (Resolved)
    • Kurt and John have implemented changes that should ensure the reliable closing of files in the 35-ton system (artdaq, lbne-artdaq, daqinterface).
    • The changes add an optional, configurable delay to the cleanup of the MPI program when one of its processes crashes. This gives an external entity, such as daqinterface, time to notice that something unexpected has happened and send the usual end-run ("stop") commands to the artdaq processes, so that the EventBuilders send the appropriate messages to the disk-writing Aggregator and that Aggregator has the opportunity to close the data file successfully.
    • For experiments that do not use daqinterface, we will need to document the environment variable that controls this delayed cleanup in artdaq (ARTDAQ_PROCESS_FAILURE_EXIT_DELAY). A sketch of how such a delay might be applied appears after this list.
  3. Communication between "receiver" threads and the rest of the BoardReader
    • John made quite a few modifications to the "receiver" and BoardReader code to improve this situation, and we believe the situation was quite good by the end of 35-ton data taking. Making these changes available to future FragmentGenerator developers is part of the common code base mentioned in the Desired Enhancements section of this page.
  4. Periodic clearing of the disk cache on lbnedaq6
    • It has been observed on both the 35-ton and MicroBooNE systems that when the disk cache becomes full, disk-writing performance suffers. Several DAQ experts have investigated this, but so far no cause for the behavior has been found. When it happens, there are not large amounts of dirty buffers, so the problem is not flushing data from memory to disk.
    • The work-around is to run a cron job that periodically clears the disk cache.
    • This may simply be a characteristic of current Linux kernels, but it would be worth investigating further.
    • We should look into direct I/O, which bypasses the buffer cache entirely; a minimal sketch appears after this list.
  5. Any remaining instances of unexplained back-pressure in the 35-ton system?
    • It would probably be worthwhile to run stress tests of the system.
    • Do we need to enable real-time priority for BoardReader processes?
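
As a rough illustration of the delayed-cleanup mechanism in item 2 above, the sketch below reads the delay from ARTDAQ_PROCESS_FAILURE_EXIT_DELAY and sleeps before cleanup proceeds. This is only a sketch of the idea; the function names, the zero-second default, and the failure hook are assumptions, not the actual artdaq code.

```cpp
// Sketch only: how a delayed cleanup driven by an environment variable
// might look. The real artdaq code may differ; the zero-second default
// and the hook name below are assumptions.
#include <chrono>
#include <cstdlib>
#include <string>
#include <thread>

static int exitDelaySeconds()
{
    // ARTDAQ_PROCESS_FAILURE_EXIT_DELAY is the variable described above;
    // treat an unset or unparseable value as "no delay".
    if (char const* v = std::getenv("ARTDAQ_PROCESS_FAILURE_EXIT_DELAY")) {
        try { return std::stoi(v); } catch (...) {}
    }
    return 0;
}

void onPeerProcessFailure()  // hypothetical hook, called when a process crashes
{
    // Sleeping here gives an external entity (e.g. daqinterface) time to send
    // the usual "stop" commands, so the disk-writing Aggregator can close its
    // file before the MPI program is torn down.
    std::this_thread::sleep_for(std::chrono::seconds(exitDelaySeconds()));
    // ... proceed with normal MPI cleanup ...
}
```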
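
And for the direct-I/O idea in item 4, Linux's O_DIRECT flag bypasses the page cache entirely, at the cost of strict alignment requirements on the buffer, offset, and transfer size. The sketch below is a minimal, self-contained example; the 4096-byte alignment is a typical but filesystem-dependent assumption, and the file path is arbitrary.

```cpp
// Minimal sketch of writing with O_DIRECT to bypass the page cache (Linux).
// Alignment requirements are filesystem-dependent; 4096 bytes is a common
// value but should be verified on the target system.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1  // exposes O_DIRECT in <fcntl.h> on glibc
#endif
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    constexpr size_t kAlign = 4096;    // assumed block/alignment size
    constexpr size_t kSize = 1 << 20;  // 1 MiB transfer, a multiple of kAlign

    int fd = open("/tmp/direct_io_test.dat",
                  O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) return 1;

    void* buf = nullptr;
    if (posix_memalign(&buf, kAlign, kSize) != 0) { close(fd); return 1; }
    std::memset(buf, 0xAB, kSize);     // test pattern

    // With O_DIRECT the kernel transfers directly between this buffer and
    // the device, so the write does not grow the buffer cache.
    ssize_t n = write(fd, buf, kSize);

    free(buf);
    close(fd);
    return n == static_cast<ssize_t>(kSize) ? 0 : 1;
}
```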

Planned enhancements

  1. Decouple online monitoring from the MPI program.
    • An implementation that uses shared memory is nearly complete (a sketch of the general technique appears after this list). Another implementation that uses RTI DDS is largely done, but some licensing issues still need to be worked out.
  2. Incorporation of the automatic file-closing features in art
    • In an upcoming release of art (v2.1, we believe), changes to RootOutput_module, and to art itself, to support configurable automatic file closing will become available. With this change, the artdaq system will no longer need to pause and resume runs in order to close files once they reach a certain size (or satisfy other thresholds).
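
To illustrate the general shared-memory technique behind item 1, the sketch below has a writer publish each event into a named POSIX shared-memory segment that a separate monitoring process can map and poll. The segment name, single-slot layout, and sequence-counter handshake are all illustrative assumptions, not the actual artdaq implementation (link with -lrt on Linux).

```cpp
// Sketch of a shared-memory handoff between the DAQ and an external online
// monitor. Layout and names are illustrative only; the real artdaq
// implementation may differ substantially.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct MonitorBuffer {
    std::atomic<std::uint64_t> sequence;  // incremented after each publish
    std::size_t size;                     // bytes valid in data[]
    char data[1 << 20];                   // single 1 MiB event slot (assumption)
};

MonitorBuffer* attachMonitorBuffer(bool create)
{
    int fd = shm_open("/artdaq_monitor_demo",               // hypothetical name
                      create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (create) (void)ftruncate(fd, sizeof(MonitorBuffer));  // zero-fills
    void* p = mmap(nullptr, sizeof(MonitorBuffer), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : static_cast<MonitorBuffer*>(p);
}

// DAQ side: copy an event in, then bump the sequence counter so a monitor
// process polling that counter knows a fresh event is available.
void publish(MonitorBuffer& b, char const* event, std::size_t len)
{
    std::memcpy(b.data, event, len);
    b.size = len;
    b.sequence.fetch_add(1, std::memory_order_release);
}
```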

Desired enhancements

  1. Store the full artdaq system configuration in the raw data files
    • We have shown that this is straightforward in principle. In a simple test, we provided additional FHiCL to the disk-writing Aggregator in its configuration document, and we were able to retrieve that information from the raw data file using config_dumper.
    • The main change needed to make this work is to combine the individual artdaq process configuration documents into one large document.
    • Recall that this "configuration" will contain the parameters that were used to configure each artdaq process, the parameters that were used by the various FragmentGenerators (BoardReaders) to configure their upstream hardware, and the parameters that were used to configure any/all art modules that are run in the system.
  2. Provide a common code base for FragmentGenerators
    • FragmentGenerators are the components that we develop to interact with upstream hardware. They are instantiated and used by BoardReaders.
    • One utility that may be useful is support for a third thread for hardware readout. Others have been discussed:
      • Circular buffers, and a thread for filling them from hardware
      • A thread for monitoring hardware and/or running checks on data in the circular buffer
    • Perhaps a common base class, with fake-data and hardware-data subclasses differing only in their data source (see the sketch after this list)
  3. Provide support for another error level ("problem")?
  4. Automated entries in the elog?
  5. Are there improvements to the synchronization of fragment sending in the BoardReaders that we want to make?
  6. Timeouts for incomplete events.
  7. Storing of MessageFacility messages in a searchable database.
  8. Identifying and/or developing tools to help debug system issues and monitor system performance. For example, if/when system throughput is less than expected, it would be great to have tools that provide visibility into the source of the problem.
    • Ron has checked his rgang-iperf test application into one of the artdaq repositories.
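
As a starting point for the common code base in item 2, here is a minimal sketch of the buffer-plus-readout-thread pattern: a dedicated thread fills a bounded buffer from the data source, and the BoardReader's fragment-producing path drains it. The class shape and names are assumptions for illustration, not the eventual artdaq interface; fake-data and hardware subclasses would differ only in readFromSource().

```cpp
// Illustrative sketch of a common base for FragmentGenerators: a readout
// thread fills a bounded circular buffer, and the fragment-producing path
// (e.g. getNext_) drains it. Names are hypothetical, not the artdaq API.
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

using RawBlock = std::vector<unsigned char>;

class BufferedReaderBase {
public:
    explicit BufferedReaderBase(std::size_t capacity) : capacity_(capacity) {}
    // Subclasses should call stop() in their own destructors so the reader
    // thread is joined before readFromSource() becomes invalid.
    virtual ~BufferedReaderBase() { stop(); }

    void start() {
        running_ = true;
        reader_ = std::thread([this] {
            while (running_) {
                RawBlock block = readFromSource();  // hardware or fake data
                std::unique_lock<std::mutex> lk(m_);
                notFull_.wait(lk, [this] {
                    return buffer_.size() < capacity_ || !running_; });
                if (!running_) break;
                buffer_.push_back(std::move(block));
                notEmpty_.notify_one();
            }
        });
    }

    // Called from the fragment-producing path; blocks until data or stop().
    bool pop(RawBlock& out) {
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [this] { return !buffer_.empty() || !running_; });
        if (buffer_.empty()) return false;
        out = std::move(buffer_.front());
        buffer_.pop_front();
        notFull_.notify_one();
        return true;
    }

    void stop() {
        { std::lock_guard<std::mutex> lk(m_); running_ = false; }
        notFull_.notify_all();
        notEmpty_.notify_all();
        if (reader_.joinable()) reader_.join();
    }

protected:
    // Fake-data and hardware-data subclasses differ only here.
    virtual RawBlock readFromSource() = 0;

private:
    std::size_t capacity_;
    std::deque<RawBlock> buffer_;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::thread reader_;
    std::atomic<bool> running_{false};
};
```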

Investigations and features that would help with protoDUNE DAQ planning and testing

  1. Playback mode. (Resolved)
    • The idea is to create a FragmentGenerator that can be given the path to a file on disk and the fragment ID of the fragments of interest; it would read the events from the file, pull out the requested fragment from each event, and send the fragments downstream (simulating the readout of the hardware). A minimal sketch of the replay logic appears after this list.
    • Secondary considerations include replaying the events over and over to allow long test runs, and pre-loading the fragments into memory so that higher event rates can be achieved (compared to reading each event when needed).
  2. Demonstrate the use of multiple disk-writing Aggregators. (Resolved)
    • Code changes were made in the artdaq and artdaq-demo repositories to get this to work and create scripts to demonstrate it (Issue #13305).
  3. Demonstrate the writing of data from multiple EventBuilders (no Aggregators configured). (Resolved)
    • We need to check whether there are provenance or run-level data product issues that would make running without Aggregator(s) troublesome [this is probably not a problem since we wrote data from EventBuilders when artdaq was first created]. And, we need to determine what entities in the system can serve as sources for online monitoring events.
    • At the moment, this may not be test-able on the 35t DAQ cluster; daqinterface may need to be changed to allow configurations in which there are no Aggregators.
  4. Define a set of supported configurations that provide flexibility for overcoming limitations in disk-writing speed. Provide scripts and/or configurations that quickly and easily demonstrate the different options.
    • One such configuration of interest is to have EventBuilders running on several nodes, have one Aggregator on each of the nodes that have EventBuilders, and configure the EventBuilders to only send their complete events to the Aggregator on the same node.
      • One example of this would be to have BoardReaders running on lbnedaq1, 2, and 3; have N EventBuilders running on lbnedaq6, another N EventBuilders running on lbnedaq7; have one disk-writing Aggregator running on daq6 and another one running on daq7. The EBs on daq6 would send their events only to the AG on daq6; same on daq7.
      • Another example would be to temporarily ignore daq6 and daq7, put EBs on each of daq1, 2, and 3 (along with the BoardReaders), and put a single disk-writing AG on each of daq1, 2, and 3. This wouldn't tell us too much about supported data rates, but it would demonstrate a configuration which may be of interest to the collaboration. That is, a configuration in which there are a large number of medium performance nodes in the DAQ cluster, and we want to A) spread the disk writing work among all of them and B) avoid sending complete events over the network (between the EBs and the AGs).
  5. Port BoardReader code to the RCE?
    • There was a phone meeting on this topic a couple of months ago, but we don't know the current status.
  6. Investigate the writing of data files to XROOTD?
  7. Run tests on available clusters at CERN.
  8. Investigate how doubling the number of disks might help us avoid writing to and reading from the data disks at the same time. [It's not clear how this would work without restarting runs and reconfiguring the system.]
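
The core of the playback idea in item 1 is small enough to sketch: pre-load all fragments carrying the requested fragment ID, then serve them cyclically so runs can continue indefinitely at high rates. File parsing and the artdaq plumbing are elided; all names below are illustrative.

```cpp
// Sketch of the replay logic for a playback FragmentGenerator. Reading and
// parsing the raw data file is elided; names here are illustrative only.
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using FragmentBytes = std::vector<unsigned char>;

class PlaybackSource {
public:
    // 'preloaded' holds every fragment with the requested fragment ID,
    // already extracted from the events in the input file.
    explicit PlaybackSource(std::vector<FragmentBytes> preloaded)
        : fragments_(std::move(preloaded))
    {
        if (fragments_.empty())
            throw std::runtime_error("no fragments with the requested ID");
    }

    // Serve fragments in order, wrapping around so the file can be replayed
    // over and over for long test runs. Pre-loading into memory (rather than
    // re-reading the file) is what makes high event rates achievable.
    FragmentBytes const& next() {
        FragmentBytes const& f = fragments_[index_];
        index_ = (index_ + 1) % fragments_.size();
        return f;
    }

private:
    std::vector<FragmentBytes> fragments_;
    std::size_t index_{0};
};
```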

System issues

  1. The full DAQ group needs to discuss ways to throttle triggers or time slices at an integrated system level (and choose one). artdaq processes know when they can and cannot successfully send data downstream, but we currently don't have any way for upstream hardware, or "receiver" threads inside BoardReaders, to communicate with each other or with a central entity to say that buffers are full and events/triggers either need to be throttled or dropped on the floor. A minimal sketch of one possible signaling scheme appears below.
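
To make the discussion concrete, one possible shape for such a mechanism is sketched below: each receiver reports whether its free-buffer count has dropped below a threshold, and a central entity combines the reports into a single trigger-inhibit decision. This is purely illustrative; no such artdaq mechanism currently exists, and all names are hypothetical.

```cpp
// Sketch of one possible throttling scheme: each BoardReader "receiver"
// reports whether its free-buffer count has fallen below a threshold, and
// a central entity combines the reports into a single trigger-inhibit
// decision. Purely illustrative; not an existing artdaq mechanism.
#include <atomic>
#include <cstddef>
#include <vector>

class ThrottleStatus {
public:
    explicit ThrottleStatus(std::size_t nReceivers) : busy_(nReceivers) {
        for (auto& b : busy_) b.store(false);
    }

    // Called by each receiver as its buffer occupancy changes.
    void report(std::size_t receiver, std::size_t freeBuffers,
                std::size_t threshold) {
        busy_[receiver].store(freeBuffers < threshold,
                              std::memory_order_relaxed);
    }

    // Central entity: inhibit triggers if any receiver is starved.
    bool inhibitTriggers() const {
        for (auto const& b : busy_)
            if (b.load(std::memory_order_relaxed)) return true;
        return false;
    }

private:
    std::vector<std::atomic<bool>> busy_;
};
```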

Requirements

In this section, we list consensus or recommended requirements for the behavior of the system. These may change, but hopefully they'll provide useful baseline information in the meantime.

  1. Here is a note from Tom Junk regarding the need for storing sequential events in a single raw data file: "We do not need events to be sequential, or in a single file stream, as long as the data for a triggered readout is contained within a single art event and not multiple ones. The 35-ton feature of spreading data from a single readout across multiple art events required some offline event assembly that works much better if the events are in a single stream (ordering within a stream still isn't too critical, however, as art supplies an index)."
  2. We believe that a pseudo-random subset of the raw data events is acceptable for online monitoring.
    • Some further information from Tom Junk: "Monitoring is needed, though the pseudorandom fraction can be small. Lightweight monitoring like making means and RMS's should be done on as high a fraction of events as possible. Event displays made once every spill should be enough (it takes a few seconds to look at one, and the spills are only 4 sec long). Some fraction of the data should be reconstructed with LArSoft, but that fraction can be as small as we like."
  3. Some amount of compression, likely ROOT compression level 1, may be needed from the DAQ software system. Compression within an art module inside an EventBuilder process is not foreseen to be needed at this time.
  4. A checksum is requested. (This needs to be understood more, e.g. where in the chain it should be computed; a minimal sketch of one option appears after this list.)
  5. Filtering inside of artdaq does not appear to be needed at this time.
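
As one concrete possibility for item 4, a lightweight checksum such as zlib's Adler-32 could be computed over each fragment payload; where in the chain (BoardReader, EventBuilder, or Aggregator) it runs is exactly the open question. The choice of Adler-32 here is an assumption for illustration (link with -lz).

```cpp
// Sketch of computing a lightweight checksum over a fragment payload using
// zlib's Adler-32. The algorithm choice and the point in the chain where
// this would run are open questions; this only shows the mechanics.
#include <cstdint>
#include <vector>
#include <zlib.h>

std::uint32_t fragmentChecksum(std::vector<unsigned char> const& payload)
{
    uLong sum = adler32(0L, Z_NULL, 0);  // canonical initial value
    sum = adler32(sum, payload.data(), static_cast<uInt>(payload.size()));
    return static_cast<std::uint32_t>(sum);
}
```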

General DAQ requirements (listed by Kurt; some of these may not apply to protoDUNE):
  1. Provide a pathway, and infrastructure, for configuration of the upstream hardware.
  2. Read out the physics and calibration data from the hardware.
  3. Assemble complete events from the data fragments that are read out from the hardware. Each fragment contains data from a geographically distinct part of the detector. Each event contains data from a well-defined time window or trigger condition.
  4. Handle back-pressure gracefully (internally), and appropriately interact with the experiment-specific mechanisms to throttle triggers or otherwise communicate back-pressure conditions to the full DAQ system.
  5. Provide individual process and overall system state models that allow the hardware and electronics to be configured, data taking runs to be started and stopped, and the system to be shut down gracefully. Also, provide infrastructure for transmitting requests to change state to all necessary processes in the system (and replies).
  6. Provide the ability to filter events and/or compress the data in software.
  7. Provide the infrastructure for online data quality monitoring of the data.
  8. Provide tools to enable monitoring of the data acquisition system itself (DAQ Monitoring).
  9. Provide a system for reporting status and error messages.
  10. Support event rates up to X and data rates up to Y delivered to the software filter algorithms.
  11. Support a data rate to disk of Z.

Design choices and ramifications

In this section, we list consensus or recommended choices for the behavior of the system. These may change, but hopefully they'll provide useful baseline information in the meantime.

  1. If we need to write data files from multiple Aggregators or EventBuilders, the events in concurrent files would naturally all have the same run and subrun numbers. The names of the concurrent files would therefore need some sort of identifier to differentiate them from one another (e.g. the EventBuilder number or the Aggregator number). This identifier would be in addition to all of the typical information that is part of a raw data filename, such as the run number; a hypothetical naming pattern is sketched after this list.
  2. Once the automatic closing of files moves to the RootOutput_module, the closing of files that are written in parallel would naturally be decoupled, so there would be no guarantee that files would have the same numbers of events, etc. Of course, if event data size is relatively constant, then one would expect the numbers of events per file to be similar.
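
For illustration only, a filename pattern along the following lines would keep parallel output streams distinct. The pattern, field widths, and function name are entirely hypothetical; the actual naming convention is a design decision still to be made.

```cpp
// Hypothetical filename construction for parallel output streams; the
// actual naming convention is a design decision still to be made.
#include <cstdio>
#include <string>

std::string rawFileName(int run, int subrun, int writerId)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "run%06d_subrun%04d_writer%02d.root",
                  run, subrun, writerId);
    return buf;
}
```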

DAQ cluster design and support

The specification of the DAQ cluster computers, switches, operating system(s), etc. is not under the purview of the artdaq team, but it is connected to artdaq in that the capabilities of these components will strongly affect the overall performance of the system. In addition, the support of these components will be quite important in ensuring smooth operation of the DAQ. This section is intended to capture questions, ideas, and proposals in this area.

Questions:
  • What will be the support model for the computers, networking, and system-level software for the DAQ computers?

Here is the preliminary list from Giles for DAQ activities in this area (we haven't yet heard what discussions may have happened regarding these items):

F1. Conceptual design of server/networking setup, oriented towards test stands and ProtoDUNE
F2. Implementation of server/networking at Fermilab test stands + advice for other test stands
F3. Implementation of server/internal networking at CERN for ProtoDUNE and tests
F4. External networking connections at CERN
F5. Maintain servers/networking during ProtoDUNE commissioning and running
F6. Design and discussions with conventional facilities at SURF for FD computing configuration

Teststands

What teststands are, or will be, available for testing DAQ system software, hardware, and performance? What are the capabilities of each of them?