DUNE scenarios and consequent software requirements

n.b. The notes here assume that you're familiar with the DUNE DAQ description found starting on p. 404 of this August 2019 draft of the TDR. (The TDR is also available in the DUNE DocDB as document number 16880. A certificate-based link to this document is here. The data acquisition chapter starts on page 405 in volume III.)


Someone implements an unacceptably slow trigger candidate finding algorithm, resulting in a trigger candidate generator application which is unable to process trigger primitives at the rate they're coming in

REQUIREMENTS:

  • Open for discussion, but one or both of the following actions could be performed by the trigger candidate generator: (1) prescaling, i.e., ignoring trigger primitives from certain timestamp ranges and notifying the downstream MLT (perhaps via CCM) that some candidates which would ordinarily be found won't be, or (2) exiting with an error informing the system that a faster algorithm is needed. A sketch of option (1) appears below.
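
As a rough illustration of option (1), here's a minimal C++ sketch of a candidate generator that drops primitives once its backlog exceeds a threshold and records the affected timestamp range so it can be reported downstream. All type and member names are made up for illustration; they are not actual DUNE DAQ classes.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <iostream>

    struct TriggerPrimitive { uint64_t timestamp; /* channel, ADC sum, ... */ };

    class PrescalingCandidateGenerator {
    public:
      explicit PrescalingCandidateGenerator(std::size_t max_backlog)
        : max_backlog_(max_backlog) {}

      void receive(const TriggerPrimitive& tp) {
        if (buffer_.size() >= max_backlog_) {
          // Falling behind: drop this primitive, but remember the affected
          // timestamps so the MLT can be told that candidates from this
          // range will be missing.
          dropped_begin_ = std::min(dropped_begin_, tp.timestamp);
          dropped_end_   = std::max(dropped_end_, tp.timestamp);
          return;
        }
        buffer_.push_back(tp);  // consumption by the algorithm thread omitted here
      }

      // Called periodically; in a real system this would go to the MLT
      // (perhaps via CCM) rather than to stdout.
      void report_dropped_ranges() {
        if (dropped_end_ > 0) {
          std::cout << "Prescaled: no candidates for timestamps ["
                    << dropped_begin_ << ", " << dropped_end_ << "]\n";
          dropped_begin_ = UINT64_MAX;
          dropped_end_ = 0;
        }
      }

    private:
      std::size_t max_backlog_;
      std::deque<TriggerPrimitive> buffer_;
      uint64_t dropped_begin_ = UINT64_MAX;
      uint64_t dropped_end_ = 0;
    };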

Someone needs to take APAs that are currently in use in a run in order to perform calibrations

Say that all 150 APAs in the module are currently being used in a run dedicated to standard datataking, and then someone wants to use APAs 1-10 to perform a calibration run. One obvious but inefficient way to do this would be to end the current run, split the module into two partitions, and then configure and start two new runs, one performing calibrations with APAs 1-10, and another performing standard datataking with APAs 11-150. I use the word "inefficient" because APAs 11-150 shouldn't have to stop datataking just because a small part of the module is meant to be dedicated to calibration. A more efficient way would be to remove APAs 1-10 from the original run so they can then be used for the calibration run, while during and after the removal process data from APAs 11-150 continues to flow in the original run.

REQUIREMENTS:

  • The processes used in the original run need to be notified ASAP when they should no longer expect data (trigger primitives, trigger candidates, raw data, etc.) from part of the detector they were previously receiving data from

Design notes:
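
One way to satisfy the notification requirement is sketched below in C++: a hypothetical "sources removed" message that a downstream consumer (here a stand-in for an event builder) uses to stop waiting on data from the removed APAs. The message and class names are invented for illustration, not actual DUNE DAQ types.

    #include <cstdio>
    #include <set>

    struct SourcesRemovedNotice {
      int run_number;
      std::set<int> removed_apas;   // e.g. APAs 1-10 in the scenario above
    };

    class EventBuilderStub {
    public:
      void handle(const SourcesRemovedNotice& notice) {
        for (int apa : notice.removed_apas) {
          expected_apas_.erase(apa);
          std::printf("Run %d: no longer waiting on data from APA %d\n",
                      notice.run_number, apa);
        }
      }

    private:
      // In a real application this would be filled from the run configuration.
      std::set<int> expected_apas_;
    };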


A calibration involves stepping a register in upstream hardware through multiple values

We take calibration data for a very brief period of time, change a register, take more data, change it again, and so on.

REQUIREMENTS:

  • It should be possible to reset a register or two without needing to fully reconfigure everything from scratch; otherwise, calibrations could be very time-consuming
  • When looking at an event from a calibration, it should be easy to figure out what the register setting was for that event

Design notes:
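
A minimal sketch of what a register scan could look like, assuming a hypothetical hardware interface with a write_register(name, value) call; the register name and the event-header layout are placeholders, not real DUNE DAQ code.

    #include <cstdint>
    #include <string>
    #include <vector>

    struct HardwareInterface {
      void write_register(const std::string& name, uint32_t value) { /* ... */ }
    };

    struct CalibrationEventHeader {
      uint32_t register_value;  // recorded so it's easy to see the setting for each event
      // ... timestamps, run/subrun numbers, etc.
    };

    void run_register_scan(HardwareInterface& hw, const std::vector<uint32_t>& settings) {
      for (uint32_t value : settings) {
        // Only the one register changes between steps; the rest of the
        // configuration stays as it was, so each step is fast.
        hw.write_register("pulser_amplitude", value);
        CalibrationEventHeader header{value};
        // ... take a short burst of data tagged with `header` ...
        (void)header;
      }
    }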


A process encounters an unrecoverable error during running

REQUIREMENTS:

  • Other processes should be made aware of the effective loss of the process
  • Possibly other processes should be made aware of the recovery of that process as well
  • To the greatest extent possible, error and warning messages should reflect the cause of the error, as opposed to its symptoms

Design notes:
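
A sketch of the kind of status message that could be broadcast when a process hits an unrecoverable error (and again when it recovers), assuming some publish() transport exists; everything here is a placeholder rather than an actual DUNE DAQ or CCM API.

    #include <cstdio>
    #include <string>

    enum class ProcessState { Running, UnrecoverableError, Recovered };

    struct ProcessStatus {
      std::string process_name;   // e.g. "eventbuilder03" (illustrative)
      ProcessState state;
      std::string cause;          // the root cause, not just a symptom
    };

    // Stand-in for whatever control/monitoring transport actually carries this.
    void publish(const ProcessStatus& s) {
      std::fprintf(stderr, "%s: state=%d cause=%s\n",
                   s.process_name.c_str(), static_cast<int>(s.state), s.cause.c_str());
    }

    void on_fatal_error(const std::string& me, const std::string& root_cause) {
      publish({me, ProcessState::UnrecoverableError, root_cause});
    }

    void on_recovery(const std::string& me) {
      publish({me, ProcessState::Recovered, ""});
    }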


Trigger primitive finding is originally implemented in a standalone process, but then someone figures out how to do it in upstream firmware

REQUIREMENTS:

  • People who wrote the algorithms for the trigger candidate generator processes - and who may have moved on to another experiment in the meantime - shouldn't have to worry about where the trigger primitives came from, i.e., shouldn't have to change the code they wrote which reads in trigger primitives
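
One way to meet this requirement is to have the algorithm code depend only on an abstract trigger primitive source, as in the hypothetical C++ sketch below; the interface and struct layout are illustrative only.

    #include <cstdint>
    #include <optional>

    struct TriggerPrimitive { uint64_t timestamp; uint32_t channel; uint32_t adc_sum; };

    class TriggerPrimitiveSource {
    public:
      virtual ~TriggerPrimitiveSource() = default;
      // Returns the next primitive, or nothing if the source is exhausted.
      virtual std::optional<TriggerPrimitive> next() = 0;
    };

    // Algorithm code is written against the interface; it never needs to know
    // whether primitives came from a standalone process or from upstream firmware.
    void find_candidates(TriggerPrimitiveSource& source) {
      while (auto tp = source.next()) {
        // ... candidate-finding logic using *tp ...
      }
    }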

We start off using FHiCL to configure the processes, but then the decision is made to switch to JSON/XML/Python/etc.

REQUIREMENTS:

  • An interface should be available for reading variable values in from configurations, and it should be independent of any particular data-interchange format

Design notes:
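
A minimal sketch of such an interface in C++: user code asks for values by key through an abstract class, and a concrete backend is written per format. The class name and method signatures are assumptions, not an existing API; a FHiCL backend could, e.g., wrap fhicl::ParameterSet, and a JSON backend could wrap whatever parser is chosen.

    #include <string>

    class Configuration {
    public:
      virtual ~Configuration() = default;
      virtual int get_int(const std::string& key) const = 0;
      virtual double get_double(const std::string& key) const = 0;
      virtual std::string get_string(const std::string& key) const = 0;
    };

    // User code: unchanged whether the backend parses FHiCL, JSON, XML, etc.
    int read_buffer_depth(const Configuration& cfg) {
      return cfg.get_int("buffer_depth");
    }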


While data's making its way downstream in the DAQ, a new configuration comes in

An example: trigger primitives are created, and while they're sitting in a trigger candidate generator's buffers waiting to be analyzed, a new configuration comes in; the new trigger candidate generator algorithms assume they're getting trigger primitives found using that same new configuration, not the previous one.

REQUIREMENTS:

  • The configuration used to process data as it flows downstream should be self-consistent from the time trigger primitives are formed to the time that an event is formed

Design notes:
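
One possible approach, sketched below: every trigger primitive carries an identifier of the configuration it was produced under, and downstream stages refuse to mix data from different configuration versions. The field and class names are invented for illustration.

    #include <cstdint>

    struct TriggerPrimitive {
      uint64_t timestamp;
      uint32_t config_version;   // identifier of the configuration in force when it was made
    };

    class CandidateGenerator {
    public:
      explicit CandidateGenerator(uint32_t expected_version)
        : expected_version_(expected_version) {}

      void process(const TriggerPrimitive& tp) {
        if (tp.config_version != expected_version_) {
          // Produced under a previous configuration: drain rather than mix.
          ++stale_count_;
          return;
        }
        // ... normal candidate finding ...
      }

    private:
      uint32_t expected_version_;
      uint64_t stale_count_ = 0;
    };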


A physicist wants to work on developing a new trigger candidate algorithm, but it'll take a long time and no-one wants the development to interfere with datataking

REQUIREMENTS:

  • Some sort of process should exist which generates realistic trigger primitives under different conditions, so developers won't actually need to use experiment hardware or even teststand hardware

Design notes:
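
A hypothetical sketch of such a generator: it emits pseudo-random trigger primitives with exponentially distributed spacings, so algorithm developers can run entirely offline. The channel range and ADC values are illustrative, not tuned to realistic detector conditions.

    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <vector>

    struct TriggerPrimitive { uint64_t timestamp; uint32_t channel; uint32_t adc_sum; };

    std::vector<TriggerPrimitive> generate_fake_primitives(std::size_t n,
                                                           uint64_t start_time,
                                                           uint64_t mean_spacing_ticks) {
      std::mt19937_64 rng{12345};  // fixed seed for reproducible development runs
      std::exponential_distribution<double> spacing(1.0 / mean_spacing_ticks);
      std::uniform_int_distribution<uint32_t> channel(0, 2559);
      std::uniform_int_distribution<uint32_t> adc(50, 5000);

      std::vector<TriggerPrimitive> tps;
      uint64_t t = start_time;
      for (std::size_t i = 0; i < n; ++i) {
        t += 1 + static_cast<uint64_t>(spacing(rng));  // at least one tick apart
        tps.push_back({t, channel(rng), adc(rng)});
      }
      return tps;
    }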


As understanding of the software's interaction with the hardware matures, people want to pin certain threads to certain cores, or adjust thread priority

REQUIREMENTS:

  • When someone wants to do this, it should be easy - whether we're talking about a CPU or a GPU
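
On Linux the CPU side of this is already straightforward with standard calls, as the sketch below shows for pinning the current thread to a core and raising its scheduling priority (GPU stream/affinity handling would be vendor-specific and is not shown). The core number and priority would come from configuration.

    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>
    #include <cstring>

    // Pin the calling thread to a single core (GNU extension).
    bool pin_current_thread_to_core(int core) {
      cpu_set_t cpuset;
      CPU_ZERO(&cpuset);
      CPU_SET(core, &cpuset);
      int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
      if (rc != 0) {
        std::fprintf(stderr, "pthread_setaffinity_np: %s\n", std::strerror(rc));
        return false;
      }
      return true;
    }

    // Raise the calling thread's scheduling priority (typically needs CAP_SYS_NICE).
    bool set_realtime_priority(int priority) {
      sched_param param{};
      param.sched_priority = priority;   // e.g. 10 under SCHED_FIFO
      int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
      if (rc != 0) {
        std::fprintf(stderr, "pthread_setschedparam: %s\n", std::strerror(rc));
        return false;
      }
      return true;
    }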

A third-party library used in a process causes a crash, but not before it prints a helpful error message to stdout or stderr

REQUIREMENTS:

  • Whatever sort of formal messaging framework we settle on, messages to stdout and stderr can't consequently be "buried" - it has to be easy for developers (and perhaps even end users) to see ALL messages, not just those that use the framework
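
A POSIX sketch of one way to avoid burying such messages: whatever launches a process redirects the child's stdout and stderr into a log file, so output from third-party libraries is captured whether or not it goes through the formal messaging framework. The function name here is made up for illustration.

    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    // Run `program`, sending its stdout and stderr to `logfile`; returns the wait status.
    int launch_with_captured_output(const char* program, const char* logfile) {
      pid_t pid = fork();
      if (pid == 0) {
        int fd = open(logfile, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd >= 0) {
          dup2(fd, STDOUT_FILENO);   // stdout -> log
          dup2(fd, STDERR_FILENO);   // stderr -> same log
          close(fd);
        }
        execlp(program, program, (char*)nullptr);
        _exit(127);                  // exec failed
      }
      int status = 0;
      waitpid(pid, &status, 0);
      return status;
    }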

A process dies messily, leaving behind clogged ports, floating shared memory blocks, etc.

REQUIREMENTS:

  • There needs to be some mechanism to clean up after the messy death

A common scenario: a process dies and leaves behind a mess. A user then tries to debug what's going on by restarting the same process with the same config (and hence the same ports, shmem blocks, etc.), and now it behaves even worse than before because it's reading in stale data from the stale shmem blocks, it's unable to open the ports, etc. Issue #22372 for DAQInterface is an example of the type of cleanup mechanism that addresses this; on DUNE it may need to be done at the CCM level.
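
A sketch of one piece of such a cleanup step, assuming POSIX named shared memory: before restarting, remove any segments a previous instance may have left behind. The segment names would be derived from the configuration; the list passed in here is a placeholder.

    #include <sys/mman.h>
    #include <cerrno>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Remove stale named shared-memory segments (link with -lrt on older glibc).
    void cleanup_stale_shm(const std::vector<std::string>& segment_names) {
      for (const auto& name : segment_names) {
        if (shm_unlink(name.c_str()) == 0) {
          std::printf("Removed stale shared memory segment %s\n", name.c_str());
        } else if (errno != ENOENT) {
          std::perror(name.c_str());   // exists but couldn't be removed
        }
      }
    }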


A non-trivial (i.e. long-running) configuration of electronics needs to be done

We can imagine that some types of (re)configuration, like a system reconfiguration (e.g., taking an EventBuilder out of the system), could happen very quickly. Other configuration steps, like downloading firmware or setting registers in electronics, could take a non-trivial amount of time. To handle these longer configuration operations, we can imagine (A) taking the relevant dataflow application out of the system, (B) writing the configuration data into the electronics, and (C) adding the relevant dataflow app back into the system.

In step (A), we expect that taking the relevant dataflow app out of the system will be a quick operation, so it can be interleaved with normal datataking. In current jargon, this would correspond to taking the relevant BoardReader out of the system. At the moment, configuration information is passed through the BR to the electronics, but even if the configuration step uses a different software application, it will be important to remove the readout application from the overall system, since the data will likely be garbage during the reconfiguration step.
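
A sketch of the (A)/(B)/(C) sequence described above; all of the function names are hypothetical stand-ins for whatever the run control / CCM interface ends up providing, not real DUNE DAQ calls.

    #include <cstdio>
    #include <string>

    void remove_from_dataflow(const std::string& app) {         // step (A): quick
      std::printf("Removing %s from the dataflow\n", app.c_str());
    }
    void write_firmware_and_registers(const std::string& app) { // step (B): may take minutes
      std::printf("Configuring electronics behind %s\n", app.c_str());
    }
    void add_to_dataflow(const std::string& app) {               // step (C): quick
      std::printf("Adding %s back into the dataflow\n", app.c_str());
    }

    void reconfigure_electronics(const std::string& readout_app) {
      remove_from_dataflow(readout_app);          // its data would be garbage during (B)
      write_firmware_and_registers(readout_app);  // the rest of the system keeps taking data
      add_to_dataflow(readout_app);
    }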