Defining which processes are critical to a run

When DAQInterface is in the running state and an artdaq process either dies or enters the Error state, an experiment may or may not want the run to end automatically, depending on which process failed. DAQInterface has a default set of rules for deciding when a process death or Error ends the run; these rules can be overridden as described below. The default rules are:

  • BoardReaders: any BoardReader failure ends the run
  • EventBuilders: if a RoutingMaster is in use, the run continues as long as at least one EventBuilder is still OK; if not, any EventBuilder failure ends the run
  • DataLoggers: any DataLogger failure ends the run
  • Dispatchers: a Dispatcher failure does not end the run
  • RoutingMaster: a RoutingMaster failure does not end the run
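
The default rules listed above can be read as a small decision function. The following is only an illustrative sketch, not DAQInterface's actual code: the function name, its parameters, and the process-type strings passed to it are all hypothetical.

```python
def default_run_should_end(process_type, using_routing_master=False, eventbuilders_ok=0):
    """Illustrative sketch of the default rules (hypothetical helper, not DAQInterface code).

    process_type         -- type of the process that failed (died or went to Error)
    using_routing_master -- whether the run was configured with a RoutingMaster
    eventbuilders_ok     -- number of EventBuilders still OK after the failure
    """
    if process_type in ("BoardReader", "DataLogger"):
        # Any BoardReader or DataLogger failure ends the run
        return True
    if process_type == "EventBuilder":
        if using_routing_master:
            # With a RoutingMaster, at least one EventBuilder must remain
            return eventbuilders_ok < 1
        # Without a RoutingMaster, any EventBuilder failure ends the run
        return True
    if process_type in ("Dispatcher", "RoutingMaster"):
        # Losing a Dispatcher or the RoutingMaster does not end the run
        return False
    raise ValueError("unknown process type: %s" % process_type)
```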

To override these rules (for example, because you have a BoardReader which doesn't send fragments and which you can therefore afford to lose, or because online monitoring is considered sufficiently mission-critical that you don't want to lose any Dispatchers), create a file made up of rows of three values each: a regular expression, a fraction, and a count. An example comes with the DAQInterface package; relative to the package's base directory, it is docs/process_requirements_list_example:

# This is an example of the rules an experiment can use to determine
# when a run should end. Here, "fail" should be understood to either
# mean that the process dies or that it responds with Error when
# queried by DAQInterface.

component.*    1.0 1   # Need at least one ToySimulator boardreader, and none of them can fail
EventBuilder.* 1.0 1   # Need at least one eventbuilder, and none of them can fail
DataLogger.*   1.0 0   # If the run has any dataloggers, none of them can fail
Dispatcher.*   0.0 0   # If the run has any dispatchers, any can fail
RoutingMaster.* 1.0 0  # If the run has any routingmasters, none of them can fail

If an artdaq process fails during a run, DAQInterface goes through the rows of a file such as the one above before it tries to apply its default rules. For each row, it checks whether the failed process's label matches the regular expression given as the first value in the row. If it does, the other two values are interpreted as follows:

  • second value: at least this fraction of the artdaq processes which existed at the start of the run and whose labels match the regular expression must still be OK for the run not to be ended
  • third value: at least this many artdaq processes whose labels match the regular expression must still be OK for the run not to be ended

The comments in the example file above describe these rules in plain English. If an artdaq process fails and its label doesn't match any of the regular expressions in the file, DAQInterface reverts to its default set of rules.
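
To make the row semantics concrete, here is a minimal sketch of how such a file could be parsed and evaluated. It is not DAQInterface's implementation. The function names and the process labels in the usage lines are hypothetical, and two details the text above leaves open are assumptions: labels are matched with `re.search`, and every matching row is checked rather than only the first.

```python
import re

# A few rows in the same format as the example file
EXAMPLE_RULES = """
component.*    1.0 1   # Need at least one, and none of them can fail
EventBuilder.* 1.0 1   # Need at least one, and none of them can fail
DataLogger.*   1.0 0   # If there are any, none of them can fail
Dispatcher.*   0.0 0   # If there are any, any can fail
"""

def parse_requirements(text):
    """Turn rows of '<label regex> <min fraction> <min count>' into a list of rules."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        regex, fraction, count = line.split()
        rules.append((re.compile(regex), float(fraction), int(count)))
    return rules

def run_should_end(failed_label, original_labels, ok_labels, rules):
    """Return True or False if some row matches failed_label, else None.

    None signals that no row applies and the default rules should be used.
    original_labels -- labels of all processes at the start of the run
    ok_labels       -- labels of the processes which are still OK
    """
    matched = False
    for regex, min_fraction, min_count in rules:
        if not regex.search(failed_label):  # assumption: re.search semantics
            continue
        matched = True
        n_original = sum(1 for label in original_labels if regex.search(label))
        n_ok = sum(1 for label in ok_labels if regex.search(label))
        # Too few survivors, either in absolute number or as a fraction
        if n_ok < min_count or n_ok < min_fraction * n_original:
            return True
    return False if matched else None

rules = parse_requirements(EXAMPLE_RULES)
# Hypothetical process labels for one run
original = ["component01", "component02", "EventBuilder1", "DataLogger1", "Dispatcher1"]

def survivors(failed):
    return [label for label in original if label != failed]

print(run_should_end("Dispatcher1", original, survivors("Dispatcher1"), rules))
print(run_should_end("component02", original, survivors("component02"), rules))
print(run_should_end("Monitor1", original, original, rules))
```

With these rows, losing the only Dispatcher leaves the run going, losing one of two "component" processes ends it because the required fraction of 1.0 is no longer met, and a label that matches no row yields `None`, meaning the default rules apply.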

If you create a file such as this, you need to tell DAQInterface where to find it: before launching DAQInterface, set the environment variable DAQINTERFACE_PROCESS_REQUIREMENTS_LIST to the name of the file. It's a good idea to set this variable in the user-defined file which gets sourced when you set up the environment, referred to by $DAQINTERFACE_USER_SOURCEFILE, so that you don't need to remember to set it every time you want to launch DAQInterface.