Feature #3956: We should protect against all module failures at end run so that files get closed correctly
Ensure that disk files always get closed no matter how the DAQ is shut down [art-related]
This has been a long-standing request that I am finally capturing in this issue. Some related discussion was captured in Issue #3956.
This request is admittedly broad. However, some very concrete steps can be taken to make the closing of disk files more reliable.
Here are some notes from a discussion on 02-July between Marc, Chris, and Kurt:
1) Exceptions are already handled by art. In artdaq/ds50daq, however, art runs in a separate thread, and it may not be clearly defined which thread receives a given signal.
1.1) Recommendation: set the thread signal mask so that only the main thread receives signals; that thread then puts the appropriate message(s) on the queue to tell art how to react.
1.2) Jim's MPI/PMT shim may be needed to achieve the highest possible reliability.
1.3) Open (internal) question: how can a fatal error in one part of the MPI program be turned into a graceful shutdown in another part?
1.4) Possible action items:
1.4.1) Investigate and improve how signals and interrupts are handled.
1.4.2) Improve the way that PMT responds to errors and signals, including Jim's shim.
In further discussions, the following concrete tasks were identified:
- Document signal handling within art, and ensure via tests that the response to signals within art executables is consistent and as intended.
- Document the pattern that should be used by executables that run art in a thread as part of a broader application, and investigate whether existing artdaq/ds50daq executables currently follow this pattern. The goal is to have "signal handling within artdaq/ds50daq executables consistent and sufficient to lead to an orderly shutdown of the executables (including any art threads) as quickly as possible" [quote from an email from Chris].
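The masking pattern recommended in 1.1 (block shutdown signals everywhere, let one thread wait for them, and forward a command to the art thread through a queue) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not artdaq or art code: the real executables are C++, where the corresponding POSIX calls are `pthread_sigmask()` and `sigwait()`. The names `art_worker` and `run_until_signal`, and the use of the signal name as the queue command, are hypothetical stand-ins for the art thread and its input queue. POSIX only.

```python
import os
import queue
import signal
import tempfile
import threading
from typing import Optional


# Hypothetical stand-in for the art event loop running in its own thread.
# It never handles signals itself; it only reacts to commands on its queue.
def art_worker(commands: queue.Queue, path: str) -> None:
    with open(path, "w") as out:      # file is closed when this block exits
        out.write("begin run\n")
        cmd = commands.get()          # block until the main thread sends a command
        out.write(f"end run ({cmd})\n")


def run_until_signal(path: str, self_signal: Optional[int] = None) -> int:
    """Run the worker until SIGINT or SIGTERM arrives; return the signal number."""
    shutdown_signals = {signal.SIGINT, signal.SIGTERM}
    # 1) Block the shutdown signals BEFORE starting any thread. New threads
    #    inherit the creator's mask, so only sigwait() below consumes them.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, shutdown_signals)
    try:
        commands: queue.Queue = queue.Queue()
        worker = threading.Thread(target=art_worker, args=(commands, path))
        worker.start()
        if self_signal is not None:
            # Demonstration only: direct the signal at this thread so the example
            # is deterministic; an operator would normally send it with kill(1).
            signal.pthread_kill(threading.get_ident(), self_signal)
        # 2) The main thread waits synchronously and turns the signal into a command.
        sig = signal.sigwait(shutdown_signals)
        commands.put(signal.Signals(sig).name)
        worker.join()                 # worker has closed its file once join() returns
        return sig
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "run_output.txt")
    run_until_signal(out_path, self_signal=signal.SIGTERM)
    with open(out_path) as f:
        print(f.read(), end="")       # begin run / end run (SIGTERM)
```

Setting the mask before the worker starts is the important design choice: if it were set afterwards, the worker would still have the signals unblocked and could receive one directly, bypassing the queue and the orderly close.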