Feature #23760
DAQInterface should be able to resurrect an artdaq process
% Done: 100%
Description
Because DUNE is intended to have very low deadtime, DAQInterface should be able to recover when an artdaq process it controls dies or enters the Error state during a run. It should relaunch a process of the same type and put it through the transitions needed to bring it into the running state. A resurrected process may not yet integrate seamlessly into a running system. Even so, this DAQInterface feature will be valuable when artdaq processes are later modified to support this functionality.
Associated revisions
JCF: Issue #23760: if a process enters an Error state, clean it up, but let the heartbeat function decide what to do with it
JCF: Issue #23760: if a process is found to have died, then after relaunch, send it the init and start transitions
JCF: Issue #23760: no longer automatically ending things if a generic exception is thrown when querying process status
JCF: Issue #23760: sending shepherding transitions as soon as the formerly-deceased process is alive, rather than after an arbitrary 10-second wait
JCF: Issue #23760: minor tweaks to output during shepherding
JCF: Issue #23760: have shepherding occur not just when process dies, but also when it enters the error state
JCF: Issue #23760: the process-shepherding feature in this issue only happens if you set "shepherd_bad_processes: true" in the $DAQINTERFACE_SETTINGS file
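Taken together, the revisions above describe a shepherding loop. During the heartbeat check, a process that has died or entered the Error state is cleaned up and relaunched. Once the new process responds, it receives "init" and then "start". This happens only when `shepherd_bad_processes` is enabled. The following is a minimal sketch of that logic, not DAQInterface's actual code. `ManagedProcess`, `wait_until_alive`, `heartbeat_pass`, and the `relaunch`/`send_transition` callbacks are all hypothetical names. The state strings are also assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ManagedProcess:
    """Hypothetical stand-in for an artdaq process controlled by DAQInterface."""
    label: str
    state: str = "running"  # assumed states: "running", "Error", "dead", ...
    transitions: List[str] = field(default_factory=list)

    def is_alive(self) -> bool:
        return self.state != "dead"


def wait_until_alive(proc: ManagedProcess, timeout_s: float, poll_s: float = 0.05) -> bool:
    """Poll until the relaunched process responds, rather than sleeping a fixed interval."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.is_alive():
            return True
        time.sleep(poll_s)
    return False


def heartbeat_pass(
    procs: List[ManagedProcess],
    relaunch: Callable[[ManagedProcess], None],
    send_transition: Callable[[ManagedProcess, str], None],
    shepherd_bad_processes: bool,
    alive_timeout_s: float = 30.0,
) -> Tuple[List[str], List[str]]:
    """One hypothetical heartbeat check while in the running state.

    Returns (labels shepherded back to running, labels left for the caller to handle).
    """
    shepherded: List[str] = []
    unhandled: List[str] = []
    for proc in procs:
        # A process is "bad" if it has died or reports the Error state.
        if proc.is_alive() and proc.state != "Error":
            continue
        if not shepherd_bad_processes:
            # Feature disabled: leave the decision to the caller.
            unhandled.append(proc.label)
            continue
        # Assumed to clean up the old process and launch a fresh one of the same type.
        relaunch(proc)
        if not wait_until_alive(proc, alive_timeout_s):
            unhandled.append(proc.label)
            continue
        # Bring the new process up to the running state.
        for transition in ("init", "start"):
            send_transition(proc, transition)
        shepherded.append(proc.label)
    return shepherded, unhandled
```

With `relaunch` and `send_transition` wired to real process-management calls, a dead event builder would be relaunched and sent "init" then "start" within one heartbeat pass. When shepherding is disabled, its label is simply reported back. Polling in `wait_until_alive` reflects the revision that dropped the fixed 10-second wait. The timeout value is an arbitrary choice for this sketch.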
History
#1 Updated by John Freeman about 1 year ago
- % Done changed from 0 to 100
- Status changed from New to Resolved
Resolved with commit e2fe23ca71def385a2974305853f967209fd3f06 at the head of feature/23760_resurrect_processes. To test this, add the line
shepherd_bad_processes: true
to the $DAQINTERFACE_SETTINGS file and set $DAQINTERFACE_PROCESS_MANAGEMENT_METHOD to "direct" before launching DAQInterface. Then, if a process dies or goes into the Error state while DAQInterface is in the running state, DAQInterface will relaunch the process and attempt to send it an "init" transition followed by a "start" transition. Note that there's no guarantee data taking will proceed smoothly when this happens, since artdaq processes aren't currently required to support this behavior. However, this feature will be useful for testing artdaq process behavior once those processes are modified to support continuous running.
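The test recipe above depends on two conditions: a `shepherd_bad_processes: true` line in the settings file, and the process-management method set to "direct". The sketch below shows one way to check both. It assumes the settings file uses simple `key: value` lines and treats `#` as a comment marker. It also assumes shepherding stays off unless the line is present. `parse_settings` and `shepherding_enabled` are hypothetical helpers, not DAQInterface functions.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def parse_settings(text: str) -> Dict[str, str]:
    """Parse assumed 'key: value' lines; blank lines and '#' comments are skipped."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values such as paths survive intact.
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def read_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the file named by $DAQINTERFACE_SETTINGS."""
    return parse_settings(Path(environ["DAQINTERFACE_SETTINGS"]).read_text())


def shepherding_enabled(settings: Mapping[str, str],
                        environ: Mapping[str, str] = os.environ) -> bool:
    """True only when the setting is 'true' and the method is 'direct' (per the test recipe)."""
    return (
        settings.get("shepherd_bad_processes", "false").lower() == "true"
        and environ.get("DAQINTERFACE_PROCESS_MANAGEMENT_METHOD") == "direct"
    )
```

Defaulting a missing key to "false" matches the revision note that shepherding only happens when the setting is explicitly true.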
JCF: Issue #23760: if a process dies, DAQInterface will relaunch it, though it won't send it transitions