Investigate whether an Init command sent while the system is running causes the DAQ processes to stop the run in the correct order
The system (and individual process) state model supports re-initializing the system during a run. In the individual process state machines, this is implemented by first stopping the run and then performing the requested re-initialization.
Inside the ds50MasterControl.rb script that we are currently using to control the DAQ system from the command-line, many of the DAQ commands are sent to individual processes in a well-defined order. For example, when a run is stopped, the command is sent to the V1495 trigger card first, other hardware modules next, the EventBuilders next, and the Aggregator last. In this way, the system is given the chance to drain all of the events in the system in a graceful way.
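The stop-run ordering above can be sketched as a list of stages that are contacted one at a time. This is a minimal Ruby sketch, not code from ds50MasterControl.rb: the stage and process names, the `send_in_order` helper, and the `send_command` callable (a stand-in for whatever transport the script actually uses) are all hypothetical. It assumes that every process in a stage must report success before the next stage is contacted.

```ruby
# Hypothetical sketch of the stop-run order. Process names are placeholders,
# not the names used by ds50MasterControl.rb.
STOP_ORDER = [
  [:v1495],                      # trigger card first: stop new triggers
  [:v1720_0, :v1720_1, :v1190],  # remaining hardware readout next
  [:eventbuilder_0, :eventbuilder_1],
  [:aggregator]                  # aggregator last, after data has drained
].freeze

# Sends +command+ to each stage in turn. +send_command+ is a hypothetical
# callable taking (process_name, command) and returning true on success.
def send_in_order(command, stages, send_command)
  stages.each_with_index do |stage, index|
    results = stage.map { |proc_name| send_command.call(proc_name, command) }
    # Do not advance to the next stage until every process here succeeded.
    unless results.all?
      raise "#{command} failed in stage #{index}: #{stage.inspect}"
    end
  end
  true
end
```

In the real system the commands within a stage would presumably be sent in parallel; the sketch sends them one after another for simplicity.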
However, it is not clear whether the Init command is sent in a well-defined order. If it is not, an Init issued during a run could end that run without respecting the normal stop-run order.
We should investigate this and fix ds50MasterControl.rb, if needed.
#2 Updated by Kurt Biery about 7 years ago
- Due date changed from 06/07/2013 to 07/19/2013
- Status changed from New to Assigned
- Assignee set to Kurt Biery
The order in which Init commands are sent to the various DAQ processes was recently changed for unrelated reasons, and it is now very close to what we would want here.
However, I'm concerned that sending an Init command while the DAQ processes are in a Running state could cause problems. For example, if the 1495 Init operation does a VME reset, that could reset the 1720s before their associated BoardReader processes have been told to stop the run. I've asked Alessandro whether we should remove the ability to handle the Init command during a run.
Here are some notes from an email that I sent:
Some reminders about the current state of the software:
1) We currently send the Stop (end-run) command to the DAQ processes in a very well-defined order: first to the 1495, then to the 1720(s) and 1190, then to the EventBuilder(s), then to the Aggregator. In each case, we require that all processes at each step complete the transition and reply with a success code to ds50MasterControl before we move on to the next set of processes. This strict ordering is important so that we turn off the flow of triggers and gracefully drain the data out of the system.
2) In the original design of the state machine (which is used by each of the DAQ processes [BoardReader, EventBuilder, Aggregator]), we allowed Init and Shutdown transitions while the process is in a Running state. Based on the state model that we currently have defined, when this happens, the DAQ process first goes through a Stop transition and then executes the requested transition (Init or Shutdown).
3) There is now an option to reset the VME crate when an Init command is sent to the 1495 board.
4) We are looking into enhancing the Shutdown command to exit the DAQ processes (in addition to moving them to their ground state).
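Point 2 can be illustrated with a small state machine in which an Init or Shutdown received in the Running state first performs an implicit Stop. This is a hypothetical Ruby sketch: the class name, the `handle` method, and the state names `:ground`, `:ready`, and `:running` are illustrative assumptions, not the actual state model used by the DAQ processes.

```ruby
# Hypothetical sketch of the per-process behaviour in point 2:
# Init or Shutdown received while Running first performs a Stop.
class DaqProcessStateMachine
  attr_reader :state, :actions

  def initialize
    @state = :ground   # illustrative state names, assumed for this sketch
    @actions = []      # record of transitions actually executed
  end

  def handle(command)
    case command
    when :init
      run_implicit_stop
      perform(:init, :ready)
    when :start
      raise "cannot start from #{@state}" unless @state == :ready
      perform(:start, :running)
    when :stop
      raise "cannot stop from #{@state}" unless @state == :running
      perform(:stop, :ready)
    when :shutdown
      run_implicit_stop
      perform(:shutdown, :ground)
    else
      raise ArgumentError, "unknown command #{command}"
    end
  end

  private

  # The hidden Stop that precedes Init/Shutdown during a run.
  def run_implicit_stop
    perform(:stop, :ready) if @state == :running
  end

  def perform(action, new_state)
    @actions << action
    @state = new_state
  end
end
```

For example, `handle(:init)`, `handle(:start)`, `handle(:init)` leaves `actions` as `[:init, :start, :stop, :init]`: the second Init silently ends the run inside the process, which is why the order in which processes receive it matters.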
A) Let's say that the system is in the middle of a run, and the operator requests an Init transition. This will be sent to the 1495 first. The 1495 will respond by executing its Stop action and its Init action. If the Init action includes a VME reset, the VME crate will be reset, and the 1720s will be reset without ever having been told to end the run (Stop).
B) Let's say that the system is in the middle of a run, and the operator requests a Shutdown transition. This will be sent to the 1495 first (following the usual order that we use to gracefully end a run). The 1495 will respond by executing its Stop action and its Shutdown action. If the Shutdown action includes an exit of the BoardReader process that is responsible for the 1495 readout, then that BoardReader will exit, and it is possible that the full MPI program will shut down. At this point, we lose the ability to control the end-run (Stop) sequence among the remaining processes.
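Scenarios A and B can be compared in a short Ruby sketch that records the order of actions. `per_process_log` models sending the command stage by stage, with each process doing its own implicit Stop; `stop_first_log` models one possible alternative in which ds50MasterControl completes a full ordered Stop before sending Init or Shutdown. All names here are hypothetical, and the sketch does not settle whether Init during a run should be supported at all.

```ruby
# Stage order from the notes above; process names are placeholders.
STAGES = [[:v1495], [:v1720, :v1190], [:eventbuilder], [:aggregator]].freeze

# Current behaviour: each process does Stop + command before the next
# process is contacted.
def per_process_log(stages, command)
  stages.flatten.flat_map { |p| [[p, :stop], [p, command]] }
end

# Possible alternative: a complete ordered Stop pass, then an ordered
# pass of the requested command.
def stop_first_log(stages, command)
  order = stages.flatten
  order.map { |p| [p, :stop] } + order.map { |p| [p, command] }
end

# True if +command+ (e.g. a VME reset in Init, or a process exit in
# Shutdown) reaches the 1495 before the 1720 has been stopped.
def hazard?(log, command)
  log.index([:v1495, command]) < log.index([:v1720, :stop])
end
```

With `per_process_log`, `hazard?` is true for both `:init` and `:shutdown`, matching scenarios A and B; with `stop_first_log` it is false, because every process has finished its Stop before any Init or Shutdown is sent.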