Rework the handling of enqueue timeouts
Several places in the artdaq and ds50daq code (e.g. EventStore.cc and Aggregator.cc) push events onto the GlobalQueue, which provides events to the art thread. At the moment, the size of the queue is generally set to 20 and the timeout for pushing a new event onto the queue is set to 5 seconds. However, we don't have a good model for what to do when the timeout expires: currently the event is dropped on the floor and a MessageFacility message is generated. It would be much better not to drop data, but we also want to avoid getting stuck in a retry loop that we can't break out of (for example, if the run is ended).
Resolving this issue should include a survey of the artdaq and ds50daq code bases to find all places where we enqueue events and to make sure that failures and timeouts are handled well.
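One possible shape for the retry behavior described above, as a rough sketch only: instead of a single long timeout followed by a drop, retry the push in short time slices and check a stop flag between attempts, so the loop can always be broken when the run ends. Everything here is hypothetical and for illustration: BoundedQueue is a stand-in for the GlobalQueue (not the real artdaq class), and enqueueUntilStopped, EnqueueResult, stopRequested and sliceTimeout are invented names, not existing artdaq or ds50daq API.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the GlobalQueue: a bounded queue with timed push/pop.
template <typename T>
class BoundedQueue {
public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Waits up to `timeout` for a free slot. On timeout, returns false and
  // leaves `item` untouched so the caller can retry with the same event.
  bool tryPush(T& item, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!notFull_.wait_for(lock, timeout,
                           [this] { return items_.size() < capacity_; })) {
      return false;
    }
    items_.push_back(std::move(item));
    notEmpty_.notify_one();
    return true;
  }

  // Waits up to `timeout` for an item; returns false if none arrived.
  bool tryPop(T& out, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!notEmpty_.wait_for(lock, timeout, [this] { return !items_.empty(); })) {
      return false;
    }
    out = std::move(items_.front());
    items_.pop_front();
    notFull_.notify_one();
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return items_.size();
  }

private:
  const std::size_t capacity_;
  mutable std::mutex mutex_;
  std::condition_variable notFull_;
  std::condition_variable notEmpty_;
  std::deque<T> items_;
};

enum class EnqueueResult { Enqueued, Abandoned };

// Retries the push in slices of `sliceTimeout` until it succeeds or
// `stopRequested` becomes true (e.g. set when the run is ended).
// The event is only given up once a stop has been requested.
template <typename T>
EnqueueResult enqueueUntilStopped(BoundedQueue<T>& queue, T& event,
                                  const std::atomic<bool>& stopRequested,
                                  std::chrono::milliseconds sliceTimeout) {
  while (!stopRequested.load()) {
    if (queue.tryPush(event, sliceTimeout)) {
      return EnqueueResult::Enqueued;
    }
    // Slice timed out: a real implementation would likely log a
    // rate-limited warning here before trying again.
  }
  return EnqueueResult::Abandoned;
}
```

With this approach a caller would pass the event, a flag that the end-run transition sets, and a slice much shorter than the current 5 seconds (say 100 ms), and would treat Abandoned as the only case in which data is knowingly discarded. Whether blocking the sender this way is acceptable at each enqueue site is exactly what the survey would need to decide.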
#1 Updated by Kurt Biery over 7 years ago
Just a comment on this issue: in a long test run (~45 hours) of V1495 firmware 2b, there were instances in which events failed to be enqueued. The run continued fine after these errors, so the backpressure seems to have been temporary. Still, it shows that this issue comes up even in cases where there has been no catastrophic failure.