Feature #6315
Detect stopped FEBs and recover with Sync
Status: New
Priority: Normal
Assignee: -
Category: -
Start date: 05/21/2014
Due date:
% Done: 0%
Estimated time:
Description
Background:
FEBs with too high a triggered hit rate can overflow their output buffer, which stops the production of hit data. Data flow can be recovered by issuing a sync from the timing system (which I think issues a Start DAQ; is that the critical part?).
We do not want to blindly issue periodic syncs, since there is a non-negligible probability that a sync trips up a DCM.
We therefore need to issue a sync specifically when we detect that an FEB has shut off.
The path forward that seems to require the least new code is:
- DCMApp checks bit 4 in the microslice header for "FEB off"
- If the bit is set, the DCM issues a warning message to the effect of "FEB Buffer Overflow Shutoff Detected"
- MessageAnalyzer has a condition to detect this message and a rule that requests Run Control to issue a sync
- We believe the FEB-off bit is implemented in the 2E/2D FEB V4 firmware, but we are not 100% sure
- Reading this bit requires the 11Dec13 DCM firmware, which is in use at NDOS but not yet at FarDet
- Run Control needs to hold off for several seconds after issuing a sync before responding to MessageAnalyzer again, or else we risk an infinite sync loop. This hold-off is in HEAD, but FarDet is running a branch.
- The hold-off may not be needed, though: the infinite loop comes from rules that trigger a sync on the corruption messages that often follow a sync. Leaving those rules disabled may avoid the problem.
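The chain above (DCM bit check, warning, MessageAnalyzer rule, Run Control hold-off) can be sketched roughly as follows. This is a hypothetical illustration, not DCMApp, MessageAnalyzer, or Run Control code. Only "bit 4" and the warning text come from this issue. The following are all assumptions:

- the header is treated as a single integer word, with bits counted from the least-significant bit starting at 0;
- the names `feb_is_off`, `check_microslice`, `analyzer_rule`, and `RunControlSyncGate` are invented for this sketch;
- the 5 s hold-off is an invented stand-in for "several seconds".

```python
from typing import Optional

# Assumed: bit 4 counted from the least-significant bit (bit 0).
FEB_OFF_BIT = 4
FEB_OFF_MASK = 1 << FEB_OFF_BIT

# Warning text proposed in this issue.
WARNING_TEXT = "FEB Buffer Overflow Shutoff Detected"


def feb_is_off(header_word: int) -> bool:
    """Return True if the (assumed) FEB-off bit is set in a microslice header word."""
    return bool(header_word & FEB_OFF_MASK)


def check_microslice(header_word: int, feb_id: int) -> Optional[str]:
    """DCM side (hypothetical): build a warning message if the FEB reports shutoff."""
    if feb_is_off(header_word):
        return f"WARNING: {WARNING_TEXT} (FEB {feb_id})"
    return None


class RunControlSyncGate:
    """Run Control side (hypothetical): ignore sync requests that arrive within
    holdoff_s seconds of the last issued sync, to break a sync loop."""

    def __init__(self, holdoff_s: float = 5.0) -> None:
        self.holdoff_s = holdoff_s  # assumed value for "several seconds"
        self.last_sync_s: Optional[float] = None

    def request_sync(self, now_s: float) -> bool:
        """Return True (and record the time) if a sync should be issued now."""
        if self.last_sync_s is not None and now_s - self.last_sync_s < self.holdoff_s:
            return False
        self.last_sync_s = now_s
        return True


def analyzer_rule(message: str, gate: RunControlSyncGate, now_s: float) -> bool:
    """MessageAnalyzer side (hypothetical): request a sync when the shutoff warning is seen."""
    if WARNING_TEXT in message:
        return gate.request_sync(now_s)
    return False
```

In this sketch, a second shutoff warning that arrives inside the hold-off window is simply dropped by Run Control. If the corruption-triggered sync rules stay disabled, the gate only guards against repeated shutoff warnings.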
History
#1 Updated by Peter Shanahan almost 7 years ago
Another detail on the DCMApp side:
We don't want to be perpetually requesting syncs if an FEB is too hot to stay live, so we need a configurable parameter:
- Minimum time between successive warnings (and therefore syncs): default 5 minutes?
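The configurable minimum interval could be a simple per-DCM rate limiter, sketched below under stated assumptions. `ShutoffWarningThrottle` and its parameter names are hypothetical. The 300 s default is the "5 minutes?" floated above, not a decided value.

```python
from typing import Optional

# Default taken from the "default 5 minutes?" suggestion; not a settled value.
DEFAULT_MIN_INTERVAL_S = 5 * 60


class ShutoffWarningThrottle:
    """Hypothetical DCM-side rate limiter: emit at most one shutoff warning
    (and hence trigger at most one sync) per min_interval_s seconds."""

    def __init__(self, min_interval_s: float = DEFAULT_MIN_INTERVAL_S) -> None:
        self.min_interval_s = min_interval_s  # the configurable parameter
        self._last_warning_s: Optional[float] = None
        self.suppressed = 0  # number of warnings dropped by the throttle

    def should_warn(self, now_s: float) -> bool:
        """Return True (and record the time) if a warning may be emitted now."""
        if (self._last_warning_s is not None
                and now_s - self._last_warning_s < self.min_interval_s):
            self.suppressed += 1
            return False
        self._last_warning_s = now_s
        return True
```

In a real loop, `now_s` would come from a monotonic clock such as `time.monotonic()`, so wall-clock adjustments cannot shorten the interval.

The throttle here is global to the DCM rather than per FEB. Because one sync restarts every FEB, a global limit bounds the sync rate directly, even when several FEBs are hot. A per-FEB dictionary of timestamps would be the alternative if per-board bookkeeping is wanted.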