Feature #22125

Avoiding long reconnection attempts helps improve resilience against loss of EventBuilders

Added by Kurt Biery over 1 year ago. Updated over 1 year ago.

In tests of killing EventBuilders at protoDUNE, I noticed that the BoardReaders would spend 20 seconds per event trying to send data to the failed EB (10 seconds trying to reconnect and 10 seconds trying to send the data, I think). In a system with 20 buffers per EB (and therefore, 20 entries in the routing table per EB), this would essentially hobble the system for ~400 seconds, and it was often hard for the system to recover from that. (Giovanna and others at CERN had noticed that reducing the number of EB buffers from 20 to 1 allowed the system to continue gracefully. I presume that they still endured the 20 second pause while the BRs tried to reconnect to the failed EB, but I haven't verified that and I would guess that they didn't notice it.)
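As a rough check of the ~400 second figure, the arithmetic can be sketched as below. The function name and parameters are hypothetical and are not artdaq code. The sketch assumes that each routing-table entry pointing at the failed EB costs one full reconnect timeout plus one full send timeout, and that the entries are worked through one after another.

```cpp
#include <chrono>

// Hypothetical helper (not part of artdaq): worst-case time a BoardReader
// spends on a failed EventBuilder, assuming every routing-table entry for
// that EB costs one reconnect timeout plus one send timeout, sequentially.
inline std::chrono::seconds worst_case_stall(std::chrono::seconds reconnect_timeout,
                                             std::chrono::seconds send_timeout,
                                             int routing_entries_for_failed_eb)
{
    return (reconnect_timeout + send_timeout) * routing_entries_for_failed_eb;
}
```

With the 10 second timeouts guessed at above, 20 buffers per EB gives 400 seconds, and 1 buffer per EB gives 20 seconds, which would match the single pause expected in the CERN configuration.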

To help avoid such long attempts to reconnect and send data to a failed EB, I made some candidate code changes in [...]. The spirit of the changes was to keep the existing retries when making the initial connection, but to attempt only one reconnection (per 'call') once an established connection has been lost.
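The intent of that policy might be sketched as follows. This is only an illustration, not the actual change: the class name ConnectionRetryPolicy and its methods are hypothetical, and only the flag name connection_was_lost_ comes from this ticket. The sketch also assumes that the flag stays set after a successful reconnect, which the ticket does not specify.

```cpp
#include <functional>

// Hypothetical sketch (not the artdaq implementation): full retries for the
// initial connection, a single attempt per call once a connection was lost.
class ConnectionRetryPolicy
{
public:
    explicit ConnectionRetryPolicy(int initial_attempts)
        : initial_attempts_(initial_attempts), connected_(false), connection_was_lost_(false)
    {
    }

    // try_connect is any callable that returns true when a connection is made.
    bool ensure_connected(const std::function<bool()>& try_connect)
    {
        if (connected_) return true;
        // Assumption: after a loss, only one attempt is made per call.
        const int attempts = connection_was_lost_ ? 1 : initial_attempts_;
        for (int i = 0; i < attempts; ++i)
        {
            if (try_connect())
            {
                connected_ = true;
                return true;
            }
        }
        return false;
    }

    // Record that a previously established connection has gone away.
    void mark_lost()
    {
        if (connected_)
        {
            connected_ = false;
            connection_was_lost_ = true;
        }
    }

private:
    int initial_attempts_;
    bool connected_;
    bool connection_was_lost_;
};
```

A sender would call ensure_connected() before each send and mark_lost() when a send fails on an established connection, so a dead EB costs one connection attempt per call instead of the full retry sequence.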


#1 Updated by Eric Flumerfelt over 1 year ago

Should the connection_was_lost_ variable be initialized in the member initialization list rather than the body of the constructor?
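For illustration, the two forms being compared look roughly like this. The struct names are hypothetical; only connection_was_lost_ is taken from the code under review.

```cpp
// Hypothetical structs for comparison only (not the artdaq class).

// Assignment in the constructor body: the member is default-initialized
// first (indeterminate for a bool), then assigned.
struct AssignedInBody
{
    AssignedInBody() { connection_was_lost_ = false; }
    bool connection_was_lost_;
};

// Member initialization list: the member is initialized directly.
struct InitializedInList
{
    InitializedInList() : connection_was_lost_(false) {}
    bool connection_was_lost_;
};
```

For a plain bool both forms end up with the same value; the initialization list is the more conventional C++ style and ensures the member is never read before it is set.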

#2 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Assigned to Resolved

Moving issue through state machine

#3 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Resolved to Reviewed
  • Tracker changed from Idea to Feature
  • Co-Assignees Eric Flumerfelt added

I have reviewed the code and done before/after testing, using runTransferTest and the routing_master_example simple_test_config.

#4 Updated by Eric Flumerfelt over 1 year ago

  • Target version set to artdaq v3_05_00
  • Status changed from Reviewed to Closed
