Feature #22122

How to handle long delays in table update acknowledgements

Added by Kurt Biery over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 03/13/2019
Due date:
% Done: 0%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

At protoDUNE, I noticed a situation in which a BoardReader (BR) seemingly never replied to the table update messages from the RoutingMaster (RM). Because the current RM logic retries the exact same table update until all receivers have acknowledged it, dataflow stopped, even though the incomplete events could have been timed out in the EventBuilders (presuming that the bad BR would also not have sent data fragments).
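
To make the failure mode concrete, here is a minimal sketch of a retry-until-all-acknowledged loop. It is not the artdaq code: `send_until_acked`, `poll_acks`, and `max_attempts` are hypothetical names, and the attempt bound exists only so the sketch terminates (the RM logic described above has no such bound, which is why one silent receiver stalls everything).

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical stand-in for the RM retry loop; not taken from artdaq.
// `receivers` are the ranks that must acknowledge the table update.
// `poll_acks(attempt)` returns the ranks that acknowledged during that attempt.
// `max_attempts` is an assumption added so that this sketch always terminates.
// Returns the attempt on which the last ack arrived, or 0 if some receiver
// never acknowledged within the bound.
std::size_t send_until_acked(const std::set<int>& receivers,
                             const std::function<std::set<int>(std::size_t)>& poll_acks,
                             std::size_t max_attempts)
{
    std::set<int> pending = receivers;
    for (std::size_t attempt = 1; attempt <= max_attempts; ++attempt)
    {
        for (int rank : poll_acks(attempt)) pending.erase(rank);
        if (pending.empty()) return attempt;  // every receiver has acknowledged
        // Otherwise the exact same update is sent again; no new tokens are added.
    }
    return 0;
}
```

With receivers {1, 2, 3} and a `poll_acks` that only ever returns {1, 2}, the function returns 0: rank 3 keeps the same update pending for every attempt, and nothing newer is ever sent.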

Eric and I talked a little about this. There are some subtleties in how best to include additional tokens in updates while still keeping the unacknowledged information in them, so we are thinking about this more.

In the meantime, I want to capture a tentative code change that I made in RoutingMasterCore. All I did was fix a bug in how the 'counter' variable is incremented and add some debug TRACE messages for the case where a few stragglers have not acknowledged the table update.
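
For illustration only, the straggler reporting could look roughly like the sketch below. This is not the code on the branch: `straggler_message` and `max_reported` are hypothetical, and the sketch returns a string rather than calling the TRACE facility, so it assumes nothing about that API. It also does not attempt to reproduce the 'counter' fix, whose details are not given here.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper, not the code on the branch. Returns a debug message
// naming the receivers that have not acknowledged, but only when there are
// between 1 and `max_reported` of them; otherwise returns an empty string.
std::string straggler_message(const std::set<int>& expected,
                              const std::set<int>& acked,
                              std::size_t max_reported)
{
    // Collect the ranks that are expected but have not acknowledged yet.
    std::set<int> missing;
    for (int rank : expected)
        if (acked.count(rank) == 0) missing.insert(rank);

    // Stay quiet when everyone has answered or when too many are still missing.
    if (missing.empty() || missing.size() > max_reported) return "";

    std::ostringstream msg;
    msg << "Waiting for table update acks from rank(s):";
    for (int rank : missing) msg << ' ' << rank;
    return msg.str();
}
```

Limiting the message to a small number of missing ranks keeps the log readable early in an update, when most receivers simply have not answered yet, and names the culprit once only one or two remain.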

History

#1 Updated by Kurt Biery over 1 year ago

The code change is on branch feature/22122_RMCore_Tweaks in the artdaq repo.

#2 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Assigned to Resolved

Before branching off Issue #22280, I validated the changes included here (I merely felt that more changes were necessary).

Moving this issue through the state machine.

#3 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Resolved to Reviewed
  • Tracker changed from Idea to Feature
  • Co-Assignees Eric Flumerfelt added

#4 Updated by Eric Flumerfelt over 1 year ago

  • Target version set to artdaq v3_05_00
  • Status changed from Reviewed to Closed

