Feature #22349

Reduce the number of routing_table_update messages in situations in which not all ACKs are received for a given update

Added by Kurt Biery over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 04/10/2019
Due date:
% Done: 0%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

At protoDUNE, in a system with 84 BoardReaders, only ~60% of the acknowledgements for a given routing_table_update are received in a given attempt (the RoutingMaster typically receives only 49 ACKs).

My suspicion is that the 'missing' ACKs (sent over UDP) are being lost somewhere in transit, but I haven't yet identified the cause. Some simple iperf tests on the protoDUNE cluster didn't uncover any problems, and neither did an initial look at buffer sizes.

In this situation, a table update can be retried many, many times. In those cases, the acknowledgements from a small number of BoardReaders repeatedly fail to make it into the set of ACKs that reach the RM, so the RM keeps retrying the table_update.
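The retry behaviour described above can be modelled with a short C++ sketch. It assumes, hypothetically, that the RoutingMaster accumulates ACKs across attempts and resends the update to every sender until all of them have acknowledged it or a retry cap is reached. `SendUpdateUntilAllAcked` and the `ack_arrives` callback are illustrative names only, not part of artdaq's RoutingMasterCore.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical model of the RoutingMaster retry loop (not the real artdaq code).
// ack_arrives(rank, attempt) stands in for the network: it returns true if the
// ACK that `rank` sends in response to `attempt` actually reaches the RM.
// Every sender answers every retransmission here (no skipping of ACKs).
// Returns how many times the table update was sent.
std::size_t SendUpdateUntilAllAcked(
    std::size_t n_ranks, std::size_t max_attempts,
    const std::function<bool(std::size_t, std::size_t)>& ack_arrives) {
  assert(max_attempts > 0);
  std::set<std::size_t> acked;  // ranks whose ACK has been received so far
  std::size_t attempt = 0;
  while (acked.size() < n_ranks && attempt < max_attempts) {
    ++attempt;  // (re)send the routing_table_update to all senders
    for (std::size_t rank = 0; rank < n_ranks; ++rank) {
      if (ack_arrives(rank, attempt)) acked.insert(rank);
    }
  }
  return attempt;
}
```

In this model, if a single BoardReader's ACK is lost on every attempt, the loop runs all the way to the cap, which matches the "retried many, many times" pattern seen at protoDUNE.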

To help debug this issue and work around these lost acknowledgements, I've added a bitset of 'already_acknowleged_ranks' to the RoutingPacketHeader. RoutingMasterCore sets the appropriate bits in this bitset, and DataSenderManager uses it to determine whether it still needs to send an acknowledgement.
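A minimal C++ sketch of the bitset mechanism, assuming a fixed maximum number of ranks and assuming that a set bit means the RoutingMaster has already received that rank's ACK. `RoutingPacketHeaderSketch`, `kMaxRanks`, `RecordAck`, `AllAcked`, and `ShouldSendAck` are hypothetical stand-ins; only the `already_acknowleged_ranks` name (spelled as in this ticket) comes from the description, and the real RoutingPacketHeader layout may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical upper bound on the number of sender ranks tracked in the header.
constexpr std::size_t kMaxRanks = 1024;

// Simplified, hypothetical stand-in for RoutingPacketHeader (other fields omitted).
struct RoutingPacketHeaderSketch {
  std::bitset<kMaxRanks> already_acknowleged_ranks;  // bit r set => RM has rank r's ACK
};

// RoutingMaster side: note that an ACK from `rank` has arrived, so any
// retransmission of this update tells that rank it need not ACK again.
void RecordAck(RoutingPacketHeaderSketch& hdr, std::size_t rank) {
  assert(rank < kMaxRanks);
  hdr.already_acknowleged_ranks.set(rank);
}

// RoutingMaster side: true once every expected sender rank has acknowledged.
bool AllAcked(const RoutingPacketHeaderSketch& hdr,
              const std::vector<std::size_t>& expected_ranks) {
  for (std::size_t rank : expected_ranks) {
    if (!hdr.already_acknowleged_ranks.test(rank)) return false;
  }
  return true;
}

// Sender (DataSenderManager) side: only send an ACK if the RoutingMaster has
// not already recorded one from this rank.
bool ShouldSendAck(const RoutingPacketHeaderSketch& hdr, std::size_t my_rank) {
  return !hdr.already_acknowleged_ranks.test(my_rank);
}
```

In this sketch, a retransmitted update is answered only by the ranks whose bits are still clear, so the ACK traffic on each retry shrinks to the stragglers instead of all 84 BoardReaders replying again.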

I've tested these changes on mu2edaq01, but I haven't been able to reproduce the missing-ACK problem there with a 60-BoardReader test system, so next I'm going to try some tests at protoDUNE with ToyComponents.

History

#1 Updated by Kurt Biery over 1 year ago

  • Subject changed from Reduce the number of routing_table_update messages in situations in which not all ACKs are received to Reduce the number of routing_table_update messages in situations in which not all ACKs are received for a given update

#2 Updated by Kurt Biery over 1 year ago

I committed the code changes to branch feature/22349_RoutingTableUpdate_SkipUnneededAcks, which is based on the for_dune-artdaq branch. I chose that parent branch to make testing at protoDUNE easier, and I'll look into creating a branch based on develop when that is needed.

#3 Updated by Eric Flumerfelt over 1 year ago

  • Target version set to artdaq v3_06_00
  • Status changed from Assigned to Closed
