Project

General

Profile

Feature #22532

Idea #22529: Support the (proto)DUNE DFO Model in artdaq

Add acknowledgements to Request protocol

Added by Eric Flumerfelt 5 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
-
Start date:
05/07/2019
Due date:
% Done:

0%

Estimated time:
Experiment:
-
Co-Assignees:
Duration:

Description

There should be an option in RequestReceiver and support in RequestSender for acknowledgements of requests (instead of the RequestSender owner informing RequestSender when a given sequence ID is complete). This would help make the Request protocol have the same reliability as the Routing Table protocol.


Related issues

Related to artdaq - Feature #22569: Update RequestReciever's interfaceResolved05/09/2019

Precedes artdaq - Support #22701: Add additional unit tests for RequestMessage.hhNew05/08/2019

History

#1 Updated by Eric Flumerfelt 5 months ago

I started looking at this, and some associated changes to the Request messages, and realized that to fully implement this, RequestSender is going to have to know what ranks to expect acknowledgements from. There are several strategies for doing this, but I would like to make sure that the solution we implement will work for DUNE, where the sending ranks may come and go, and/or each request may have a different subset of senders.

#2 Updated by Eric Flumerfelt 5 months ago

  • Assignee set to Eric Flumerfelt
  • Status changed from New to Assigned

I've put my initial code changes on artdaq:feature/22532_RequestAcknowledgements

#3 Updated by Eric Flumerfelt 5 months ago

  • Status changed from Assigned to Resolved

There is a feature in artdaq_core needed for this: artdaq-core:feature/22532_TimeUtils_ElapsedTimeFromTimespec.

I have tested this using unit tests and a set of simple requestSender/requestReceiver tests.

#4 Updated by Eric Flumerfelt 5 months ago

#5 Updated by Kurt Biery 4 months ago

Hi Eric,
I have some questions about the RequestMessage Constructor (https://cdcvs.fnal.gov/redmine/projects/artdaq/repository/entry/artdaq/DAQrate/detail/RequestMessage.hh?rev=feature%2F22532_RequestAcknowledgements#L466)...

The first thing that I noticed was that when I tried to compile the code on this branch, is that the compiler complained about unused variables "size" and "end" in the RequestMessage Constructor. I see that these variable are used in "assert" statements, and I'm guessing that the assert statements are compiled out in profile builds, so those variable end up not being used.

A few questions:
  1. If we add a TLOG statement when the size that is passed in is too small, is that enough? Should we return early? Throw an exception?
  2. is the loop on lines 480-485 missing an increment of the "ptr" variable to point to the next packet?
  3. if we add a simple "if" statement check on the values for ptr and end (in addition to the assert on line 482), I presume that a TLOG message would be appropriate. What else? Skip the creation of any remaining RequestPacket instances?

Thanks,
Kurt

#6 Updated by Eric Flumerfelt 4 months ago

1. A TLOG would probably be okay, as well as returning a marked-as-invalid packet.
2. No, ptr is incremented in the RequestPacket constructor. This is bad code and should probably be changed (or at least thoroughly documented).
3. Yes, this would be at the very least a warning about mismatched received message size and contents. Once the end has been reached, any further accesses would be a segfault, so it should abort in that case.

#7 Updated by Eric Flumerfelt 4 months ago

I just remembered that the reason for the bad code in 2 is because the ptr is not incremented by sizeof(RequestPacket) due to an extra word being inserted into the serialized object to represent the number of bytes in the VectorBitset...

#8 Updated by Kurt Biery 4 months ago

Eric,
I've committed a new version of RequestMessage.hh that has candidate changes for the issues that we've discussed. In addition, I created a RequestMessage_t.cc unit test file to (hopefully) test the changes that I've made.

I was hoping to keep the assert statements in the RequestMessage constructor, but I didn't know how to avoid having them confuse the unit tests, so in the end, I took them out. And as we discussed, there are now TRACE error messages and appropriate internal settings that (again hopefully) accurately represent the internal state of the RequestMessage. The unit tests help to verify that.

For the incrementing of the "ptr" variable, I added an attribute to RequestPacket that will provide the size of the de-serialized message. And, I used that in decoding the full RequestMessage. This is to take into account your point that a simple sizeof(blah) call would not be sufficient for incrementing "ptr" in the RequestMessage constructor.

Please take a look, when you get a chance.
With these changes, all unit tests pass on the feature/22532_RequestAcknowledgements branch.
Thanks,
Kurt

#9 Updated by Eric Flumerfelt 4 months ago

I have reviewed the added code and run the new test. I have some ideas for further unit tests, which I will add to a separate Issue.

#10 Updated by Eric Flumerfelt 4 months ago

  • Precedes Support #22701: Add additional unit tests for RequestMessage.hh added

#11 Updated by Kurt Biery 4 months ago

Hi Eric,
Are there configuration changes that should be made to keep up with these code change(s)? I suspect not, but I wanted to ask. In an incremental test of switching a routing-master-based system to use this new code (actually to use the merged working/22535_protoDFO_testing branch), the push-mode BR complains that it doesn't have routing information. Looking at a pull-mode BR, I don't see table updates being received in the TRACE log. I'm sure that I'm missing something, so any hints that you have would be appreciated.
To be clear, the only change that I've made to a config that works with artdaq v3_05_00 is to move the routing_token_port parameter in the RM config into a token_receiver block. As I say, there may be other changes that I need to make, but I'm forgetting them.
Thanks,
Kurt

#12 Updated by Eric Flumerfelt 4 months ago

IIRC, the only new configuration parameters have default values that make the system perform as it did before the change. Do RequestSender_t and requestsender -c test/DAQrate/requestsender.fcl work as expected?

#13 Updated by Kurt Biery 4 months ago

RequestSender_t was working as expected.

I found the problem... I needed to merge bugfix/22267_RoundRobin_MinParticipantsFix into the working/22535_protoDFO_pduneHacks branch.
So, it was completely unrelated to this Issue.



Also available in: Atom PDF