Idea #22529: Support the (proto)DUNE DFO Model in artdaq
Add acknowledgements to Request protocol
There should be an option in RequestReceiver and support in RequestSender for acknowledgements of requests (instead of the RequestSender owner informing RequestSender when a given sequence ID is complete). This would help make the Request protocol have the same reliability as the Routing Table protocol.
#1 Updated by Eric Flumerfelt 3 months ago
I started looking at this, and some associated changes to the Request messages, and realized that to fully implement this, RequestSender is going to have to know what ranks to expect acknowledgements from. There are several strategies for doing this, but I would like to make sure that the solution we implement will work for DUNE, where the sending ranks may come and go, and/or each request may have a different subset of senders.
#5 Updated by Kurt Biery 3 months ago
I have some questions about the RequestMessage Constructor (https://cdcvs.fnal.gov/redmine/projects/artdaq/repository/entry/artdaq/DAQrate/detail/RequestMessage.hh?rev=feature%2F22532_RequestAcknowledgements#L466)...
The first thing that I noticed was that when I tried to compile the code on this branch, is that the compiler complained about unused variables "size" and "end" in the RequestMessage Constructor. I see that these variable are used in "assert" statements, and I'm guessing that the assert statements are compiled out in profile builds, so those variable end up not being used.A few questions:
- If we add a TLOG statement when the size that is passed in is too small, is that enough? Should we return early? Throw an exception?
- is the loop on lines 480-485 missing an increment of the "ptr" variable to point to the next packet?
- if we add a simple "if" statement check on the values for ptr and end (in addition to the assert on line 482), I presume that a TLOG message would be appropriate. What else? Skip the creation of any remaining RequestPacket instances?
#6 Updated by Eric Flumerfelt 3 months ago
1. A TLOG would probably be okay, as well as returning a marked-as-invalid packet.
2. No, ptr is incremented in the RequestPacket constructor. This is bad code and should probably be changed (or at least thoroughly documented).
3. Yes, this would be at the very least a warning about mismatched received message size and contents. Once the end has been reached, any further accesses would be a segfault, so it should abort in that case.
#8 Updated by Kurt Biery 3 months ago
I've committed a new version of RequestMessage.hh that has candidate changes for the issues that we've discussed. In addition, I created a RequestMessage_t.cc unit test file to (hopefully) test the changes that I've made.
I was hoping to keep the assert statements in the RequestMessage constructor, but I didn't know how to avoid having them confuse the unit tests, so in the end, I took them out. And as we discussed, there are now TRACE error messages and appropriate internal settings that (again hopefully) accurately represent the internal state of the RequestMessage. The unit tests help to verify that.
For the incrementing of the "ptr" variable, I added an attribute to RequestPacket that will provide the size of the de-serialized message. And, I used that in decoding the full RequestMessage. This is to take into account your point that a simple sizeof(blah) call would not be sufficient for incrementing "ptr" in the RequestMessage constructor.
Please take a look, when you get a chance.
With these changes, all unit tests pass on the feature/22532_RequestAcknowledgements branch.
#11 Updated by Kurt Biery 2 months ago
Are there configuration changes that should be made to keep up with these code change(s)? I suspect not, but I wanted to ask. In an incremental test of switching a routing-master-based system to use this new code (actually to use the merged working/22535_protoDFO_testing branch), the push-mode BR complains that it doesn't have routing information. Looking at a pull-mode BR, I don't see table updates being received in the TRACE log. I'm sure that I'm missing something, so any hints that you have would be appreciated.
To be clear, the only change that I've made to a config that works with artdaq v3_05_00 is to move the routing_token_port parameter in the RM config into a token_receiver block. As I say, there may be other changes that I need to make, but I'm forgetting them.