Idea #22529: Support the (proto)DUNE DFO Model in artdaq
Separate Token Reception from RoutingMasterCore into a subclass
We should increase the reusability of the components within RoutingMasterCore so that they could potentially be used in a "RoutingNetOutput" module.
The TokenReceiver class may also contain the RoutingPolicy and provide an interface for retrieving destination(s).
#2 Updated by Kurt Biery 3 months ago
A question about how RoutingMasterCore should use the new TokenReceiver...
At the moment, the token reception thread is created at Init time and stopped at Shutdown time. When the state of the RoutingMaster is not Running, the thread is basically in a holding pattern, sleeping for 10 msec at a time in the receive_tokens_ method.
This differs from how CommandableFragmentGenerator uses RequestReceiver. CFG starts request reception at Start (begin-run) time and stops request reception at Stop (end-run) time.
As we move the token reception code from RMCore into a separate class, we could keep the same state behavior (start reception at Init time, etc), or we could make the new class more like RequestReceiver (start reception at Start time, etc).
Thinking about this a little more, I suspect that this difference between Request reception and Token reception is due to the different communication mechanisms used by them (UDP and TCP, respectively). Probably Token reception makes TCP connections at Init time and keeps them open until Shutdown time, since it would be more difficult to remake the TCP connections at each begin-run.
Does this sound right? If so, then I will include a method or two in the TokenReceiver interface to tell it when to actually listen for tokens (inside a run) and just sleep for a while (outside of a run).
Does this make sense?
#3 Updated by Eric Flumerfelt 3 months ago
We may have to ask John, but the other key piece of this is when the EventBuilders send their initial load of tokens, and what state the application running TokenReceiver will be in at that time. Since configuration may change between runs, there should definitely be machinery in place to tear down the listen socket and restart it during a stop run/begin run series of transitions.
#4 Updated by Kurt Biery 3 months ago
To summarize our discussion about these points...
- in the medium-to-long term, we should consider if we want TokenReceiver to start its thread at begin-run time, but for now, we will keep it at Init time.
- the Interface that we discussed for TokenReceiver was
  - startTokenReception() starts the token-receiving thread, but the thread-loop code is just sleeping ("paused")
  - resume() causes the thread to start actually listening for token updates
  - pause() causes the thread to stop listening for token updates and go into a sleep loop
  - stopTokenReception() stops the token-receiving thread
The goal is to keep the existing behavior the same, and provide some organizational changes to the TokenReceiver code to allow for changes later, if needed.
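As an illustration of the four-call interface summarized above, here is a minimal C++ sketch. It is not the code on the feature branch: the class name SketchTokenReceiver, the isPaused()/isRunning()/pollCount() helpers, and the 1 msec "listening" placeholder are all assumptions made for the example. Only the four method names and the 10 msec sleep while paused come from this discussion.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the TokenReceiver discussed in this issue.
// It only models the thread life cycle; no sockets or tokens are involved.
class SketchTokenReceiver {
public:
    ~SketchTokenReceiver() { stopTokenReception(); }

    // Init time: start the thread, which begins in the paused (sleeping) state.
    void startTokenReception() {
        if (thread_.joinable()) return;  // already started
        stop_requested_ = false;
        paused_ = true;
        thread_ = std::thread(&SketchTokenReceiver::receiveLoop, this);
    }

    // Begin-run: the loop starts doing its "listening" work.
    void resume() { paused_ = false; }

    // End-run: the loop goes back to sleeping.
    void pause() { paused_ = true; }

    // Shutdown time: ask the loop to exit and wait for the thread.
    void stopTokenReception() {
        stop_requested_ = true;
        if (thread_.joinable()) thread_.join();
    }

    bool isPaused() const { return paused_; }
    bool isRunning() const { return thread_.joinable(); }
    unsigned long pollCount() const { return poll_count_; }

private:
    void receiveLoop() {
        while (!stop_requested_) {
            if (paused_) {
                // Holding pattern outside of a run.
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
                continue;
            }
            // Placeholder for the real token reception (assumption).
            ++poll_count_;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> stop_requested_{false};
    std::atomic<bool> paused_{true};
    std::atomic<unsigned long> poll_count_{0};
    std::thread thread_;  // declared last so the flags exist before the thread runs
};
```

In this sketch, a caller such as RoutingMasterCore would call startTokenReception() at Init, resume() at Start, pause() at Stop, and stopTokenReception() at Shutdown, which keeps the existing state behavior.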
#5 Updated by Kurt Biery 3 months ago
Is there an explanation of RoutingMasterMode::RouteBySendCount anywhere?
I'm not sure why the tokens for a given rank need to be as large as the number of senders (https://cdcvs.fnal.gov/redmine/projects/artdaq/repository/revisions/develop/entry/artdaq/Application/RoutingMasterCore.cc#L652), and I'm trying to understand if I need to pass in the number of senders to TokenReceiver.
#6 Updated by Eric Flumerfelt 3 months ago
Essentially, it is used between EventBuilders and DataLoggers. The key difference is that due to event filtering, events passing between EventBuilders and DataLoggers do not necessarily have monotonically-increasing Sequence IDs. Therefore, RouteBySendCount mode allows DataSenderManager to interpret the sequence ID in the routing table as the send index. However, this will generate one event per sender (N_senders events in total) for each table entry, since each event is discrete, so an entry should only be created once that many tokens have been received from that receiver.
Let me know if you need more explanation; it might be better if I tried to draw a diagram...
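The token-counting rule described above could be sketched roughly as follows. This is an illustration only, not the logic in RoutingMasterCore or in any routing policy: SketchSendCountRouter, SketchTableEntry, and every method name are hypothetical, and the sketch assumes that creating an entry consumes exactly as many tokens as there are senders.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical routing table entry: in RouteBySendCount mode the
// "sequence ID" field is interpreted as a send index.
struct SketchTableEntry {
    std::size_t send_index;
    int destination_rank;
};

// Hypothetical helper that accumulates tokens per receiver rank and only
// emits an entry once a receiver has supplied one token per sender.
class SketchSendCountRouter {
public:
    explicit SketchSendCountRouter(std::size_t n_senders) : n_senders_(n_senders) {}

    void addTokens(int rank, std::size_t count) { tokens_[rank] += count; }

    std::vector<SketchTableEntry> makeEntries() {
        std::vector<SketchTableEntry> entries;
        if (n_senders_ == 0) return entries;  // nothing sensible to route
        for (auto& kv : tokens_) {
            // Each entry triggers one event from every sender, so it
            // consumes n_senders_ tokens from the destination receiver.
            while (kv.second >= n_senders_) {
                kv.second -= n_senders_;
                entries.push_back({next_send_index_++, kv.first});
            }
        }
        return entries;
    }

    std::size_t pendingTokens(int rank) const {
        auto it = tokens_.find(rank);
        return it == tokens_.end() ? 0 : it->second;
    }

private:
    std::size_t n_senders_;
    std::size_t next_send_index_ = 0;
    std::map<int, std::size_t> tokens_;  // receiver rank -> unused tokens
};
```

With three senders, for example, a receiver that has returned only two tokens gets no entry yet; the entry appears once the third token arrives.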
#7 Updated by Kurt Biery 3 months ago
I have committed the modified code to the feature/22530_TokenReceiver branch in the artdaq repository. This branch was based on the current state of the develop branch as of the morning of 23-May-2019.
I have also committed changes to our sample configs to correspond to this change. (The change was to move the routing_token_port parameter into a token_receiver block.) These changes were committed to a feature/22530_TokenReceiver branch in the artdaq-utilities-daqinterface repo. This branch was based on the develop branch in that repo as of this morning.
#8 Updated by Kurt Biery 3 months ago
- Status changed from Assigned to Resolved
I've tested this with the mediumsystem_with_routing_master and pdune_swtrig sample configurations (run_demo commands are listed below).
I compared the results of the runs with the existing development code and this new code, and the file sizes and numbers of tokens received (as reported in the log files) look consistent between the old and new code.
sh ./run_demo.sh --config mediumsystem_with_routing_master --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/mediumsystem_with_routing_master/boot.txt --comps component01 component02 component03 component04 component05 component06 component07 component08 component09 component10 --runduration 60 --partition 5 --no_om
sh ./run_demo.sh --config mediumsystem_with_routing_master --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/mediumsystem_with_routing_master/boot.txt --comps component01 component02 component03 component04 component05 component06 component07 component08 component09 component10 --runduration 40 --partition 5 --no_om --runs 3
sh ./run_demo.sh --config pdune_swtrig --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/pdune_swtrig/boot.txt --comps swtrig felix01 felix02 felix03 ssp01 ssp02 ssp03 --brlist `pwd`/artdaq-utilities-daqinterface/simple_test_config/pdune_swtrig/known_boardreaders_list_example --runduration 60 --partition 5 --no_om
sh ./run_demo.sh --config pdune_swtrig --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/pdune_swtrig/boot.txt --comps swtrig felix01 felix02 felix03 ssp01 ssp02 ssp03 --brlist `pwd`/artdaq-utilities-daqinterface/simple_test_config/pdune_swtrig/known_boardreaders_list_example --runduration 40 --partition 5 --no_om --runs 3
#9 Updated by Eric Flumerfelt 3 months ago
- Status changed from Resolved to Reviewed
- Co-Assignees Eric Flumerfelt added
Code review complete.
I have run this code in several test configurations for multiple start/stop cycles. Everything appears to run smoothly, no issues were detected, and output files have appropriate numbers of events.
Merging into develop.