The use of a RoutingMaster in an artdaq system with subsystems seems to need special handling for the BinaryNetOutput(s)
In order to help guide my thinking about the possible use of artdaq processes for data selection tasks in the DUNE DAQ, I created a sample configuration called "dune_sample_system" that has been committed to the "develop" branch of artdaq-utilities-daqinterface.
This sample system has the layout shown in the attached diagram. (This diagram is also included in the sample config subdirectory in the repo.)
When I tried to run this system config in an artdaq-demo system, many things just worked.
However, I noticed that the RoutingMaster port(s) and multicast address for the BinaryNetOutput module in the DFO (the EventBuilderMain process in subsystem 3) were computed with the offset for subsystem 3. The offset for subsystem 4 was what was actually needed, since the DFO sends its data to subsystem 4.
In order to get this sample config to work, I hacked the bookkeeping.py file to add one to the port and multicast address when the process label is "DFO". Clearly something more robust is needed.
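The mismatch and the interim hack can be illustrated with a minimal sketch. It assumes, purely for illustration, that each subsystem's RoutingMaster port and multicast address are offset by one per subsystem index, so "add one" is the same as "use subsystem 4's values". The base values, the helper names (`routing_params_for`, `binary_net_output_params`) and the address format are hypothetical, not taken from bookkeeping.py.

```python
from typing import Dict, Union

# Hypothetical base values; the real ones are computed in DAQInterface's bookkeeping.py.
ROUTING_TOKEN_BASE_PORT = 35500
MULTICAST_BASE_OCTET = 128


def routing_params_for(subsystem: int) -> Dict[str, Union[int, str]]:
    """RoutingMaster-related values, offset by one per subsystem (assumed scheme)."""
    return {
        "routing_token_port": ROUTING_TOKEN_BASE_PORT + subsystem,
        "table_update_address": f"227.128.1.{MULTICAST_BASE_OCTET + subsystem}",
    }


def binary_net_output_params(label: str, own_subsystem: int) -> Dict[str, Union[int, str]]:
    """Values handed to a BinaryNetOutput, including the label-based DFO hack."""
    offset_subsystem = own_subsystem
    if label == "DFO":
        # The hack: add one so the DFO in subsystem 3 matches subsystem 4's RoutingMaster.
        offset_subsystem += 1
    return routing_params_for(offset_subsystem)
```

The label check only works because, in this config, the DFO is the one process whose destination happens to be the next subsystem; any other cross-subsystem sender, or a DFO with a different label, would still get its own subsystem's values.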
These notes are somewhat specific to a BinaryNetOutput module that sends data to a subsystem that uses a RoutingMaster, but there may be a more general need to handle the bookkeeping for "output" connections separately from the bookkeeping for "input" connections.
#2 Updated by Kurt Biery 8 months ago
I wanted to save links to the subsystem documentation that I've found, just for future reference:
#3 Updated by Kurt Biery 7 months ago
I've committed another modification to bookkeeping.py on this branch.
When running a system with the pdune_swtrig sample config, I noticed that the RoutingMaster was complaining bitterly about an unexpected rank sending it routing_table_update acknowledgements. It turned out to be the DFO in that sample config that was sending the ACKs. It should be doing this; the problem was simply that the RM had not been told to expect ACKs from it in its 'sender_ranks' list. My hack was to add the DFO to the sender_ranks list in bookkeeping.py.
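A less label-specific way to build that list might be to derive sender_ranks from where each process sends its routed data, rather than from which subsystem it lives in, so that any cross-subsystem sender such as the DFO is included automatically. The sketch below assumes a hypothetical `ProcessInfo` record with a `routed_output_subsystem` field; neither is part of DAQInterface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessInfo:
    label: str
    rank: int
    subsystem: int
    # Hypothetical: subsystem whose RoutingMaster routes this process's outgoing data, if any.
    routed_output_subsystem: Optional[int] = None


def sender_ranks(rm_subsystem: int, procs: List[ProcessInfo]) -> List[int]:
    """Ranks a RoutingMaster in rm_subsystem should accept routing_table_update ACKs from."""
    return sorted(p.rank for p in procs if p.routed_output_subsystem == rm_subsystem)
```

With this rule, a DFO that lives in one subsystem but sends to another shows up in the destination RM's list without any check on its label.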
#4 Updated by John Freeman 7 months ago
- % Done changed from 0 to 100
- Status changed from New to Resolved
Issue is ready for review.
I consider this issue to be resolved with commit 3a972492fbe2bb0918e4998e7e6622f931e6ba8e on the feature/22255_Subsystem_RMFixes branch. Without any special add-ons (like checking whether "DFO" is in a process's label), DAQInterface now makes sure that if a parent subsystem has eventbuilders which send data to the eventbuilders in a child subsystem, and the child subsystem has a routing_master, then the eventbuilders in both the parent and child subsystems use the same routing_master when transferring data (i.e., they use the same values for parameters like table_update_address, routing_token_port, etc.).
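The rule described above can be sketched as follows: routing values are always derived from the subsystem that owns the routing_master, so parent eventbuilders (senders) and child eventbuilders (receivers) end up with identical values. The `Subsystem` record, its `destination` and `routing_master_host` fields, and the port/address scheme are hypothetical illustrations, not DAQInterface's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Subsystem:
    ident: int
    destination: Optional[int] = None          # hypothetical: child subsystem this one sends events to
    routing_master_host: Optional[str] = None  # host of this subsystem's routing_master, if any


def routing_master_params(owner: Subsystem) -> Dict[str, object]:
    """Values derived from the subsystem that owns the routing_master (assumed offset scheme)."""
    return {
        "routing_master_hostname": owner.routing_master_host,
        "routing_token_port": 35500 + owner.ident,
        "table_update_address": f"227.128.1.{128 + owner.ident}",
    }


def eventbuilder_receive_params(sub_id: int, subs: Dict[int, Subsystem]) -> Optional[Dict[str, object]]:
    """Eventbuilders in sub_id receive data routed by their own subsystem's routing_master."""
    sub = subs[sub_id]
    return routing_master_params(sub) if sub.routing_master_host else None


def eventbuilder_send_params(sub_id: int, subs: Dict[int, Subsystem]) -> Optional[Dict[str, object]]:
    """Eventbuilders in sub_id send to their child subsystem via the child's routing_master."""
    dest_id = subs[sub_id].destination
    if dest_id is None:
        return None
    return eventbuilder_receive_params(dest_id, subs)
```

For example, with `subs = {3: Subsystem(3, destination=4), 4: Subsystem(4, routing_master_host="mu2edaq01.fnal.gov")}`, `eventbuilder_send_params(3, subs)` and `eventbuilder_receive_params(4, subs)` return the same routing_master_hostname, routing_token_port and table_update_address, with no reference to any process label.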
I've checked that this new commit replicates the FHiCL documents created with an earlier commit on this branch from before I made the changes (specifically 5f6fdf450e27a0b58e808ac19c68d7a707ce9ac1); you can see this by comparing mu2edaq01:/home/jcfree/run_records/2543 and 2552 (config dune_sample_system), and also mu2edaq01:/home/jcfree/run_records/2544 and 2553 (config pdune_swtrig). The only differences are that with my code, routing_master_hostname changed from "localhost" to "mu2edaq01.fnal.gov", and that in the later runs I turned off the unnecessary appending of _dl to the output root file name.