Notes on getting single-node, private-network multicasts to work on SBN DAQ computers
This Issue is intended to hold notes on tests that were done as part of getting multicasts to work on SBN DAQ computers, along with the configuration and code changes that were made and questions that arose.
The basic premise is that we should be able to run the artdaq-demo with the mediumsystem_with_routing_master configuration on DAQ nodes at the ICARUS building (e.g. icarus-vst01, icarus-gateway02, etc.) and at DAB (e.g. sbnd-daq32, sbnd-daq33, etc.). [Recall that the use of the RoutingMaster includes the sending of multicast messages for Routing Table Updates. The sending of Request Messages from the EBs to BRs in pull mode also uses multicast.]
This did not work initially. We discovered that turning off the SLAM firewall allowed these tests to work, but of course that is not a long-term solution. We then realized that, by default, we were sending the multicasts over the public network interface on these computers, and the consensus that came out of various discussions (I believe) was that we should send them over the private-network interfaces instead.
In some cases, updates to the firewall rules were needed. In other cases, additional networking was needed (e.g. private-network connections at DAB). And, it turned out that some code and configuration changes were needed to get this to work.
#1 Updated by Kurt Biery almost 2 years ago
- sh ./run_demo.sh --config mediumsystem_with_routing_master --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/mediumsystem_with_routing_master/boot.txt --comps component01 component02 component03 component04 component05 component06 component07 component08 component09 component10 --runduration 40 --no_om --partition=3
For reference, the IP address of icarus-vst01 on the private network is 192.168.184.103 and the hostname that corresponds to that IP address is icarus-vst01-daq.
One simple approach that I took was to change the hostname of the RoutingMaster in the mediumsystem_with_routing_master/boot.txt file from "localhost" to "icarus-vst01-daq". This turned out to be very useful, but it didn't provide everything that was needed to get the system (with multicasts) to work. Initially, that change actually caused more problems, because DAQInterface tried to use that hostname when launching the RoutingMaster process. Pat Riehecky reminded me that creating an SSH key would allow me (and DAQInterface running from my account) to ssh between nodes (and to the current node using a specific network interface) without needing to enter a password. After creating the SSH key and installing it in my authorized_keys file, DAQInterface was able to start the RM process with the RM hostname set to icarus-vst01-daq.
With this change, the "routing_master_hostname" parameters in all of the BR, EB, and RM config files were book-kept to have a value of "icarus-vst01-daq". This is great, but of course, the routing_master_hostname parameter is only used for multicasts in the RM config; it isn't used as part of sending and receiving multicasts in the BR and EB configs.
#2 Updated by Kurt Biery almost 2 years ago
With the change of the RM hostname in the boot.txt file, the BoardReaders were still not receiving routing table updates from the RM, so I tried adding a multicast interface hostname/address to the receiving of multicasts in the BRs (specifically in DataSenderManager). This change consisted of:
- adding a table_update_multicast_interface parameter to the list of config parameters that the DSM class supports
- adding a table_multicast_interface_ data member to the DSM class
- adding code to DataSenderManager::setupTableListener_ to make use of this interface parameter when joining the multicast group
- adding the table_update_multicast_interface parameter to the BoardReader configuration files (in the routing_table_config block). Since DAQInterface doesn't know about this new parameter, it isn't book-kept, so I needed to set its value to icarus-vst01-daq by hand for these tests.
With this change, the BRs successfully started receiving the routing table updates via multicast over the private-network interface.
The code changes mentioned here were committed to the feature/MulticastMinorTweaks branch in the artdaq repo. (Alas, I probably should have created a new branch, but I just continued using the branch that was already in progress for SBN multicast work.)
Question: why does the receiving multicast interface need to be specified? The existing code in DataSenderManager was specifying INADDR_ANY for the interface. Would we expect that ANY wouldn't work?
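For reference when thinking about this question: a group-join request carries both the multicast group address and a local interface address. When the interface is INADDR_ANY, the system chooses the interface itself, and on a node with both public and private networks that choice may well be the public one. Below is a minimal Python sketch of the two variants. It is not the DataSenderManager code; the group address and port are placeholders, and the function names are made up for illustration.

```python
import socket

# Placeholder values for illustration only; not the addresses artdaq uses.
EXAMPLE_GROUP = "239.1.2.3"
EXAMPLE_PORT = 30001


def membership_request(group_ip, interface_ip="0.0.0.0"):
    """Pack a struct ip_mreq: group address, then local interface address.

    An interface of 0.0.0.0 (INADDR_ANY) leaves the choice of interface
    to the system.
    """
    return socket.inet_aton(group_ip) + socket.inet_aton(interface_ip)


def open_table_listener(group_ip, port, interface_host=None):
    """Open a UDP socket and join group_ip, optionally on a named interface.

    interface_host may be a hostname such as icarus-vst01-daq or a
    dotted-quad address; None means INADDR_ANY.
    """
    if interface_host:
        interface_ip = socket.gethostbyname(interface_host)
    else:
        interface_ip = "0.0.0.0"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group_ip, interface_ip),
    )
    return sock
```

Calling open_table_listener(EXAMPLE_GROUP, EXAMPLE_PORT, "icarus-vst01-daq") would correspond to the configuration with the interface specified, and omitting the last argument would correspond to the original INADDR_ANY behavior.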
#3 Updated by Kurt Biery almost 2 years ago
At this point, the demo with mediumsystem_with_routing_master seemed to work fine, but the truth was that I was just lucky. The multicasting of Request messages from the EBs to the BRs was going over the public (131.225) interface using port 3001 (which our SLAM colleagues had enabled for multicasts on the public network).
So, the next step was to specify the output multicast interface in the EventBuilder configuration files (to tell the RequestSender code to use a specific interface). There was already a configuration parameter for this, so this change simply consisted of adding "multicast_interface_ip" to the EB config files.
It turns out that this parameter is not book-kept, so the value for that parameter in the EB config files needed to be set to icarus-vst01-daq.
At this point, the Request Message multicasts were correctly going out over the private network interface, but they weren't being received by the BRs.
The next step was to specify the Request Message receive multicast interface in the BR configs. This consisted of setting the already-existing multicast_interface_ip parameter. It is not book-kept, so its value needed to be set to icarus-vst01-daq.
With this change, Request Message multicasts were successfully being sent over the private network.
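On the sending side, the equivalent socket-level step is to set the outgoing multicast interface before sending. The following Python sketch shows the idea under stated assumptions: it is not the RequestSender code, the function names are hypothetical, and it only illustrates what a parameter like multicast_interface_ip would plausibly be used for.

```python
import socket


def multicast_if_value(interface_host):
    """Resolve a hostname or dotted-quad to the 4-byte IP_MULTICAST_IF value."""
    return socket.inet_aton(socket.gethostbyname(interface_host))


def open_request_sender(interface_host):
    """Open a UDP socket whose multicasts leave via the given local interface.

    interface_host is the private-network name or address of this node,
    e.g. icarus-vst01-daq in the tests described above.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_MULTICAST_IF,
        multicast_if_value(interface_host),
    )
    return sock
```

Without the IP_MULTICAST_IF step, the system picks the outgoing interface, which is consistent with the Request messages going out over the public interface before the EB configs were changed.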
These configuration changes in the mediumsystem_with_routing_master sample config were made on the feature/Issue21769_SBN_Multicast_Tests branch in the artdaq-utilities-daqinterface repo.
As before, why do we need to specify the receive multicast interface?
#5 Updated by Kurt Biery almost 2 years ago
- is it expected that we need to specify the send and receive multicast interface addresses?
- there are sample config file changes on the feature/Issue21769_SBN_Multicast_Tests branch in the artdaq-utilities-daqinterface repo, but these are currently host-specific since the new params are not yet book-kept
- we should talk about adding book-keeping, if that is possible, for the new params
- the code changes on the feature/MulticastMinorTweaks branch in the artdaq repo are needed in order for us to successfully send multicasts over config-specified private-network interfaces. There are also some housekeeping changes on that branch, like the addition of parameters to TRACE statements and the renaming of data members to make their use more clear
Others should be able to reproduce these tests by:
- including the artdaq feature/MulticastMinorTweaks branch in your software area
- including the artdaq-utilities-daqinterface feature/Issue21769_SBN_Multicast_Tests branch in your artdaq-demo work area
- changing "icarus-vst01-daq" in the mediumsystem_with_routing_master config files to the private-network hostname of the computer where you want to run the tests
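Until the new parameters are book-kept, the hostname change in the last step has to be made by hand in each config file. A small helper along the following lines could do the substitution. This is only a sketch: the function name is made up, and the sample config lines in the usage note are illustrative rather than copied from the real files.

```python
import re


def retarget_hostname(config_text, new_host, old_host="icarus-vst01-daq"):
    """Replace whole-word occurrences of old_host with new_host.

    The lookarounds keep longer names that merely contain old_host
    (for example a name with an extra trailing digit) from being rewritten.
    """
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old_host) + r"(?![\w-])")
    # A function is used as the replacement so new_host is inserted literally.
    return pattern.sub(lambda match: new_host, config_text)
```

For example, applying retarget_hostname(text, "my-node-daq") to text containing the lines routing_master_hostname: "icarus-vst01-daq" and multicast_interface_ip: "icarus-vst01-daq" would rewrite both values, where my-node-daq stands in for the private-network hostname of the test node.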
#6 Updated by Eric Flumerfelt almost 2 years ago
From a quick search, it seems like both the send and receive multicast sockets must specify an interface or be routed to the default.
There are several methods which have been added to artdaq but are not currently in use which could help with this Issue. TCPConnect::AutodetectPrivateInterface will search for an interface with a private IP, which can then be used for the multicast interface address. Similarly, the PortManager class has a set of methods designed to abstract the choosing of interfaces and ports, with user overrides available in the case where the automatic configuration is not acceptable.
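As a rough illustration of the autodetection idea (this is not the TCPConnect::AutodetectPrivateInterface implementation, and the function name is hypothetical), choosing a private address from a list of candidate interface addresses could look like the Python sketch below. How the candidate list is obtained is left out, and the public address in the usage note is a made-up example in the 131.225 range.

```python
import ipaddress


def first_private_address(candidates):
    """Return the first candidate that is a private, non-loopback IPv4 address.

    candidates is an iterable of dotted-quad strings; returns None if no
    candidate qualifies, so the caller can fall back to a user override.
    """
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if addr.version != 4:
            continue
        # is_private is also true for loopback and link-local, so skip those.
        if addr.is_loopback or addr.is_link_local:
            continue
        if addr.is_private:
            return text
    return None
```

With candidates ["127.0.0.1", "131.225.0.10", "192.168.184.103"], this would select 192.168.184.103, which is the private-network address of icarus-vst01 noted earlier.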