Raise Visibility of tcp_receive_buffer_size parameter
On ICARUS, we have been experiencing many instances of "stalled flows", where the TCPSocketTransfer receiver does not receive any data for 10 seconds and gives up on the transfer.
We have determined that a possible culprit is the use of a system-managed receive buffer size rather than a predetermined one. (The default value for the tcp_receive_buffer_size parameter is 0, which enables system management.)
There are a few actions we can take to help resolve this issue and prevent it in the future:
- Include a reference to the configuration parameter/current value in the timeout message (on or near TCPSocket_transfer.cc:437)
- Change the default value from 0 to a reasonable fixed size (32 or 64 KB)
- Add a feature to DAQInterface to keep track of the receive buffer size, since it should be able to determine how many flows are being requested and between which nodes
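As a rough illustration of the first item, here is a minimal sketch of how the configured value could be reported alongside the timeout. It is not the existing TCPSocket transfer code: the helper names (apply_receive_buffer_size, query_receive_buffer_size, make_timeout_message) and the message wording are hypothetical. Only the POSIX setsockopt/getsockopt calls with SO_RCVBUF are real APIs, and the sketch assumes the parameter maps onto SO_RCVBUF.

```cpp
#include <sys/socket.h>
#include <sys/types.h>

#include <sstream>
#include <string>

// Hypothetical helper: apply the configured size to a socket.
// A value of 0 (the current default) leaves the buffer system-managed,
// so no socket option is set in that case.
bool apply_receive_buffer_size(int fd, int configured_bytes)
{
    if (configured_bytes <= 0) return true;  // nothing to set
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &configured_bytes,
                      sizeof(configured_bytes)) == 0;
}

// Hypothetical helper: ask the kernel what buffer size is in effect.
// Returns -1 if the query fails. The reported value may differ from the
// requested one (Linux, for example, reports a larger figure).
int query_receive_buffer_size(int fd)
{
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0) return -1;
    return actual;
}

// Hypothetical helper: build a timeout message that names the parameter
// and shows both the configured and the kernel-reported values.
std::string make_timeout_message(int timeout_s, int configured_bytes,
                                 int actual_bytes)
{
    std::ostringstream msg;
    msg << "No data received for " << timeout_s
        << " s, giving up on transfer. "
        << "tcp_receive_buffer_size=" << configured_bytes;
    if (configured_bytes <= 0) msg << " (system-managed)";
    msg << ", SO_RCVBUF reported by kernel=" << actual_bytes << " bytes";
    return msg.str();
}
```

With something like this, a stalled-flow log line would show at a glance whether the receiver was running with the system-managed default or an explicit size.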
#1 Updated by Eric Flumerfelt about 1 month ago
I'll also note here that tcp_receive_buffer_size can be specified at the daq.<APP> level, and DataReceiverManager will pass it to all configured transfer plugins, in the same manner as the unified host_map and max_fragment_size_words. DataSenderManager does the same with the tcp_send_buffer_size (plus the host_map and max_fragment_size_words).