Bug #24072

Occasional issue running ascii_simulator_example for multiple runs on multiple mu2edaq machines

Added by Eric Flumerfelt 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 02/20/2020
Due date:
% Done: 0%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

When running the integration test suite, I have noticed a few times that the ASCII simulator example, when configured for multiple runs across multiple mu2edaq nodes, does not run cleanly on the first attempt. The DataLogger prints the following log messages:
%MSG-i DataLogger1_CommandableInterface: InRunMap::Running 12-Feb-2020 09:20:40 CST Sequence ID 27631 Commandable.cc:82
Start transition complete
%MSG
%MSG-i DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:1080
listen_: New fd is 13 for source rank 1
%MSG
%MSG-w DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:307
transfer_between_1_and_3_RECV: receiveFragmentHeader: Error on receive, closing socket (errno=9: Bad file descriptor)
%MSG
%MSG-w DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:377
transfer_between_1_and_3_RECV: disconnect_receive_socket_: Closing socket 14
%MSG
%MSG-i DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:1080
listen_: New fd is 14 for source rank 2
%MSG
%MSG-w DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:307
transfer_between_1_and_3_RECV: receiveFragmentHeader: Error on receive, closing socket (errno=14: Bad address)
%MSG
%MSG-w DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 4294967296 TCPSocket_transfer.cc:377
transfer_between_1_and_3_RECV: disconnect_receive_socket_: Closing socket 13
%MSG
%MSG-e SharedMemoryManager: Early 12-Feb-2020 09:20:41 CST pre-events SharedMemoryManager.cc:732
Requested write size is larger than the buffer size! (sz=1000000, cur + req=20286540344)
%MSG
%MSG-e DataLogger1_SharedMemoryEventManager: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 4294967296 SharedMemoryEventManager.cc:236
Dropping over-size fragment with sequence id 4294967296 and fragment id 0 because there is no room in the current buffer for this Fragment! (Keeping header)
%MSG
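
For triage, it can help to pull only the warning and error records out of a DataLogger log like the one above. The following is a minimal sketch, not part of artdaq: the header layout (`%MSG-<severity> <category>: <context> <file>:<line>`, closed by a bare `%MSG` line) is inferred from the excerpts in this ticket, and the names `HEADER`, `Record`, and `parse_records` are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Header shape inferred from the log excerpts in this ticket (an assumption,
# not taken from the MessageFacility documentation):
#   %MSG-<severity> <category>: <context ...> <file>:<line>
HEADER = re.compile(r"^%MSG-(\w) (\S+): (.*) (\S+):(\d+)$")


class Record(NamedTuple):
    severity: str  # "i", "w" or "e" in the excerpts above
    category: str
    context: str   # state, timestamp, sequence ID, as logged
    source: str    # e.g. "TCPSocket_transfer.cc:307"
    text: str      # message body between the header and the closing %MSG


def parse_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per '%MSG-x ...' / '%MSG' block."""
    header, body = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if match:
            header, body = match, []
        elif line.strip() == "%MSG" and header is not None:
            sev, cat, ctx, src, lineno = header.groups()
            yield Record(sev, cat, ctx, f"{src}:{lineno}", "\n".join(body))
            header = None
        elif header is not None:
            body.append(line)


# Two records copied verbatim from the DataLogger output above.
SAMPLE = """\
%MSG-i DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:1080
listen_: New fd is 13 for source rank 1
%MSG
%MSG-w DataLogger1_TCPSocketTransfer: InRunMap::Running 12-Feb-2020 09:20:41 CST Sequence ID 27631 TCPSocket_transfer.cc:307
transfer_between_1_and_3_RECV: receiveFragmentHeader: Error on receive, closing socket (errno=9: Bad file descriptor)
%MSG
""".splitlines()

records = list(parse_records(SAMPLE))
problems = [r for r in records if r.severity in ("w", "e")]
for r in problems:
    print(r.source, r.text)
```

Running the same filter over the full log makes it easier to line up the `listen_` file descriptors with the sockets that are closed afterwards.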

Meanwhile, both EventBuilder art processes log the following:
%MSG-i EventBuilder1_art1_AutodetectTransfer: DAQ 12-Feb-2020 09:20:41 CST Booted Autodetect_transfer.cc:116
transfer_between_1_and_3_SEND: Constructing TCPSocketTransfer
%MSG
%MSG-i PortManager: DAQ 12-Feb-2020 09:20:41 CST Booted PortManager.cc:300
Using default port range for TCPSocket Transfer
%MSG
%MSG-i EventBuilder1_art1_TCPSocketTransfer: DAQ 12-Feb-2020 09:20:41 CST Booted TCPSocket_transfer.cc:891
transfer_between_1_and_3_SEND: connect_: Successfully connected
%MSG
%MSG-i EventBuilder1_art1_DataSenderManager: DAQ 12-Feb-2020 09:20:41 CST Booted DataSenderManager.cc:107
enabled_destinations not specified, assuming all destinations enabled.
%MSG
%MSG-w EventBuilder1_art1_TCPSocketTransfer: DAQ 12-Feb-2020 09:20:41 CST Booted TCPSocket_transfer.cc:751
transfer_between_1_and_3_SEND: sendFragment_: WRITE ERROR 104: Connection reset by peer
%MSG
%MSG-e EventBuilder1_art1_DataSenderManager: DAQ 12-Feb-2020 09:20:41 CST Booted DataSenderManager.cc:559
sendFragment: Sending fragment 2 to destination 3 failed! Data has been lost!
%MSG
%MSG-i EventBuilder1_art1_TCPSocketTransfer: DAQ 12-Feb-2020 09:20:41 CST Booted TCPSocket_transfer.cc:891
transfer_between_1_and_3_SEND: connect_: Successfully connected
%MSG
%MSG-w EventBuilder1_art1_TCPSocketTransfer: DAQ 12-Feb-2020 09:20:45 CST Booted TCPSocket_transfer.cc:751
transfer_between_1_and_3_SEND: sendFragment_: WRITE ERROR 104: Connection reset by peer
%MSG
%MSG-e EventBuilder1_art1_DataSenderManager: DAQ 12-Feb-2020 09:20:45 CST Booted DataSenderManager.cc:559
sendFragment: Sending fragment 6868 to destination 3 failed! Data has been lost!
%MSG
%MSG-i EventBuilder1_art1_TCPSocketTransfer: DAQ 12-Feb-2020 09:20:45 CST Booted TCPSocket_transfer.cc:611
transfer_between_1_and_3_SEND: reconnection attempt failed, returning quickly.
%MSG
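
To compare occurrences of this failure across runs, the sender-side messages above can be tallied with a short script. This is only a sketch: the two regular expressions are written against the exact wording of the log lines in this ticket, and `summarize` is a hypothetical helper, not an artdaq tool.

```python
import re
from collections import Counter

# Patterns match the message wording seen in the EventBuilder log above.
WRITE_ERROR = re.compile(r"WRITE ERROR (\d+): (.+)$")
LOST = re.compile(r"Sending fragment (\d+) to destination (\d+) failed")


def summarize(lines):
    """Count write errors by (errno, text) and list lost (fragment, destination) pairs."""
    errors = Counter()
    lost = []
    for line in lines:
        m = WRITE_ERROR.search(line)
        if m:
            errors[(int(m.group(1)), m.group(2).strip())] += 1
        m = LOST.search(line)
        if m:
            lost.append((int(m.group(1)), int(m.group(2))))
    return errors, lost


# Message bodies copied verbatim from the EventBuilder output above.
SAMPLE = [
    "transfer_between_1_and_3_SEND: sendFragment_: WRITE ERROR 104: Connection reset by peer",
    "sendFragment: Sending fragment 2 to destination 3 failed! Data has been lost!",
    "transfer_between_1_and_3_SEND: sendFragment_: WRITE ERROR 104: Connection reset by peer",
    "sendFragment: Sending fragment 6868 to destination 3 failed! Data has been lost!",
]

errors, lost = summarize(SAMPLE)
print(dict(errors))
print(lost)
```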


