Project

General

Profile

Bug #20292

The recent TCPSocketTransfer change that expects data (or connections) at some frequency seems to have trouble with multiple runs in the same DAQ session

Added by Kurt Biery about 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Normal
Category:
Known Issues
Target version:
Start date:
07/07/2018
Due date:
% Done:

0%

Estimated time:
Experiment:
-
Co-Assignees:
Duration:

Description

To reproduce this issue, you can use the mediumsystem_mediumdata simple_test_config on the feature/protodune branch of artdaq-utilities-daqinterface.

The commands are
  • 'cd <yourArtdaqDemoDir>'
  • 'sh ./run_demo.sh --config mediumsystem_mediumdata --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/mediumsystem_mediumdata/boot.txt --comps component01 component02 component03 component04 component05 component06 component07 component08 component09 component10 --runduration 240 --no_om --runs 4'

The second run will fail with messages in the EB logs like the ones listed below.

In general, it is a bad idea to base any disconnection actions on the presence of absence of data (for example, what if we get one event every half hour - that could be totally fine for a given experiment). I'm not sure whether that is the problem here, or whether it is the presence or absence of a connection that is causing a given connection to be disregarded.

%MSG-i EventBuilder2_CommandableInterface:  Starting 07-Jul-2018 20:21:34 CDT  Sequence ID 2422 Commandable.cc:67
Start transition started
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Starting  07-Jul-2018 20:21:34 CDT Sequence ID 2422 SharedMemoryEventManager.cc:260
Starting art process with config file /tmp/partition_0/artConfig_11_1531012632158863.fcl
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Starting  07-Jul-2018 20:21:34 CDT Sequence ID 2422 SharedMemoryEventManager.cc:292
PID of new art process is 13001
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Starting  07-Jul-2018 20:21:35 CDT Sequence ID 2422 SharedMemoryEventManager.cc:373
art initialization took 1.479377 seconds.
%MSG
%MSG-i EventBuilder2_TCPConnect:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPConnect.cc:191
Resolving host 227.128.12.26, on port 3001
%MSG
%MSG-i EventBuilder2_TCPConnect:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPConnect.cc:99
Resolving ip mu2edaq01.fnal.gov
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_0_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.4105 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_1_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.8949 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_2_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.895 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_3_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.895 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_4_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.8949 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_5_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.8952 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_6_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.895 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_7_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.8954 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_8_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.8954 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  Starting 07-Jul-2018 20:21:35 CDT  Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:129
transfer_between_9_and_11_RECV: receiveFragmentHeader: senders have been disconnected for 12.8954 s (receive_socket_disconnected_wait_s = 10 s). RETURNING DATA_END!
%MSG
%MSG-i EventBuilder2_CommandableInterface:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 Commandable.cc:82
Start transition complete
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-e EventBuilder2_DataReceiverManager:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 DataReceiverManager.cc:193
Transfer Plugin returned DATA_END, ending receive loop!
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
[RESUMING dropped: 0] listen_: New fd is 37 for source rank 2
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 39 for source rank 6
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 41 for source rank 3
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 43 for source rank 0
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 45 for source rank 9
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 47 for source rank 8
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 51 for source rank 4
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 55 for source rank 1
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 56 for source rank 7
%MSG
%MSG-i EventBuilder2_TCPSocketTransfer:  InRunMap::Running  07-Jul-2018 20:21:35 CDT Sequence ID 2422 TCPSocket_transfer.cc:916
listen_: New fd is 58 for source rank 5
%MSG
%MSG-i EventBuilder2_CommandableInterface:  Stopping 07-Jul-2018 20:25:39 CDT  Sequence ID 2422 Commandable.cc:94
Stop transition started
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:39 CDT Sequence ID 2422 SharedMemoryEventManager.cc:677
Ending subrun 1
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:39 CDT Sequence ID 2422 SharedMemoryEventManager.cc:691
Subrun 1 in run 110 has ended. There were 0 events in this subrun.
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:39 CDT Sequence ID 2422 SharedMemoryEventManager.cc:658
Ending run 110
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:39 CDT Sequence ID 2422 SharedMemoryEventManager.cc:669
Run 110 has ended. There were 0 events in this run.
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:40 CDT Sequence ID 2422 SharedMemoryEventManager.cc:296
Removing PID 13001 from process list
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:40 CDT Sequence ID 2422 SharedMemoryEventManager.cc:304
art process 13001 exited normally, not restarting
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:40 CDT Sequence ID 2422 SharedMemoryEventManager.cc:597
It took 0.71419 s for all art processes to close after sending EndOfData Fragment
%MSG
%MSG-i EventBuilder2_RequestSender:  Stopping 07-Jul-2018 20:25:40 CDT  Sequence ID 2422 RequestSender.cc:46
Shutting down RequestSender: Waiting for requests to be sent
%MSG
%MSG-i EventBuilder2_RequestSender:  Stopping 07-Jul-2018 20:25:40 CDT  Sequence ID 2422 RequestSender.cc:54
Shutting down RequestSender
%MSG
%MSG-i EventBuilder2_SharedMemoryEventManager:  Stopping  07-Jul-2018 20:25:40 CDT Sequence ID 2422 SharedMemoryEventManager.cc:617
EndOfData Complete. There were 808 buffers processed.
%MSG
%MSG-i EventBuilder2_CommandableInterface:  InitializedMap::Ready  07-Jul-2018 20:25:40 CDT Sequence ID 2422 Commandable.cc:109
Stop transition complete

History

#1 Updated by Eric Flumerfelt about 2 years ago

I have made a commit which comments out the code in question while we figure out the best thing to do here.

Tests with transfer_driver show that it is advantageous to have a mechanism whereby transfer plugins can notify DataReceiverManager that no more data is forthcoming, however we should be careful to avoid unintended side effects such as observed here.

#2 Updated by Eric Flumerfelt almost 2 years ago

  • Category set to Known Issues
  • Status changed from New to Closed
  • Assignee set to Eric Flumerfelt
  • Target version set to artdaq v3_03_00


Also available in: Atom PDF