Bug #20976
multiple_art_processes_example broken
0%
Description
Using artdaq v3_03_00.
During the multiple-run test of multiple_art_processes_example, the following messages were seen:
2018-09-28 13:05:36 -0500: %MSG-i component02_CommandableInterface: Early pre-events Commandable.cc:94
2018-09-28 13:05:36 -0500: Stop transition started
2018-09-28 13:05:36 -0500: %MSG
2018-09-28 13:05:36 -0500: %MSG-i component01_CommandableInterface: Early pre-events Commandable.cc:94
2018-09-28 13:05:36 -0500: Stop transition started
2018-09-28 13:05:36 -0500: %MSG
2018-09-28 13:05:36 -0500: %MSG-i component02_BoardReaderCore: Early pre-events BoardReaderCore.cc:546
2018-09-28 13:05:36 -0500: Stopping run 22 after 75 fragments.
2018-09-28 13:05:36 -0500: %MSG
2018-09-28 13:05:36 -0500: %MSG-i component02_BoardReaderCore: Early pre-events BoardReaderCore.cc:546
2018-09-28 13:05:36 -0500: Completed the Stop transition for run 22
2018-09-28 13:05:36 -0500: %MSG
2018-09-28 13:05:36 -0500: %MSG-i component01_BoardReaderCore: Early pre-events BoardReaderCore.cc:546
2018-09-28 13:05:36 -0500: Stopping run 22 after 75 fragments.
2018-09-28 13:05:36 -0500: %MSG
2018-09-28 13:05:36 -0500: %MSG-i component01_BoardReaderCore: Early pre-events BoardReaderCore.cc:546
2018-09-28 13:05:36 -0500: Completed the Stop transition for run 22
2018-09-28 13:05:36 -0500: %MSG
2018-09-28 13:06:28 -0500: %MSG-w SharedMemoryManager: Early pre-events SharedMemoryManager.cc:766
2018-09-28 13:06:28 -0500: Stale Read buffer 3 at 0x7facca1a70b8 ( 100000576 / 100000000 us ) detected! (seqid=294) Resetting... Reading-->Full
2018-09-28 13:06:28 -0500: %MSG
2018-09-28 13:06:28 -0500: %MSG-w SharedMemoryManager: Early pre-events SharedMemoryManager.cc:766
2018-09-28 13:06:28 -0500: Stale Read buffer 3 at 0x7f52c9ee70b8 ( 100000535 / 100000000 us ) detected! (seqid=294) Resetting... Reading-->Full
2018-09-28 13:06:28 -0500: %MSG
2018-09-28 13:06:37 -0500: %MSG-i EventBuilder2_CommandableInterface: Early pre-events Commandable.cc:94
2018-09-28 13:06:37 -0500: Stop transition started
2018-09-28 13:06:37 -0500: %MSG
DAQInterface then failed to stop the second run, reporting a timeout while sending the stop transition to component01.
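Each stale-buffer warning reports two numbers in microseconds, which read as an elapsed time against a timeout of 100000000 us (100 s). When comparing runs it can help to pull those fields out of the log. The following is a minimal sketch, assuming the message format matches the samples quoted in this ticket; the regular expression and the function name parse_stale_warning are illustrative and not part of artdaq.

```python
import re

# Pattern modeled on the warning text quoted in this ticket. The format is an
# assumption drawn from these samples, not taken from the artdaq source.
STALE_RE = re.compile(
    r"Stale Read buffer (?P<buffer>\d+) at (?P<addr>0x[0-9a-f]+) "
    r"\(\s*(?P<elapsed_us>\d+)\s*/\s*(?P<timeout_us>\d+)\s*us\s*\) detected! "
    r"\(seqid=(?P<seqid>\d+)\)"
)


def parse_stale_warning(line):
    """Return the fields of a stale-buffer warning as a dict, or None."""
    m = STALE_RE.search(line)
    if m is None:
        return None
    return {
        "buffer": int(m.group("buffer")),
        "addr": m.group("addr"),
        "elapsed_us": int(m.group("elapsed_us")),
        "timeout_us": int(m.group("timeout_us")),
        "seqid": int(m.group("seqid")),
    }


sample = (
    "Stale Read buffer 3 at 0x7facca1a70b8 ( 100000576 / 100000000 us ) "
    "detected! (seqid=294) Resetting... Reading-->Full"
)
info = parse_stale_warning(sample)
# Microseconds by which the first reported value exceeds the second.
print(info["elapsed_us"] - info["timeout_us"])  # prints 576
```

Both warnings above carry seqid=294 and buffer 3 but different addresses.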
Related issues
History
#1 Updated by Eric Flumerfelt over 2 years ago
Command line was:
killall -9 art;treset;ipcrm -a;reset;./run_demo.sh --config multiple_art_processes_example --comps component{01..02} -- --runs 4 --runduration 50
on ironwork
#2 Updated by Eric Flumerfelt over 2 years ago
The example works when DAQInterface is forced to use TCPSocket transfers.
#3 Updated by Eric Flumerfelt over 2 years ago
- Related to Bug #21075: Broadcast Buffers reset before seen by process added
#4 Updated by Eric Flumerfelt over 2 years ago
- Related to Bug #21077: SharedMemoryFragmentManager issues with multiple readers added
#5 Updated by Eric Flumerfelt over 2 years ago
- Status changed from New to Resolved
- Parent task set to #21075
I merged artdaq_core/feature/SMM_DontResetUnseenBroadcasts and artdaq-core/feature/SharedMemoryReader_GetBufferForReading_LimitedRetries into working/Issue20976, and was able to run the example successfully using ShmemTransfer.
#6 Updated by Eric Flumerfelt over 2 years ago
- Parent task deleted (#21075)
#7 Updated by Eric Flumerfelt over 2 years ago
- Related to Bug #21075: Broadcast Buffers reset before seen by process added
#8 Updated by Eric Flumerfelt over 2 years ago
- Target version set to artdaq_core v3_04_03
#9 Updated by John Freeman over 2 years ago
- Status changed from Resolved to Work in progress
It seems the stale read buffer problem still crops up, despite my using the head of release/v3_04_03 (b54b523f5a50d132b54774a9440124aa40365ca2) in artdaq-core. Details are as follows:
- The installation I'm working with is on woof, in /home/jcfree/scratch/artdaq-demo_test_artdaq-core_shmem; the installation was performed using the quick-mrb-start.sh script in that directory, which I modified to use release branches where possible.
- I used variants of the command listed above (the one which begins with killall), where the variants in practice meant that I used the --no_om argument to take online monitoring out of the picture, and also used --runs 10 at one point rather than --runs 4
- I saw Stale Read buffer warnings, accompanied by timeouts on DAQInterface's stop transition, for runs 16 and 22 (details can be found in the run records directory, /home/jcfree/scratch/artdaq-demo_test_artdaq-core_shmem/run_records)
#10 Updated by John Freeman over 2 years ago
The problem seems to persist. I've tested the changes made to artdaq-core and artdaq in the last day: specifically, artdaq-core commit 914b0221c0f3b1b5021569bdd95841a2df07ad65 at the HEAD of release/v3_04_03_WithHotfixes, and artdaq commit f92af7c5d2bdb442e620b96a2722e2a99d2c5606 at the HEAD of release/v3_03_01_WithHotfixes. Details:
- Installation area used for the tests in this entry is woof:/home/jcfree/scratch/artdaq-demo_test_artdaq-core_shmem_try2; installation performed by quick-mrb-start.sh in the directory.
- Runs 10 and 13 both had messages of the form
2018-10-17 15:58:00 -0500: %MSG-w SharedMemoryManager: Early pre-events SharedMemoryManager.cc:800
2018-10-17 15:58:00 -0500: Stale Read buffer 1 at 0x7fe9b3b2e068 ( 100000288 / 100000000 us ) detected! (seqid=582) Resetting... Reading-->Full
resulting in a timeout on the stop transition sent to the boardreaders.
- Further info can be found in the run records directory, /home/jcfree/scratch/artdaq-demo_test_artdaq-core_shmem_try2/run_records
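The warning text suggests a timeout check on buffers held in the Reading state: once a buffer has gone untouched for longer than the timeout, it is moved back to Full. The sketch below is a toy model of that behaviour for discussion only. It is inferred from the log message, not taken from SharedMemoryManager.cc, and the names Buffer, last_touch_us and reset_if_stale are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    state: str          # only "Reading" and "Full" (seen in the log) are modeled
    last_touch_us: int  # hypothetical: time of the last reader activity, in us


def reset_if_stale(buf, now_us, timeout_us=100_000_000):
    """Move a stale Reading buffer back to Full; return True if it was reset.

    The default timeout is the 100000000 us value printed in the warnings.
    """
    if buf.state == "Reading" and now_us - buf.last_touch_us > timeout_us:
        buf.state = "Full"  # corresponds to "Reading-->Full" in the log
        return True
    return False


# A reader that last touched the buffer 100000288 us ago, as in run 10/13.
buf = Buffer(state="Reading", last_touch_us=0)
print(reset_if_stale(buf, now_us=100_000_288), buf.state)  # prints True Full
```

In this model, a reader that holds a buffer through the full timeout loses it to the reset.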
#11 Updated by Eric Flumerfelt about 2 years ago
- Status changed from Work in progress to Closed