It takes a long time to end a long run
I just ran a test in which I stopped a 2x2x2 run on ds50ws after 60,000 events (as reported by the Aggregator).
The "manage2x2x2System.sh stop" command successfully sent the stop command to the two BoardReaders, but sending the stop command to the EventBuilders (EBs) and the Aggregators (AGs) timed out.
When I looked in the BoardReader logs, I saw that many more than 60,000 fragments had been generated (more like 300,000).
The main system is still running after 78,000 events (as shown by the PMT console and log). It's as though the BRs were generating fragments really fast and filling up various buffers in the system. However, I don't expect there to be this much buffering.
I've seen something similar in ds50daq, but we should look into this for artdaq-demo since it is very likely that one of our experiment collaborators will notice this too and ask about it.
This test was done with artdaq-demo v2_00_01.
#1 Updated by Kurt Biery over 6 years ago
This seems to be a problem (bug?) in the Ethernet MPI implementation. When the problem happens, the memory usage in the EventBuilders increases significantly, but when I added diagnostics to the EventStore code, I found that the memory was not being consumed in the EventStore or the ConcurrentQueue. In fact, the EventBuilder statistics show far fewer events being received than have been sent by the BoardReaders.
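To make the suspected mechanism concrete, here is a rough back-of-the-envelope sketch in Python. It is not artdaq code. It assumes an unbounded buffer somewhere between sender and receiver, and the function name and the rates are hypothetical, chosen only so that the totals resemble the 300,000 versus 60,000 counts reported above. It also ignores the distinction between fragments and events.

```python
def simulate_backlog(produce_rate, consume_rate, run_seconds):
    """Estimate how much data piles up in an unbounded in-transit buffer.

    produce_rate and consume_rate are items per second (hypothetical values,
    not measured on ds50ws). Returns (sent, received, backlog, drain_seconds).
    """
    sent = produce_rate * run_seconds
    # The receiver cannot process more than was sent.
    received = min(sent, consume_rate * run_seconds)
    # Whatever was sent but not yet received is sitting in the buffer.
    backlog = sent - received
    # After the senders stop, the receiver still has to work through the backlog.
    drain_seconds = backlog / consume_rate
    return sent, received, backlog, drain_seconds


if __name__ == "__main__":
    sent, received, backlog, drain = simulate_backlog(5000, 1000, 60)
    print(f"sent={sent} received={received} backlog={backlog} drain={drain:.0f}s")
```

With these made-up rates, the receiver would need several more minutes to drain the backlog after the senders stop, which would be consistent with a stop command timing out downstream while events keep arriving. A bounded buffer with back-pressure on the senders would cap both the memory growth and the drain time.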