Investigate why running event builders on dsfr[1-4] causes lost MPI buffers
In configurations with one 1495 (BoardReader running on dsfr1), five 1720s (BoardReaders running on dsfr1 and dsfr2), an EventBuilder on dsfr3, and the Aggregator running on dsfr3, the system consistently stops taking data after several hundred events.
This has been traced back to the EventBuilder process receiving empty Fragments from the MPI system. The empty Fragments do not appear to be caused simply by spurious reports from the MPI_Waitany call in the RHandles class; instead, they seem to correspond to real Fragments that are somehow lost.
This behavior does not appear to happen on the FNAL WH14 teststand, nor does it happen when we run the EventBuilders on dseb[1-5] or dsag.
[I still need to provide instructions and scripts for reproducing this problem, and to attach log files from runs in which it occurred.]