Support #3936

Investigate why running event builders on dsfr[1-4] causes lost MPI buffers

Added by Kurt Biery over 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
05/27/2013
Due date:
06/28/2013
% Done:

0%

Estimated time:
Duration: 33

Description

In configurations with one 1495 (BoardReader running on dsfr1), five 1720s (BoardReaders running on dsfr1 and dsfr2), and both an EventBuilder and the Aggregator running on dsfr3, the system consistently stops taking data after several hundred events.

This has been traced back to the EventBuilder process receiving empty Fragments from the MPI system. The empty Fragments do not appear to be caused simply by spurious reports from the MPI_Waitany call in the RHandles class; instead, they appear to correspond to real Fragments that are somehow lost.
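The distinction drawn above, between a spurious report and a real but empty Fragment, could be checked along the following lines. This is only a sketch and not the RHandles code: WaitRecord, classify, and count_empty_fragments are hypothetical names, and the sketch assumes that each return from MPI_Waitany is recorded together with the element count that MPI_Get_count reports for the completed receive. MPI_Waitany sets the index to MPI_UNDEFINED when no active request was in the list, so that value is passed in as a parameter here to keep the sketch compilable without MPI.

```cpp
#include <vector>

// Outcome of one return from the wait call on the posted receives.
enum class RecvOutcome { Spurious, EmptyFragment, Ok };

// Hypothetical bookkeeping record, one per return from the wait call.
struct WaitRecord {
    int index;       // request index reported by the wait call
    int word_count;  // element count reported for the completed receive
};

// `undefined_index` stands in for MPI_UNDEFINED (assumption: the caller
// passes the real constant when this is used alongside MPI).
RecvOutcome classify(const WaitRecord& r, int undefined_index) {
    if (r.index == undefined_index) return RecvOutcome::Spurious;  // no request completed
    if (r.word_count <= 0) return RecvOutcome::EmptyFragment;      // completed, but no payload
    return RecvOutcome::Ok;
}

// Count the records where a request really completed but carried no data.
int count_empty_fragments(const std::vector<WaitRecord>& log, int undefined_index) {
    int n = 0;
    for (const auto& r : log) {
        if (classify(r, undefined_index) == RecvOutcome::EmptyFragment) ++n;
    }
    return n;
}
```

If the EmptyFragment count grows while the Spurious count stays at zero, that would be consistent with the observation that real Fragments are being lost rather than the wait call misreporting.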

This behavior does not appear to happen on the FNAL WH14 teststand, nor does it happen when we run the EventBuilders on dseb[1-5] or dsag.

[I need to provide instructions and scripts for reproducing this problem, and I should attach to this issue some log files from times when it has happened.]

--Kurt

History

#1 Updated by Kurt Biery over 6 years ago

  • Assignee deleted (Kurt Biery)

#2 Updated by Kurt Biery over 5 years ago

  • Status changed from New to Closed

I expect that this is no longer an issue, given the changes/improvements that have happened in the meantime.


