Idea #5986

Investigate whether we can support graceful loss of a small number of EventBuilders

Added by Kurt Biery over 5 years ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Additional Functionality
Target version:
-
Start date:
04/21/2014
Due date:
% Done:

0%

Estimated time:
32.00 h
Experiment:
Duration:

Description

In our current use of mpirun, if one DAQ process dies, there is a very good chance that the full MPI program will shut down.

We should investigate whether a small number of EventBuilder processes can die while the system continues running. (Here "running" means "taking data" or "continuing the current run in progress".)

If we determine that this is possible, then it will entail two things: modifying the MPI program so that individual process failures do not bring the full system down, and implementing a way to tell the BoardReaders that an EventBuilder (EB) has died and that they should no longer send Fragments to it.
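To make the second part concrete, here is a minimal sketch of how a BoardReader could stop sending Fragments to a dead EB. It is illustrative only and is not artdaq code: the class `EventBuilderRouter`, its methods, and the round-robin-by-sequence-ID policy are all assumptions made for this example.

```python
class EventBuilderRouter:
    """Hypothetical routing table a BoardReader consults before sending a Fragment.

    Assumption: Fragments are assigned to EventBuilders round-robin by
    sequence ID, so every BoardReader picks the same EB for a given event.
    """

    def __init__(self, eb_ranks):
        # Ranks (or other identifiers) of EventBuilders believed to be alive.
        self._alive = list(eb_ranks)

    def mark_dead(self, rank):
        """Remove an EventBuilder from the routing table; no-op if unknown."""
        if rank in self._alive:
            self._alive.remove(rank)

    def destination_for(self, sequence_id):
        """Return the EventBuilder that should receive this sequence ID."""
        if not self._alive:
            raise RuntimeError("no EventBuilders left to receive Fragments")
        return self._alive[sequence_id % len(self._alive)]


router = EventBuilderRouter([0, 1, 2])
before = router.destination_for(4)  # routed among three EBs
router.mark_dead(1)                 # notification that EB 1 has died
after = router.destination_for(4)   # routed among the two survivors
```

A real implementation would also need every BoardReader to apply the change at the same sequence ID. Otherwise Fragments belonging to one event could be sent to different EBs.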

For now, I'm only going to estimate the investigation portion of this task. If we decide to go ahead with an implementation, then that work will need to be added, either in this Issue or in an additional one.


Related issues

Related to ds50daq - Idea #3980: Investigate if the MPI program can be run in such a way that the loss of a single process doesn't stop the whole program (Closed, 06/04/2013)

History

#1 Updated by Eric Flumerfelt almost 3 years ago

  • Category set to Additional Functionality
  • Target version deleted (577)

The TransferPlugin-based data transfer work opens the door to this a bit. Still needed: a Routing Master and removal of the remaining MPI calls in the application.

#2 Updated by Eric Flumerfelt 8 months ago

  • Status changed from New to Closed

This has been shown to work at protoDUNE, using artdaq v3_04_00. Other issues document refinements, e.g. #22061.


