Bug #24242

Dispatcher responds "busy" in subrun_example

Added by Eric Flumerfelt 8 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Category:
Known Issues
Target version:
Start date:
03/27/2020
Due date:
% Done:
0%

Estimated time:
Experiment:
-
Co-Assignees:
Duration:

Description

During testing of artdaq v3_08_00, I have persistently encountered an issue where subrun_example fails to complete multiple runs when the online monitors are enabled. Debugging suggests the cause is that the Dispatcher art processes do not shut down in a timely fashion upon receipt of the "unregister_monitor" command, which the online monitors send at the end of the first subrun (a separate issue...). The unregister_monitor command hangs until the end of the run shuts down the Dispatcher art process, and during this time the Dispatcher cannot receive additional commands.

My current workaround is to add a flag to the SharedMemoryEventManager::ShutdownArtProcesses call that instructs SMEM to skip the "graceful" shutdown period, since unregister_monitor should simply kill the Dispatcher art process regardless of whether it shuts down cleanly. (Sending an EndOfData Fragment would have the side effect of halting any other connected Dispatcher art processes, which is undesirable.) We can discuss whether we should try to suppress error/warning messages in this situation.
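To illustrate the idea behind the workaround, here is a minimal Python sketch of a shutdown helper with a "skip graceful wait" flag. It is not the artdaq implementation: the names shutdown_child, spawn_sleeper, graceful_timeout and skip_graceful are hypothetical, and the real change lives in the C++ SharedMemoryEventManager::ShutdownArtProcesses call. The sketch only shows why skipping the graceful period lets the caller return promptly instead of blocking while a child takes its time to exit.

```python
import subprocess
import sys
import time


def shutdown_child(proc, graceful_timeout=5.0, skip_graceful=False):
    """Stop a child process and return the seconds spent doing so.

    Hypothetical helper, for illustration only. With skip_graceful=False the
    child is first asked to terminate and given graceful_timeout seconds to
    comply; with skip_graceful=True it is killed immediately.
    """
    start = time.monotonic()
    if not skip_graceful:
        proc.terminate()  # polite request to exit
        try:
            proc.wait(timeout=graceful_timeout)
        except subprocess.TimeoutExpired:
            pass  # child did not exit in time; fall through to the kill
    if proc.poll() is None:
        proc.kill()  # forced shutdown, no regard for clean exit
        proc.wait()  # reap the child so it does not linger
    return time.monotonic() - start


def spawn_sleeper():
    """Start a stand-in for a long-running child process (assumption)."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


if __name__ == "__main__":
    child = spawn_sleeper()
    elapsed = shutdown_child(child, skip_graceful=True)
    print(f"child stopped after {elapsed:.2f} s")
```

In this sketch the caller of shutdown_child is blocked for at most the kill-and-reap time when skip_graceful is set, which mirrors the goal of keeping the Dispatcher responsive to further commands while a monitor is unregistered.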


Related issues

Related to artdaq - Bug #24249: Online Monitor art processes disconnect after first subrun in subrun_example (New, 03/30/2020)

Related to artdaq - Bug #24262: DataLogger stop transitions take a **long** time in v3_08_00 (Closed, 04/02/2020)

History

#1 Updated by Eric Flumerfelt 8 months ago

Implementation on artdaq:bugfix/24242_SMEM_skip_graceful_wait_for_Dispatcher

#2 Updated by Eric Flumerfelt 8 months ago

  • Related to Bug #24249: Online Monitor art processes disconnect after first subrun in subrun_example added

#3 Updated by Eric Flumerfelt 8 months ago

  • Related to Bug #24262: DataLogger stop transitions take a **long** time in v3_08_00 added

#4 Updated by Gennadiy Lukhanin about 1 month ago

  • Status changed from New to Resolved

Reviewed the source code.
We tested this branch on the Icarus cluster as part of the new release tests for artdaq v3_09_01 and sbndaq v0_07_01.

#5 Updated by Eric Flumerfelt about 1 month ago

  • Target version set to artdaq v3_09_02
  • Status changed from Resolved to Closed
