Feature #22882

online monitor connection robustness

Added by Ron Rechenmacher 7 months ago. Updated 20 days ago.

Status: Reviewed
Priority: Normal
Category: -
Target version: -
Start date: 07/09/2019
Due date:
% Done: 0%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

Currently, online monitoring retries 3 times with 0.1 s in between.
I would like online monitoring to be able to be started first and/or
to persist across multiple runs.

History

#1 Updated by Eric Flumerfelt 7 months ago

  • Assignee set to Eric Flumerfelt
  • Status changed from New to Work in progress

I've started to make some changes to TransferWrapper on artdaq:feature/22882_TransferWrapper_ConnectionRobustness. I would like to get this code to a point where online monitors can be started at any time with respect to the DAQ and be able to connect and disconnect at will.

#2 Updated by Eric Flumerfelt 6 months ago

  • Status changed from Work in progress to Resolved

With the addition of an "allowMultipleRuns" parameter and a "dispatcherConnectTimeout" parameter, I have improved TransferWrapper so that it can wait up to a user-specified amount of time for the Dispatcher to enter the "Running" state, which it determines by querying the Dispatcher's status. It polls for that status every "dispatcherConnectRetryInterval_us" (defaults to 1 second, and can be set between 1000 us and 30 seconds). If allowMultipleRuns is false, the online monitor terminates when it receives an EndOfData Fragment (the current behavior); otherwise, it re-enters the Dispatcher connection routine.
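The behavior described in this update can be sketched roughly as follows. This is only an illustration and not the actual TransferWrapper code: the function names (clampRetryInterval, waitForRunning, monitorLoop), the callback-based structure, and the use of microseconds for the timeout are all assumptions made for the example. The treatment of a zero timeout as "wait forever" follows the review comment further down this ticket.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using std::chrono::microseconds;
using std::chrono::steady_clock;

// Hypothetical helper: keep the retry interval inside the 1000 us .. 30 s
// range mentioned for dispatcherConnectRetryInterval_us.
microseconds clampRetryInterval(microseconds requested)
{
    return std::clamp(requested, microseconds(1000), microseconds(30'000'000));
}

// Hypothetical connection routine: poll `isRunning` until it reports true or
// `timeout` has elapsed. A timeout of zero is treated as "wait forever".
bool waitForRunning(const std::function<bool()>& isRunning,
                    microseconds timeout,
                    microseconds retryInterval)
{
    assert(isRunning && "a status callback is required");
    const auto interval = clampRetryInterval(retryInterval);
    const auto start = steady_clock::now();
    while (!isRunning())
    {
        if (timeout.count() > 0 && steady_clock::now() - start >= timeout)
        {
            return false;  // gave up: Dispatcher never reached Running
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}

// Hypothetical outer loop: after EndOfData, either stop (allowMultipleRuns
// false) or go back to the connection routine. Returns the number of runs seen.
int monitorLoop(bool allowMultipleRuns,
                const std::function<bool()>& connect,
                const std::function<void()>& receiveUntilEndOfData)
{
    int runs = 0;
    do
    {
        if (!connect()) break;    // connection timed out
        receiveUntilEndOfData();  // returns once an EndOfData Fragment arrives
        ++runs;
    } while (allowMultipleRuns);
    return runs;
}
```

In this sketch, `connect` would typically wrap a call to `waitForRunning` with the configured timeout and retry interval, so a monitor with allowMultipleRuns set to true keeps reconnecting until a connection attempt times out.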

#3 Updated by John Freeman 20 days ago

  • Status changed from Resolved to Reviewed

Things look to be working as they should. I'll offer a couple of comments about the output, but first, to describe exactly what I did to test:

  • While the DAQ was in the config state prior to run 3403 (mu2edaqXX:/home/jcfree/run_records/3403) I successfully registered an online monitor using TransferInputShmem.fcl with only the dispatcher port edited
  • In the middle of run 3403, I successfully deregistered the monitor
  • After hitting stop and returning the DAQ to the ready state, I added allowMultipleRuns: true to TransferInputShmem.fcl, and successfully registered the monitor before starting run 3404
  • I issued a stop to run 3404, then saw a few ominous messages (more on these below), but ToyDump results started appearing again, as you'd hope, after I started run 3405
  • I stopped the run so we were back in "ready", and successfully deregistered the monitor
  • Later, during run 3407, I ran the monitor with allowMultipleRuns: false, and sure enough it exited when I stopped the run

Now, just a couple of comments about the output. After the stop of run 3404, I saw these warnings appear a few times, even though, as I mentioned, things seemed fine functionally:

%MSG-i TransferWrapper:  DAQ 07-Jan-2020 15:58:02 CST Booted TransferWrapper.cc:340
Requesting that this monitor (shmem1) be unregistered from the dispatcher aggregator
%MSG
%MSG-i TransferWrapper:  DAQ 07-Jan-2020 15:58:02 CST Booted TransferWrapper.cc:345
Response from dispatcher is "Warning in DispatcherCore::unregister_monitor: unable to find requested transfer plugin with label "shmem1"" 
%MSG
%MSG-w TransferWrapper:  DAQ 07-Jan-2020 15:58:02 CST Booted TransferWrapper.cc:355
The Dispatcher returned status Warning in DispatcherCore::unregister_monitor: unable to find requested transfer plugin with label "shmem1" when attempting to unregister this monitor!
%MSG

Also, since the default dispatcherConnectTimeout value is zero, and zero is interpreted as an infinite timeout, the message you see when the DAQ is in the ready state and you launch a monitor is a little confusing:

%MSG-i TransferWrapper:  DAQ 07-Jan-2020 15:59:16 CST Booted TransferWrapper.cc:286
Waited 4.01 s / 0.00 s for Dispatcher to enter the Running state

...while it's obviously less confusing when the timeout is nonzero, i.e., finite.
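One possible way to make the zero-timeout case less confusing is to word the message differently when no timeout applies. The sketch below is only a suggestion: formatWaitMessage is a hypothetical helper, not an existing artdaq function, and the "(no timeout)" wording is an assumption.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: build the "Waited ..." message, avoiding the
// misleading "/ 0.00 s" when the timeout is zero (i.e. wait forever).
std::string formatWaitMessage(double waited_s, double timeout_s)
{
    char buf[128];
    if (timeout_s > 0.0)
    {
        std::snprintf(buf, sizeof(buf),
                      "Waited %.2f s / %.2f s for Dispatcher to enter the Running state",
                      waited_s, timeout_s);
    }
    else
    {
        std::snprintf(buf, sizeof(buf),
                      "Waited %.2f s (no timeout) for Dispatcher to enter the Running state",
                      waited_s);
    }
    return buf;
}
```

With a nonzero timeout the output keeps the format shown in the log above; only the zero case changes.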


