Feature #22882

online monitor connection robustness

Added by Ron Rechenmacher over 1 year ago. Updated 10 months ago.

Currently, online monitoring retries its connection 3 times with 0.1 s in between.
I would like online monitoring to be able to be started first (before the DAQ) and/or
to persist across multiple runs.


#1 Updated by Eric Flumerfelt over 1 year ago

  • Assignee set to Eric Flumerfelt
  • Status changed from New to Work in progress

I've started to make some changes to TransferWrapper on artdaq:feature/22882_TransferWrapper_ConnectionRobustness. I would like to get this code to a point where online monitors can be started at any time with respect to the DAQ and be able to connect and disconnect at will.

#2 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Work in progress to Resolved

With the addition of an "allowMultipleRuns" parameter and a "dispatcherConnectTimeout" parameter, I have improved TransferWrapper so that it can wait up to a user-specified amount of time for the Dispatcher to enter the "Running" state, which it determines by querying the Dispatcher's status. It polls the status every "dispatcherConnectRetryInterval_us" (default 1 second; allowed range 1000 us to 30 seconds). If allowMultipleRuns is false, the online monitor terminates when it receives an EndOfData Fragment (the current behavior); otherwise, it re-enters the Dispatcher connection routine.
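For anyone following along, the behavior described above can be sketched roughly as follows. This is an illustrative Python model, not the actual C++ TransferWrapper code: the function names, the injected `get_status`/`connect` callables, and the choice to clamp an out-of-range retry interval (rather than reject it) are all assumptions made for the sketch. The treatment of a zero timeout as "wait indefinitely" follows the default behavior noted later in this ticket.

```python
import time
from typing import Callable


def wait_for_running(
    get_status: Callable[[], str],
    timeout_s: float = 0.0,
    retry_interval_us: int = 1_000_000,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it reports "Running" or the timeout expires."""
    # Assumption: out-of-range intervals are clamped to 1000 us .. 30 s.
    retry_interval_us = min(max(retry_interval_us, 1_000), 30_000_000)
    start = clock()
    while True:
        if get_status() == "Running":
            return True
        # A timeout of zero is treated as "wait indefinitely".
        if timeout_s > 0 and clock() - start >= timeout_s:
            return False
        sleep(retry_interval_us / 1e6)


def run_monitor(
    connect: Callable[[], bool],
    receive_until_end_of_data: Callable[[], None],
    allow_multiple_runs: bool,
) -> int:
    """Return the number of runs monitored before the monitor exits."""
    runs = 0
    while connect():
        # Hypothetical helper: returns once an EndOfData Fragment arrives.
        receive_until_end_of_data()
        runs += 1
        if not allow_multiple_runs:
            break  # previous behavior: exit after the first run ends
    return runs
```

Here `connect` would typically wrap `wait_for_running` plus the monitor registration step, so that with allowMultipleRuns set the monitor goes back to waiting for the next "Running" state after each run ends.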

#3 Updated by John Freeman 11 months ago

  • Status changed from Resolved to Reviewed

Things look to be working as they should. I'll offer a couple of comments about the output, but first, to describe exactly what I did to test:

  • While the DAQ was in the config state prior to run 3403 (mu2edaqXX:/home/jcfree/run_records/3403) I successfully registered an online monitor using TransferInputShmem.fcl with only the dispatcher port edited
  • In the middle of run 3403, I successfully deregistered the monitor
  • After hitting stop and returning the DAQ to the ready state, I added allowMultipleRuns: true to TransferInputShmem.fcl, and successfully registered the monitor before starting run 3404
  • I issue a stop to run 3404, then see a few ominous messages (more on these below) but then ToyDump results start appearing again as you'd hope after I start run 3405
  • I stop the run so we're back in "ready", and successfully deregister the monitor
  • Later, during run 3407, I run the monitor with allowMultipleRuns: false, and sure enough it exits out when I stop the run

Now, just a couple of comments about the output. After the stop of run 3404, I saw these warnings appear a few times, even though, as I mentioned, things seemed fine functionally:

%MSG-i TransferWrapper:  DAQ 07-Jan-2020 15:58:02 CST Booted
Requesting that this monitor (shmem1) be unregistered from the dispatcher aggregator
%MSG-i TransferWrapper:  DAQ 07-Jan-2020 15:58:02 CST Booted
Response from dispatcher is "Warning in DispatcherCore::unregister_monitor: unable to find requested transfer plugin with label "shmem1"" 
%MSG-w TransferWrapper:  DAQ 07-Jan-2020 15:58:02 CST Booted
The Dispatcher returned status Warning in DispatcherCore::unregister_monitor: unable to find requested transfer plugin with label "shmem1" when attempting to unregister this monitor!

Also, since the default dispatcherConnectTimeout value is zero, and zero is interpreted as "infinite timeout", the message you see when the DAQ's in the ready state and you launch a monitor is a little confusing:

%MSG-i TransferWrapper:  DAQ 07-Jan-2020 15:59:16 CST Booted
Waited 4.01 s / 0.00 s for Dispatcher to enter the Running state

...while it's obviously less confusing when the timeout's nonzero, i.e., not infinity.
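One possible way to make the zero-timeout case read less oddly is to special-case it when building the message. The sketch below is only a suggestion in Python, not the actual artdaq code; `format_wait_message` and the "no timeout" wording are hypothetical.

```python
def format_wait_message(waited_s: float, timeout_s: float) -> str:
    """Build the wait-status message, spelling out the infinite-timeout case."""
    # A timeout of zero means "wait indefinitely", so avoid printing "0.00 s".
    limit = "no timeout" if timeout_s == 0 else f"{timeout_s:.2f} s"
    return (
        f"Waited {waited_s:.2f} s / {limit} "
        "for Dispatcher to enter the Running state"
    )
```

With a zero timeout this gives "Waited 4.01 s / no timeout for Dispatcher to enter the Running state", while nonzero timeouts keep the existing "waited / limit" form.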

#4 Updated by Eric Flumerfelt 10 months ago

  • Target version set to artdaq v3_07_02
  • Status changed from Reviewed to Closed
