online monitor connection robustness
Currently, online monitoring retries 3 times with 0.1 s in between.
I would like online monitoring to be able to be started first and/or
to persist across multiple runs.
#1 Updated by Eric Flumerfelt 7 months ago
- Assignee set to Eric Flumerfelt
- Status changed from New to Work in progress
I've started to make some changes to TransferWrapper on artdaq:feature/22882_TransferWrapper_ConnectionRobustness. I would like to get this code to a point where online monitors can be started at any time with respect to the DAQ and be able to connect and disconnect at will.
#2 Updated by Eric Flumerfelt 6 months ago
- Status changed from Work in progress to Resolved
With the addition of an "allowMultipleRuns" parameter and a "dispatcherConnectTimeout" parameter, I have improved TransferWrapper so that it can wait up to a user-specified amount of time for the Dispatcher to enter the "Running" state (which it determines by querying the Dispatcher's status). It polls the Dispatcher's status every "dispatcherConnectRetryInterval_us" microseconds (defaults to 1 second; must be between 1000 us and 30 seconds). If allowMultipleRuns is false, the online monitor terminates when it receives an EndOfData Fragment (the current behavior); otherwise, it re-enters the Dispatcher connection routine.
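The connection routine described above can be sketched roughly as follows. This is a minimal illustration, not the actual TransferWrapper code: the `ConnectConfig` struct, the `queryStatus` and `receiveUntilEndOfData` callbacks, and the function names are all hypothetical. Two behaviors are assumed here: an out-of-range retry interval is clamped (rather than rejected), and a timeout of zero means "wait forever", as noted in the review comment further down this thread.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical holder for the parameters named in this ticket.
struct ConnectConfig {
    double dispatcherConnectTimeout_s = 0.0;           // 0 is treated as "no timeout"
    long dispatcherConnectRetryInterval_us = 1000000;  // 1 second default
};

// Assumption: values outside 1000 us .. 30 s are clamped into that range.
long clampRetryInterval(long us) { return std::clamp(us, 1000L, 30000000L); }

// Poll the Dispatcher's status until it reports "Running" or the timeout expires.
// queryStatus is a stand-in for whatever call returns the Dispatcher's state.
bool waitForRunning(const ConnectConfig& cfg,
                    const std::function<std::string()>& queryStatus) {
    using clock = std::chrono::steady_clock;
    const auto interval = std::chrono::microseconds(
        clampRetryInterval(cfg.dispatcherConnectRetryInterval_us));
    const auto start = clock::now();
    while (true) {
        if (queryStatus() == "Running") return true;
        const std::chrono::duration<double> waited = clock::now() - start;
        if (cfg.dispatcherConnectTimeout_s > 0.0 &&
            waited.count() >= cfg.dispatcherConnectTimeout_s) {
            return false;  // gave up waiting
        }
        std::this_thread::sleep_for(interval);
    }
}

// Outer loop: with allowMultipleRuns the monitor re-enters the connection
// routine after each EndOfData; otherwise it stops after the first run.
// Returns the number of runs that were monitored.
int monitorRuns(const ConnectConfig& cfg, bool allowMultipleRuns,
                const std::function<std::string()>& queryStatus,
                const std::function<void()>& receiveUntilEndOfData) {
    int runs = 0;
    while (waitForRunning(cfg, queryStatus)) {
        receiveUntilEndOfData();
        ++runs;
        if (!allowMultipleRuns) break;
    }
    return runs;
}
```

With a zero timeout and allowMultipleRuns set to true, this sketch never returns on its own, which matches the goal of a monitor that persists across runs until it is stopped externally.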
#3 Updated by John Freeman 20 days ago
- Status changed from Resolved to Reviewed
Things look to be working as they should. I'll offer a couple of comments about the output, but first, to describe exactly what I did to test:
- While the DAQ was in the config state prior to run 3403 (mu2edaqXX:/home/jcfree/run_records/3403) I successfully registered an online monitor using TransferInputShmem.fcl with only the dispatcher port edited
- In the middle of run 3403, I successfully deregistered the monitor
- After hitting stop and returning the DAQ to the ready state, I added "allowMultipleRuns: true" to TransferInputShmem.fcl, and successfully registered the monitor before starting run 3404
- I issued a stop to run 3404 and then saw a few ominous messages (more on these below), but ToyDump results started appearing again, as you'd hope, after I started run 3405
- I stopped the run so we were back in "ready", and successfully deregistered the monitor
- Later, during run 3407, I ran the monitor with "allowMultipleRuns: false", and sure enough it exited when I stopped the run
Now, just a couple of comments about the output. After the stop of run 3404, I saw these warnings appear a few times, even though, as I mentioned, things seemed functionally fine:
%MSG-i TransferWrapper: DAQ 07-Jan-2020 15:58:02 CST Booted TransferWrapper.cc:340 Requesting that this monitor (shmem1) be unregistered from the dispatcher aggregator %MSG
%MSG-i TransferWrapper: DAQ 07-Jan-2020 15:58:02 CST Booted TransferWrapper.cc:345 Response from dispatcher is "Warning in DispatcherCore::unregister_monitor: unable to find requested transfer plugin with label "shmem1"" %MSG
%MSG-w TransferWrapper: DAQ 07-Jan-2020 15:58:02 CST Booted TransferWrapper.cc:355 The Dispatcher returned status Warning in DispatcherCore::unregister_monitor: unable to find requested transfer plugin with label "shmem1" when attempting to unregister this monitor! %MSG
Also, since the default dispatcherConnectTimeout value is zero, and zero is interpreted as an infinite timeout, the message you see when the DAQ is in the ready state and you launch a monitor is a little confusing:
%MSG-i TransferWrapper: DAQ 07-Jan-2020 15:59:16 CST Booted TransferWrapper.cc:286 Waited 4.01 s / 0.00 s for Dispatcher to enter the Running state
...while it's obviously less confusing when the timeout is nonzero, i.e., not infinite.
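One possible way to make the message clearer is to print the timeout only when it is nonzero. The sketch below is purely illustrative: `formatWaitMessage` is a hypothetical helper, not a function in TransferWrapper, and the "(no timeout)" wording is just a suggestion.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the "Waited ..." text, treating a timeout of
// zero as "no timeout" instead of printing "0.00 s".
std::string formatWaitMessage(double waited_s, double timeout_s) {
    char buf[128];
    if (timeout_s > 0.0) {
        std::snprintf(buf, sizeof(buf),
                      "Waited %.2f s / %.2f s for Dispatcher to enter the Running state",
                      waited_s, timeout_s);
    } else {
        std::snprintf(buf, sizeof(buf),
                      "Waited %.2f s (no timeout) for Dispatcher to enter the Running state",
                      waited_s);
    }
    return std::string(buf);
}
```

For the case shown above, `formatWaitMessage(4.01, 0.0)` would yield "Waited 4.01 s (no timeout) for Dispatcher to enter the Running state".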