There are reports of DAQInterface taking 7 minutes to complete the Start transition at protoDUNE.
What we hear is that if users end a run in fewer than 7 minutes after starting it, Run Control tries to send the Stop command to DAQInterface, but DAQInterface is not ready to receive that Stop command because it is still working on the earlier Start command.
Of course, we should confirm this behavior and, if confirmed, look into what is taking 7 minutes.
We also need to consider making DAQInterface transitions blocking. That is, a transition call wouldn't return until the transition has completed.
#1 Updated by John Freeman almost 2 years ago
- Status changed from Assigned to Resolved
- % Done changed from 0 to 100
It's been determined that archiving large numbers of FHiCL documents takes several minutes. This wasn't obvious until a day or two ago because the schema.fcl used for archiving didn't support RoutingMaster_0.fcl, so the archive attempt returned almost immediately with an error. E.g., with a new schema.fcl file, the following took 7 minutes 56 seconds on np04-srv-010 as np04daq:
cd ~/.jcfree/Documents/1001506_to_archive  # Contains the FHiCL documents produced for archiving in run 1001506
conftool.py archiveRunConfiguration np04_WibsReal_Ssps00045 1001506
...where the command returned without an error.
For the time being, I've effectively disabled archiving by pointing DAQInterface to a schema file that's designed to make an archive attempt fail quickly. This is done by setting ARTDAQ_DATABASE_CONFDIR to "/nfs/sw/control_files/database/disable_archiving" in the /nfs/sw/artdaq/DAQInterface/source_me* files.
Concerning the blocking request: during this long archive period DAQInterface will return "starting" when its state is queried; it returns "running" only after the archiving is complete. However, Run Control at protoDUNE doesn't currently bother checking the state of DAQInterface, which should probably change.
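As a rough illustration of the state check described above, Run Control could poll DAQInterface until it reports "running" before it allows a Stop to be sent. The sketch below is only that: wait_for_state and its parameters are hypothetical names, and query_state stands in for whatever call Run Control would actually use to ask DAQInterface for its state (not specified here). Only the "starting" and "running" state strings come from the behavior described above.

```python
import time


def wait_for_state(query_state, target="running", timeout_s=600.0, poll_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll query_state() until it returns `target`.

    query_state is a hypothetical zero-argument callable that returns the
    current DAQInterface state as a string (e.g. "starting" or "running").
    Returns True once the target state is seen, False if timeout_s elapses.
    """
    deadline = clock() + timeout_s
    while True:
        if query_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Stand-in for a real state query: reports "starting" twice, then "running".
    fake_states = iter(["starting", "starting", "running"])
    ready = wait_for_state(lambda: next(fake_states), poll_s=0.01)
    print("OK to send Stop" if ready else "Still starting; do not send Stop")
```

With a check like this in Run Control, a Stop issued during a long Start would be held back or rejected on the Run Control side rather than sent to a DAQInterface that is still in "starting". The timeout would need to be longer than the slowest expected Start transition.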