Support #23764

artdaq boot sequence takes significant time for ICARUS daq

Added by Bruce Howard 11 months ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 12/13/2019
Due date:
% Done: 0%
Estimated time:
Experiment: ICARUS
Co-Assignees:
Duration:
Description

We've noticed that our artdaq boot sequence takes a significant amount of time even with only a few components included, which makes us worried about how long the boot will take once we move to the full detector.


Related issues

Related to artdaq Utilities - Idea #23029: DAQInterface could launch processes in parallel across nodes (New, 07/31/2019)

Associated revisions

Revision 82a6dd19 (diff)
Added by John Freeman 10 months ago

JCF: Issue #23764: make localhost like other nodes in that process launch is parallelized

History

#1 Updated by William Badgett 10 months ago

The implied request here is: can you boot components in parallel? They appear to be launched in sequence in the current DAQInterface.

#2 Updated by William Badgett 10 months ago

NB: I would put this at high priority

#3 Updated by John Freeman 10 months ago

  • Related to Idea #23029: DAQInterface could launch processes in parallel across nodes added

#4 Updated by John Freeman 10 months ago

At an artdaq meeting yesterday, we discussed this request. There are two ways to speed up the boot process:

1) The time it takes DAQInterface to perform a boot transition is closely tied to the time taken to source the DAQ setup script (i.e., the script referred to by the "DAQ setup script" field in the file passed to DAQInterface on the boot transition). The experiment should investigate whether it's possible to reduce this time without significantly compromising the environment set up by the DAQ setup script.

2) For every node on which an experiment runs processes, DAQInterface needs to source the DAQ setup script before launching the processes. Currently, this is done sequentially: on node 1, it sources the script and launches the node 1 processes, then it moves on to node 2, etc. Back in the summer, the possibility of making this parallel rather than sequential was discussed and investigated (see Issue #23029). Not much speedup was found despite taking a multithreaded approach, but we'll try looking into this again.
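To make the difference in (2) concrete, here is a minimal Python sketch comparing the two launch strategies. It is not DAQInterface code: the per-node "source the setup script, then launch" step is simulated by a local child process that just sleeps, and the names node_command, launch_sequential and launch_parallel are hypothetical. It also assumes the per-node work is fully independent, which may not hold in practice (as noted above, the earlier multithreaded attempt did not show much speedup).

```python
import subprocess
import sys
import time


def node_command(seconds):
    """Stand-in for 'source the DAQ setup script and launch processes' on one node.

    A real launch would run a remote command; here a local child process
    that sleeps simulates the per-node cost.
    """
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


def launch_sequential(costs):
    """Handle one node at a time; return the elapsed wall-clock seconds."""
    start = time.monotonic()
    for seconds in costs:
        # subprocess.run blocks until the child exits.
        subprocess.run(node_command(seconds), check=True)
    return time.monotonic() - start


def launch_parallel(costs):
    """Start every node's launch first, then wait for all of them."""
    start = time.monotonic()
    # Popen returns as soon as the child has been started.
    procs = [subprocess.Popen(node_command(s)) for s in costs]
    for proc in procs:
        proc.wait()
    return time.monotonic() - start


if __name__ == "__main__":
    costs = [0.5, 0.5, 0.5]  # simulated per-node setup time, in seconds
    print("sequential:", round(launch_sequential(costs), 2), "s")
    print("parallel:  ", round(launch_parallel(costs), 2), "s")
```

In this idealized model the sequential time grows with the sum of the per-node costs, while the parallel time is bounded by the slowest node plus process start-up overhead.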

#5 Updated by John Freeman 10 months ago

Concerning (2) above: note that if DAQInterface launches processes in parallel, the experiment will need to make sure that collisions won't occur when the DAQ setup script is sourced simultaneously across nodes (access to the same resource, etc.).

#6 Updated by William Badgett 10 months ago

I do understand your concerns, but on the hardware side contention usually becomes an issue on the configure transition. Of course, system resources may be an issue; we'll just have to try it out.

#7 Updated by John Freeman 10 months ago

I'll need ICARUS to perform a test. Allow me to explain:

Based on tests I performed this afternoon on the mu2edaq cluster, and with one caveat which I'll describe in a moment, it appears that DAQInterface does in fact launch the processes in parallel across nodes. You can confirm this for yourself if you take your DAQ setup script and add a line like the following at the top:

echo "For Issue #23764, starting source of DAQ setup script on" $( hostname ) "at" $( date )

Then, take DAQInterface through the boot transition, and execute something like the following:

for node in <first node> <second node> <etc> ; do ssh $node "grep \"starting source of DAQ setup script\" /tmp/launch_attempt_${USER}_partition${DAQINTERFACE_PARTITION_NUMBER}" ; done

...where you replace the bracketed "first node", etc. with the names of the nodes on which you're running boardreaders. What you'll see is that most of the setup scripts were sourced within seconds of each other. I say "most" because the process launch on each node is performed by a Python Popen call, which returns immediately except in the case of boardreaders run on the localhost. If you want to see all DAQ setup scripts sourced within seconds of each other, you can run DAQInterface off a feature branch I just made called feature/23764_analyze_boot_sequence. You may also want to add a sleep command to the DAQ setup script and/or a printout at the bottom of the DAQ setup script announcing the end of the source. Please let me know what you find.
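If it helps with the comparison, the collected lines could be summarized with a short Python sketch like the one below. It rests on an assumption that is not part of the instructions above: that the echo line prints epoch seconds (date +%s) rather than the default date output, so the timestamps can be compared numerically. The function names and the hostnames in the usage example are hypothetical.

```python
import re

# Matches the marker line from the DAQ setup script, ASSUMING the timestamp
# was printed as epoch seconds (date +%s) instead of the default date format.
LINE_RE = re.compile(r"starting source of DAQ setup script on (\S+) at (\d+)")


def source_start_times(lines):
    """Map each hostname to the earliest epoch second at which sourcing began."""
    times = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            host, stamp = match.group(1), int(match.group(2))
            times[host] = min(stamp, times.get(host, stamp))
    return times


def spread_seconds(times):
    """Seconds between the first and the last node starting to source."""
    if not times:
        return 0
    return max(times.values()) - min(times.values())


if __name__ == "__main__":
    # Hypothetical lines, as the grep loop might return them.
    collected = [
        "For Issue #23764, starting source of DAQ setup script on node01 at 1576270000",
        "For Issue #23764, starting source of DAQ setup script on node02 at 1576270003",
    ]
    times = source_start_times(collected)
    print(times, spread_seconds(times))
```

A spread of a few seconds would be consistent with a parallel launch, whereas a spread close to the sum of the per-node sourcing times would suggest the launches were effectively sequential.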


