Bug #23403

In the event of ssh problems, DAQInterface should avoid hangs

Added by John Freeman about 1 month ago. Updated 28 days ago.

Status: Reviewed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 10/09/2019
Due date:
% Done: 100%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

During the boot transition, DAQInterface uses ssh for a number of purposes: to determine the names of the artdaq process logfiles, to test the sourcing of the DAQ setup script on a different node, etc. If ssh requires a password under the user's account, DAQInterface will appear to hang, since it cannot interactively enter the user's password. This hang should be prevented via a timeout, followed by an informative error message explaining why DAQInterface was unable to proceed with the boot transition.

History

#1 Updated by John Freeman about 1 month ago

  • % Done changed from 0 to 100
  • Status changed from New to Resolved

Resolved with commit 5e42e28fcdfe48abf64be2460edcb2968e30ae6f at the head of bugfix/23403_avoid_ssh_hangs. Now, instead of potentially hanging, the ssh calls in the boot transition are given a 30-second timeout. If they return 124, indicating that the timeout was hit, you'll see a message like one of the following:

Nonzero value (124) returned in attempt to source script
/home/jcfree/artdaq-demo_v3_06_00/setupARTDAQDEMO on host "mu2edaq05";
returned value suggests that the ssh call to mu2edaq05 timed out. Perhaps
a lack of public/private ssh keys resulted in ssh asking for a password?

(if the failure occurred when trying to source the DAQ setup script on a different node)
or
Returned value of 124 suggests that the ssh call to mu2edaq05 timed out.
Perhaps a lack of public/private ssh keys resulted in ssh asking for a
password?

(if the failure occurred when trying to mkdir -p the logfile directories on a different node)

Either way, DAQInterface will deposit you back in the Stopped state rather than hanging should this occur.
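The approach described above can be sketched in Python roughly as follows. This is a minimal illustration, not DAQInterface's actual code: it assumes the ssh call is wrapped in GNU coreutils `timeout`, which exits with status 124 when the time limit is hit, and the helper names `build_ssh_command`, `timeout_hint` and `run_remote` are hypothetical. Raising a `RuntimeError` stands in for whatever mechanism DAQInterface uses to report the error and return to the Stopped state.

```python
import subprocess

# GNU coreutils `timeout` exits with this status when it kills the wrapped
# command for exceeding its time limit.
TIMEOUT_EXIT_STATUS = 124

# Matches the 30-second limit described in this issue.
SSH_TIMEOUT_SECONDS = 30


def build_ssh_command(host, remote_cmd, timeout_seconds=SSH_TIMEOUT_SECONDS):
    """Hypothetical helper: argv that runs remote_cmd on host, killed after timeout_seconds."""
    return ["timeout", str(timeout_seconds), "ssh", host, remote_cmd]


def timeout_hint(host, returncode):
    """Hypothetical helper: explanatory message if returncode signals a timeout, else None."""
    if returncode != TIMEOUT_EXIT_STATUS:
        return None
    return (
        f"Returned value of {returncode} suggests that the ssh call to {host} timed out. "
        "Perhaps a lack of public/private ssh keys resulted in ssh asking for a password?"
    )


def run_remote(host, remote_cmd):
    """Hypothetical helper: run remote_cmd on host, raising instead of hanging on failure."""
    argv = build_ssh_command(host, remote_cmd)
    # `timeout` bounds the ssh call, so a password prompt cannot block forever.
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        hint = timeout_hint(host, result.returncode)
        raise RuntimeError(
            hint or f"Nonzero value ({result.returncode}) returned by: {' '.join(argv)}"
        )
    return result.stdout
```

In the boot transition, a caller would invoke something like `run_remote("mu2edaq05", "mkdir -p <logfile directory>")`, catch the `RuntimeError`, print its message, and fall back to the Stopped state. Keeping the timeout check in a separate function lets the same hint be attached to both the setup-script and the mkdir failure paths.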

#2 Updated by Eric Flumerfelt 28 days ago

  • Status changed from Resolved to Reviewed
  • Co-Assignees Eric Flumerfelt added

Code reviewed and tested with and without the change.


