Feature #24420

DAQInterface should call the TRACE stop script even if it doesn't receive the stop transition

Added by John Freeman 5 months ago. Updated 4 months ago.

Status: Reviewed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 05/14/2020
Due date:
% Done: 100%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

Currently, DAQInterface calls $DAQINTERFACE_TRACE_SCRIPT with the "stop" argument only if it is sent the stop transition. However, there are scenarios where something pathological happens during the run such that DAQInterface never receives the stop transition; in those cases it would be useful if DAQInterface attempted to call $DAQINTERFACE_TRACE_SCRIPT while it is cleaning up.

Associated revisions

Revision 7b5eaef8 (diff)
Added by John Freeman 5 months ago

JCF: Issue #24420: if we enter a recovery either when in the running state or the stopping state, source the trace script since it's only sourced otherwise at the end of a normal stop transition

Revision 695375e8 (diff)
Added by John Freeman 4 months ago

JCF: Issue #24420: make sure that even if there are no remaining artdaq processes on a node that the $DAQINTERFACE_TRACE_SCRIPT still visit that node

History

#1 Updated by John Freeman 5 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Resolved

With commit 7b5eaef8b02249b6952071ec5dd6a393a970c92f on branch feature/24420_trace_script_unclean_stop, if the trace script is available, executing it is now one of the first things that happens when we enter a recovery from the running or stopping state. In this scenario the script would not otherwise execute at its normal point (i.e., at the end of a successful stop transition).
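
For readers who want the gist of the behaviour described above, here is a minimal sketch. It is not the code from the commit: the helper name maybe_run_trace_stop, the state strings, and the choice to run the script as a subprocess with a single "stop" argument are all assumptions made for illustration.

```python
import os
import subprocess


def maybe_run_trace_stop(state, runner=subprocess.call):
    """Hypothetical helper: run the trace script with "stop" on entering recovery.

    Returns True if the script was invoked, False otherwise.
    """
    # Only the running and stopping states are covered; in other states the
    # normal stop transition either already ran or was never needed.
    if state not in ("running", "stopping"):
        return False

    # The script is optional: skip quietly if the variable is unset or the
    # file it points to does not exist.
    script = os.environ.get("DAQINTERFACE_TRACE_SCRIPT")
    if not script or not os.path.exists(script):
        return False

    # Assumption: the script takes "stop" as its only argument.
    runner([script, "stop"])
    return True
```

Passing the runner as a parameter is only a convenience for testing the guard logic without launching a real process.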

#2 Updated by Gennadiy Lukhanin 4 months ago

  • Status changed from Resolved to Rejected

Testing was performed on the Icarus cluster. I started a typical run, 1673, with 6 TPCs running on 3 hosts. After the run had started correctly, I waited a minute and killed a boardreader running on icarus-tpc07. DAQInterface went into the "recover" transition, and the run was stopped. No trace file was generated on icarus-tpc07; however, trace files were generated on icarus-evb01, icarus-tpc08, and icarus-tpc10.

-rw-rw-r-- 1 icarus E-1052      177 May 18 17:09 r1671.start
-rw-rw-r-- 1 icarus E-1052 12713023 May 18 18:48 r1671.icarus-evb01.trc
-rw-rw-r-- 1 icarus E-1052      177 May 18 18:48 r1671.stop
-rw-rw-r-- 1 icarus E-1052      177 May 18 18:54 r1672.start
-rw-rw-r-- 1 icarus E-1052 12653573 May 20 14:18 r1672.icarus-evb01.trc
-rw-rw-r-- 1 icarus E-1052      177 May 20 14:18 r1672.stop
-rw-r--r-- 1 icarus E-1052      123 May 20 16:26 r1673.start
-rw-r--r-- 1 icarus E-1052      105 May 20 16:29 r1673.stop
-rw-rw-r-- 1 icarus E-1052  5954859 May 20 16:29 r1673.icarus-evb01.trc
16:33:40icarus@icarus-evb01:/scratch_local/traces

-rw-rw-r-- 1 icarus E-1052 14492260 May 18 18:48 r1671.icarus-tpc07.trc
-rw-rw-r-- 1 icarus E-1052 14491997 May 20 14:18 r1672.icarus-tpc07.trc
16:33:40icarus@icarus-tpc07:/scratch_local/traces
-rw-rw-r-- 1 icarus E-1052 14433017 May 18 18:48 r1671.icarus-tpc10.trc
-rw-rw-r-- 1 icarus E-1052 14400143 May 20 14:18 r1672.icarus-tpc10.trc
-rw-rw-r-- 1 icarus E-1052   852348 May 20 16:29 r1673.icarus-tpc10.trc
16:33:40icarus@icarus-tpc10:/scratch_local/traces
-rw-rw-r-- 1 icarus E-1052 14487644 May 18 18:48 r1671.icarus-tpc08.trc
-rw-rw-r-- 1 icarus E-1052 14487705 May 20 14:18 r1672.icarus-tpc08.trc
-rw-rw-r-- 1 icarus E-1052  6766416 May 20 16:29 r1673.icarus-tpc08.trc
16:33:40icarus@icarus-tpc08:/scratch_local/traces
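
The per-host check made by eye in the listings above can be sketched as a small script. The function name hosts_missing_trace and the dictionary input are hypothetical; the only thing taken from the listings is the r<run>.<host>.trc file naming pattern.

```python
import re

# File names in the listings follow the pattern r<run>.<host>.trc
TRACE_NAME = re.compile(r"^r(\d+)\.(.+)\.trc$")


def hosts_missing_trace(run, listings):
    """Return the sorted hosts that have no trace file for the given run.

    listings maps a host name to the file names found in its trace directory.
    """
    missing = []
    for host, names in listings.items():
        found = False
        for name in names:
            match = TRACE_NAME.match(name)
            if match and int(match.group(1)) == run and match.group(2) == host:
                found = True
                break
        if not found:
            missing.append(host)
    return sorted(missing)


# File names copied from the listings above
listings = {
    "icarus-evb01": ["r1673.start", "r1673.stop", "r1673.icarus-evb01.trc"],
    "icarus-tpc07": ["r1671.icarus-tpc07.trc", "r1672.icarus-tpc07.trc"],
    "icarus-tpc10": ["r1672.icarus-tpc10.trc", "r1673.icarus-tpc10.trc"],
    "icarus-tpc08": ["r1672.icarus-tpc08.trc", "r1673.icarus-tpc08.trc"],
}
print(hosts_missing_trace(1673, listings))  # prints ['icarus-tpc07']
```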

#3 Updated by Gennadiy Lukhanin 4 months ago

  • Status changed from Rejected to Resolved

The last commit 695375e8 resolved this issue. I've confirmed that traces are captured on all hosts. Testing was done on the Icarus cluster.
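
As an illustration of the kind of change the commit message for 695375e8 describes, the sketch below contrasts two ways of choosing which nodes the trace script visits. It is not taken from the commit; the function names and the process records (dictionaries with "label", "host" and "alive" keys) are assumptions.

```python
def hosts_with_live_processes(procs):
    """Hosts that still have at least one live artdaq process."""
    return sorted({p["host"] for p in procs if p["alive"]})


def hosts_for_trace_script(procs):
    """Every configured host, whether or not a process survives there."""
    return sorted({p["host"] for p in procs})


# Hypothetical process table in which the boardreader on icarus-tpc07 was killed
procs = [
    {"label": "evb", "host": "icarus-evb01", "alive": True},
    {"label": "tpc07", "host": "icarus-tpc07", "alive": False},
    {"label": "tpc08", "host": "icarus-tpc08", "alive": True},
]

print(hosts_with_live_processes(procs))  # prints ['icarus-evb01', 'icarus-tpc08']
print(hosts_for_trace_script(procs))     # prints ['icarus-evb01', 'icarus-tpc07', 'icarus-tpc08']
```

Selecting hosts from the configured process list rather than from the surviving processes is what keeps a node like icarus-tpc07 in the set.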

#4 Updated by Gennadiy Lukhanin 4 months ago

  • Status changed from Resolved to Reviewed

