Idea #13389

Support #15000: Audit all messages produced by artdaq during normal running

Make sure error messages are as useful as possible for end users

Added by John Freeman about 3 years ago. Updated over 2 years ago.

Status: Assigned
Priority: Normal
Assignee:
Category: Known Issues
Target version: -
Start date: 08/01/2016
Due date:
% Done: 60%
Estimated time: 8.00 h
Experiment: -
Duration:

Description

This issue is motivated by an email Bill Badgett sent last week asking about the consequences of repeatedly seeing the diskwriting aggregator print "Failed to enqueue SubRun event." messages. It might be helpful to review artdaq's error and warning messages to make sure they tell end users everything they need to know, and to add any missing information where they don't.


Related issues

Related to artdaq - Idea #18210: Non-pathological low request rate results in error message (New, 2017-11-12)

Related to artdaq - Feature #19475: Messages from CommandableFragmentGenerator should be associated with the fragment generator in question (Closed, 2018-03-23)

Associated revisions

Revision 8be25dfe
Added by John Freeman about 3 years ago

JCF: error messages in the process fragments loops of eventbuilders and aggregators have been rewritten; see Redmine Issue #13389 for more

Revision 2faf58f9
Added by John Freeman almost 3 years ago

JCF: if an invalid transition is requested, explain to the user that the transition's not allowed from the DAQ's current state; this can be filed under ongoing Redmine Issue #13389

History

#1 Updated by John Freeman about 3 years ago

  • Status changed from New to Assigned
  • Assignee set to John Freeman
  • % Done changed from 0 to 50

With artdaq commit 8be25dfea229ac009d872c54a92dd43ed4a175fa, the messages printed by the process fragments loops in the eventbuilders and aggregators have been updated. The main thrust of this commit is that if a problem occurs which removes the guarantee that the DAQ will continue functioning properly until it's shut down and reinitialized, users are informed of this. A diff against the previous commit tells the whole story, but some of the changes include:

  • When we want to break out of the loop but an end-of-subrun fragment either hasn't been received or failed to get enqueued for art, tell users they may need to shut down the DAQ. E.g., instead of the error
    Failed to enqueue SubRun event.
    

    we have
    Attempt to send EndOfSubRun fragment to art timed out after 5 seconds; DAQ may need to be returned to the "Stopped" state before further datataking
    
  • If art isn't initialized and the received fragment isn't the Init fragment, print an error:
    Didn't receive an Init event with which to initialize art; DAQ may need to be returned to the "Stopped" state before further datataking
    
  • If we've received all the EndOfData fragments and the EndOfSubRun fragment but there are still outstanding fragments, print a warning:
    EndOfSubRun fragment and all EndOfData fragments received but more data expected
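The three checks above can be sketched as small message-building functions. This is a minimal illustration, not artdaq's actual code: the function names, parameters and the shared `kStoppedStateHint` constant are hypothetical, and only the message text is taken from the list above. The 5-second timeout is passed in rather than hard-coded.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical shared suffix for errors after which the DAQ is no longer
// guaranteed to keep functioning properly until it is shut down and reinitialized.
const std::string kStoppedStateHint =
    "DAQ may need to be returned to the \"Stopped\" state before further datataking";

// Hypothetical: error for when the EndOfSubRun fragment could not be handed to art in time.
std::string endOfSubRunTimeoutError(int timeout_seconds) {
  return "Attempt to send EndOfSubRun fragment to art timed out after " +
         std::to_string(timeout_seconds) + " seconds; " + kStoppedStateHint;
}

// Hypothetical per-fragment check: art must be initialized by the Init fragment first.
std::optional<std::string> checkInit(bool art_initialized, bool fragment_is_init) {
  if (!art_initialized && !fragment_is_init) {
    return "Didn't receive an Init event with which to initialize art; " + kStoppedStateHint;
  }
  return std::nullopt;
}

// Hypothetical end-of-loop check: all end markers arrived, yet fragments are still outstanding.
std::optional<std::string> checkOutstanding(bool end_of_subrun_received,
                                            std::size_t end_of_data_received,
                                            std::size_t end_of_data_expected,
                                            std::size_t outstanding_fragments) {
  if (end_of_subrun_received && end_of_data_received == end_of_data_expected &&
      outstanding_fragments > 0) {
    return "EndOfSubRun fragment and all EndOfData fragments received but more data expected";
  }
  return std::nullopt;
}
```

Note that only the first two messages carry the "Stopped" hint; the outstanding-fragments case is a warning, so it does not claim the DAQ needs to be reset.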
    

#2 Updated by John Freeman almost 3 years ago

  • % Done changed from 50 to 60

With artdaq commit 369fe780496352273cdb9894345bd9c0bc565903, the message printed when EventStore can't enqueue an event has been changed from:

Enqueueing event X FAILED, queue size = Y

to:
Enqueueing event X FAILED, queue size = Y; apparently no events were removed from this process's queue during the Z-second timeout period
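The added clause follows from how a timed enqueue on a bounded queue works. Below is a minimal sketch, assuming a single producer and a queue guarded by a mutex and a condition variable. `BoundedEventQueue` and its methods are hypothetical and are not artdaq's EventStore API. The predicate-based `wait_for` only returns false if the queue stayed full for the whole timeout, which (with one producer) means no consumer removed an event in that window.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical bounded event queue; illustrates the message, not EventStore itself.
class BoundedEventQueue {
 public:
  explicit BoundedEventQueue(std::size_t capacity) : capacity_(capacity) {}

  // Waits up to `timeout` for free space. Returns std::nullopt on success,
  // or the explanatory error message if the queue stayed full the whole time.
  std::optional<std::string> enqueue(int event_id, std::chrono::seconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    bool has_space = not_full_.wait_for(lock, timeout,
                                        [this] { return events_.size() < capacity_; });
    if (!has_space) {
      return "Enqueueing event " + std::to_string(event_id) +
             " FAILED, queue size = " + std::to_string(events_.size()) +
             "; apparently no events were removed from this process's queue during the " +
             std::to_string(timeout.count()) + "-second timeout period";
    }
    events_.push_back(event_id);
    return std::nullopt;
  }

  // Removes the oldest event, if any, and wakes a producer waiting for space.
  bool dequeue(int& event_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (events_.empty()) return false;
    event_id = events_.front();
    events_.pop_front();
    not_full_.notify_one();
    return true;
  }

 private:
  std::size_t capacity_;
  std::deque<int> events_;
  std::mutex mutex_;
  std::condition_variable not_full_;
};
```

For example, with a capacity of 1 and a 0-second timeout, enqueueing a second event before any dequeue returns the message with X = 2, Y = 1 and Z = 0. The word "apparently" remains appropriate in the real message: if several producers share a queue, an event could be removed and its slot immediately refilled by another producer.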

#3 Updated by Eric Flumerfelt almost 3 years ago

  • Target version set to 575

#4 Updated by Eric Flumerfelt over 2 years ago

  • Category set to Known Issues
  • Target version deleted (575)
  • Parent task set to #15000

#5 Updated by John Freeman almost 2 years ago

  • Related to Idea #18210: Non-pathological low request rate results in error message added

#6 Updated by John Freeman over 1 year ago

  • Related to Feature #19475: Messages from CommandableFragmentGenerator should be associated with the fragment generator in question added
