Idea #18210

Non-pathological low request rate results in error message

Added by John Freeman over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 11/12/2017
Due date:
% Done: 0%
Estimated time:
Experiment: -
Duration:

Description

In artdaq v2_03_03, when run in window mode, a fragment generator will print an error message like the following:

Data-taking has paused for 2499982 us (> 1000000 us) while waiting for missing data request messages. Sending Empty Fragments for missing requests!

This message is printed whenever CommandableFragmentGenerator::applyRequests is called while there's at least one request in the queue and more than missing_request_window_timeout_us microseconds have passed since the fragment generator last sent a non-empty fragment downstream. In the example above, the default value of missing_request_window_timeout_us, 1000000, is used. Along with printing the error message, the generator calls CommandableFragmentGenerator::sendEmptyFragments, which discards all requests except the one with the highest sequence ID and sends an empty fragment downstream for each discarded request.

Based on the logic described above, if we use the default missing_request_window_timeout_us and send requests at a rate of less than 1 Hz, then even if things are otherwise running smoothly and requests are handled almost instantaneously as they arrive, each new request that comes in more than 1000000 us after the last non-empty fragment was sent will cause this error message to be printed. It will also trigger a call to sendEmptyFragments, which is a no-op because the queue holds only that one request and there is nothing to discard.

A couple of questions to consider:
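To make the scenario concrete, here is a minimal sketch of the check as described in this ticket. It is a simplified model, not the artdaq implementation: the TimeoutCheck struct, its members, and the empty_fragments_sent helper are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model of the check described in this ticket.
// This is NOT the actual CommandableFragmentGenerator code.
struct TimeoutCheck {
    // Default value of missing_request_window_timeout_us, per the ticket.
    std::uint64_t timeout_us = 1000000;
    // Time (in us) at which the last non-empty fragment was sent downstream.
    std::uint64_t last_nonempty_sent_us = 0;

    // True when the error message would be printed: at least one request is
    // queued and the time since the last non-empty fragment exceeds the timeout.
    bool would_report(std::uint64_t now_us, std::size_t queued_requests) const {
        return queued_requests > 0 &&
               (now_us - last_nonempty_sent_us) > timeout_us;
    }
};

// Number of empty fragments sent when every request except the one with the
// highest sequence ID is discarded.
std::size_t empty_fragments_sent(std::size_t queued_requests) {
    return queued_requests > 0 ? queued_requests - 1 : 0;
}
```

With the default timeout, a single request arriving 2499982 us after the last non-empty fragment makes would_report return true, while empty_fragments_sent(1) is 0, which is the no-op case described above.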

  1. Should we reword the message so that it doesn't imply that there are necessarily missing request messages? It may just be the case that someone's sending requests at a lower frequency than usual.
  2. Do we want this to be an error or merely a warning?

One possible rewording of this message is the following (as in the message above, the example assumes the default of missing_request_window_timeout_us == 1000000 and a period of 2.5 seconds between requests):

Warning: it's been 2499982 us since a (nonempty) fragment was sent downstream, greater than the timeout value of missing_request_window_timeout_us = 1000000. Of the N requests in the queue, all but the most recent one will be discarded, and an empty placeholder fragment will be sent downstream for each discarded request.

Related issues

Related to artdaq - Idea #13389: Make sure error messages are as useful as possible for end users (Assigned, 2016-08-01)

Related to artdaq - Idea #18266: Come up with concept of failure to fulfill request (New, 2017-11-14)

History

#1 Updated by John Freeman over 1 year ago

  • Related to Idea #13389: Make sure error messages are as useful as possible for end users added

#2 Updated by Kurt Biery over 1 year ago

13-Nov-2017, KAB: the investigation into making the error message more useful sounds good to me, but I believe that it is essential to fix the bug that is causing this message to appear in the first place.

Based on what John has described to me, I believe that the comparison that should be used when checking if it has been "too long" since a request has been fulfilled is between "now" and the time of the fragment request from the event builder, not between "now" and the time when the most recent fragment was sent.
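The difference between the two comparisons can be sketched as follows. This is only an illustration of the suggestion above, under the assumption that each request carries a timestamp the generator can compare against; the function and parameter names are hypothetical and are not part of artdaq.

```cpp
#include <cstdint>

// Hypothetical helper: the comparison described in this ticket, measuring
// from the time the last non-empty fragment was sent.
bool timed_out_since_last_send(std::uint64_t now_us,
                               std::uint64_t last_nonempty_sent_us,
                               std::uint64_t timeout_us) {
    return now_us - last_nonempty_sent_us > timeout_us;
}

// Hypothetical helper: the suggested comparison, measuring from the time of
// the pending request itself (request_time_us is an assumed timestamp).
bool timed_out_since_request(std::uint64_t now_us,
                             std::uint64_t request_time_us,
                             std::uint64_t timeout_us) {
    return now_us - request_time_us > timeout_us;
}
```

For a request stamped at 2499982 us and examined at 2499990 us, with the last non-empty fragment sent at time 0 and a timeout of 1000000 us, the first comparison reports a timeout and the second does not. The second reports one only if the request itself stays unfulfilled for longer than the timeout.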

#3 Updated by John Freeman over 1 year ago

  • Related to Idea #18266: Come up with concept of failure to fulfill request added

#4 Updated by John Freeman over 1 year ago

In response to Kurt's input, I've filed Idea #18266, to get discussion started on what it means for a fragment generator to fail to fulfill a request.


