Non-pathological low request rate results in error message
In artdaq v2_03_03, when run in window mode, a fragment generator will print an error message like the following:
Data-taking has paused for 2499982 us (> 1000000 us) while waiting for missing data request messages. Sending Empty Fragments for missing requests!
This message is printed when CommandableFragmentGenerator::applyRequests is called, there's at least one request in the queue, and it's been more than missing_request_window_timeout_us microseconds since the fragment generator last sent a non-empty fragment downstream. In the example above, the default value of missing_request_window_timeout_us, 1000000, is used. Along with this error message, the CommandableFragmentGenerator::sendEmptyFragments function will be called, which will discard all requests except the one with the highest sequence ID, sending an empty fragment downstream for each discarded request.
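The condition and the discard step described above can be sketched roughly as follows. This is a simplified illustration, not the artdaq implementation: the WindowModeState struct and the timeoutTriggered and discardAllButNewest functions are hypothetical names, and the sketch assumes timestamps are plain microsecond counts with now_us never earlier than the last send time.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the relevant fragment generator state;
// this is NOT the real CommandableFragmentGenerator class.
struct WindowModeState {
    // Pending requests, keyed by sequence ID (value: request timestamp).
    std::map<std::uint64_t, std::uint64_t> requests;
    // Time at which a non-empty fragment was last sent downstream.
    std::uint64_t last_nonempty_send_us;
    // Timeout; 1000000 is the default value quoted in this issue.
    std::uint64_t missing_request_window_timeout_us;
};

// True when at least one request is queued and more than the timeout
// has elapsed since the last non-empty fragment was sent.
// Assumes now_us >= s.last_nonempty_send_us.
bool timeoutTriggered(const WindowModeState& s, std::uint64_t now_us) {
    return !s.requests.empty() &&
           now_us - s.last_nonempty_send_us > s.missing_request_window_timeout_us;
}

// Discards every request except the one with the highest sequence ID.
// Returns the discarded sequence IDs, i.e. the ones for which an empty
// fragment would be sent downstream.
std::vector<std::uint64_t> discardAllButNewest(WindowModeState& s) {
    std::vector<std::uint64_t> discarded;
    while (s.requests.size() > 1) {
        auto oldest = s.requests.begin();  // std::map iterates in key order
        discarded.push_back(oldest->first);
        s.requests.erase(oldest);
    }
    return discarded;
}
```

With a single queued request, discardAllButNewest returns an empty list, so no empty fragments are sent even though timeoutTriggered returns true.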
Based on the logic described above, if we use the default missing_request_window_timeout_us and send requests at a rate of less than 1 Hz, then even if things are otherwise running smoothly and requests are being handled almost instantaneously as they arrive, a new request that comes in after a gap of more than 1000000 us will cause this error message to be printed, along with a call to CommandableFragmentGenerator::sendEmptyFragments that is a no-op because the new request is the only one in the queue.
A couple of questions to consider:
- Should we reword the message so that it doesn't imply that there are necessarily missing request messages? It may just be the case that someone's sending requests at a lower frequency than usual.
- Do we want this to be an error or merely a warning?
One possible way of rewording this message might be the following (as in the message above, the example below assumes the default missing_request_window_timeout_us of 1000000 and a gap of 2.5 seconds between requests):
Warning: it's been 2499982 us since a (nonempty) fragment was sent downstream, greater than the timeout value of missing_request_window_timeout_us = 1000000. Of the N requests in the queue, all but the most recent one out of the N will be discarded and an empty placeholder fragment sent downstream to correspond to each discarded request
#2 Updated by Kurt Biery almost 2 years ago
13-Nov-2017, KAB: the investigation into making the error message more useful sounds good to me, but I believe that it is essential to fix the bug that is causing this message to appear in the first place.
Based on what John has described to me, I believe that the comparison that should be used when checking if it has been "too long" since a request has been fulfilled is between "now" and the time of the fragment request from the event builder, not between "now" and the time when the most recent fragment was sent.
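To make the difference between the two comparisons concrete, here is a minimal sketch. It is not the artdaq code: TimeoutInputs, currentCheck and proposedCheck are hypothetical names, and the sketch assumes that "the time of the fragment request" means the arrival time of the request being examined, expressed in microseconds on the same clock as "now".

```cpp
#include <cstdint>

// Hypothetical inputs to the timeout decision, all in microseconds.
struct TimeoutInputs {
    std::uint64_t now_us;                 // current time
    std::uint64_t last_nonempty_send_us;  // when a non-empty fragment was last sent
    std::uint64_t request_time_us;        // when the request under consideration arrived (assumed meaning)
    std::uint64_t timeout_us;             // missing_request_window_timeout_us
};

// Comparison described in the original report:
// "now" versus the time the most recent non-empty fragment was sent.
bool currentCheck(const TimeoutInputs& t) {
    return t.now_us - t.last_nonempty_send_us > t.timeout_us;
}

// Comparison suggested in the comment above:
// "now" versus the time of the fragment request.
bool proposedCheck(const TimeoutInputs& t) {
    return t.now_us - t.request_time_us > t.timeout_us;
}
```

For a request that arrives 2.5 seconds after the last fragment was sent and is examined almost immediately, currentCheck reports a timeout while proposedCheck does not. proposedCheck only reports one once the request itself has been waiting longer than the timeout.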