Feature #5357

Complete the implementation of graceful handling of backpressure

Added by Kurt Biery almost 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 02/06/2014
Due date: 04/30/2014
% Done: 0%
Estimated time: 24.00 h
Experiment: -
Co-Assignees:
Duration: 84

Description

At the moment, when there is backpressure in the system that lasts longer than 5-10 seconds, events are dropped by the EventBuilders and Aggregators. We should fix this so that events are not dropped unless a run ends and there truly is no way to recover.

Some work has already been done to prepare for this. There is now a method in EventStore that tries to handle a new event fragment, but returns it if it can't process it within a specified timeout.
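To illustrate the "try, and hand the fragment back on timeout" idea, here is a minimal sketch in Python. It is not the artdaq code: `SketchEventStore`, `insert`, `insert_with_retry`, and `run_is_ending` are hypothetical names chosen for this example, and a bounded `queue.Queue` stands in for whatever EventStore actually uses internally.

```python
import queue


class SketchEventStore:
    """Hypothetical stand-in for EventStore; not the artdaq API."""

    def __init__(self, capacity):
        # A bounded queue: when it is full, the downstream side is backed up.
        self._queue = queue.Queue(maxsize=capacity)

    def insert(self, fragment, timeout_s):
        """Try to accept a fragment; return it to the caller if the
        queue stays full for the whole timeout, otherwise return None."""
        try:
            self._queue.put(fragment, timeout=timeout_s)
            return None
        except queue.Full:
            return fragment

    def drain_one(self):
        """Simulate the downstream consumer taking one fragment."""
        return self._queue.get_nowait()


def insert_with_retry(store, fragment, timeout_s, run_is_ending, warn=print):
    """Keep retrying instead of dropping the fragment. Give up only when
    the (hypothetical) run_is_ending callback reports the run is over."""
    while True:
        rejected = store.insert(fragment, timeout_s)
        if rejected is None:
            return True  # fragment accepted
        if run_is_ending():
            return False  # no way to recover; caller decides what to do
        # Keep reporting significant backpressure, as the ticket requests.
        warn("Unable to process fragment because of back-pressure - retrying...")
```

The point of returning the fragment rather than discarding it is that the caller keeps ownership and can decide whether to retry or, at end of run, give up.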

We should continue to print out warning messages when backpressure is significant.

After these changes are done, we should test that the system performs as expected under both transient and permanent backpressure (within a given run).

History

#1 Updated by Kurt Biery almost 7 years ago

  • Target version set to v1_05_08

#2 Updated by Kurt Biery almost 7 years ago

  • Target version changed from v1_05_08 to v1_05_09

#3 Updated by Kurt Biery almost 7 years ago

  • Target version changed from v1_05_09 to v1_05_10

#4 Updated by Kurt Biery almost 7 years ago

  • Due date set to 04/30/2014
  • Status changed from New to Assigned
  • Assignee set to Kurt Biery
  • Target version changed from v1_05_10 to v1_06_00

#5 Updated by Kurt Biery almost 7 years ago

A related Issue is #3888.

#6 Updated by Kurt Biery over 6 years ago

  • Target version changed from v1_06_00 to v1_07_00

#7 Updated by Kurt Biery over 6 years ago

  • Status changed from Assigned to Resolved

A week or so ago, I made the needed changes in EventStore, EventBuilderCore, and AggregatorCore.

#8 Updated by Kurt Biery over 6 years ago

  • Status changed from Resolved to Closed

I've verified that there are no longer lost events when we experience back-pressure (tests run at the DS-50 WH14NE teststand).

Note that we will all need to change how we search the logs for back-pressure. We used to be able to search for "FAIL". Now we need to search for a substring of warning messages like these:

Wed Apr 23 11:27:14 -0500 2014: %MSG-w EventBuilderCore: Aggregator-dsfr6-6650 MF-online
Wed Apr 23 11:27:14 -0500 2014: Unable to process event 10652 because of back-pressure - retrying...
Wed Apr 23 11:27:14 -0500 2014: %MSG
Wed Apr 23 11:27:16 -0500 2014: %MSG-w EventBuilderCore: EventBuilder-dseb8-6642 MF-online
Wed Apr 23 11:27:16 -0500 2014: Unable to process fragment 2 in event 10762 because of back-pressure - retrying...
Wed Apr 23 11:27:16 -0500 2014: %MSG
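As a rough sketch of such a search, the Python snippet below scans log lines for the "because of back-pressure - retrying" text and pulls out the event number and, when present, the fragment number. The regular expression is derived only from the two sample messages above, so other message variants may exist that it does not match; the function name `find_backpressure` is made up for this example.

```python
import re

# Pattern inferred from the two sample messages only (an assumption).
BACKPRESSURE_RE = re.compile(
    r"Unable to process (?:fragment (?P<fragment>\d+) in )?"
    r"event (?P<event>\d+) because of back-pressure - retrying"
)


def find_backpressure(lines):
    """Return (event, fragment) tuples for each back-pressure warning.
    fragment is None when the message names only an event."""
    hits = []
    for line in lines:
        match = BACKPRESSURE_RE.search(line)
        if match:
            fragment = match.group("fragment")
            hits.append(
                (int(match.group("event")),
                 int(fragment) if fragment is not None else None)
            )
    return hits
```

For a quick check from the shell, searching for the fixed substring "because of back-pressure" should find the same lines.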
