Feature #2943

ResourceWatch utility

Added by Rob Kutschke almost 7 years ago. Updated over 1 year ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: Application
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Scope:
Experiment: Mu2e
SSI Package:
Duration:

Description

We would like art to supply the tooling to periodically check resource availability. This request is motivated by the following immediate problem: we need to recognize when we are filling up the available disk quota on the local disk of a grid worker node. When a critical level is reached, we would like to trigger an orderly shutdown of art at the next available event boundary. We would like this to be run-time configurable: turn on/off, set thresholds for warnings, set thresholds for shutdown.
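
A minimal sketch of the threshold logic described above, assuming C++17. It uses free space on the filesystem as a stand-in for the quota; a real quota query would be site-specific. The names `DiskWatchConfig`, `classify`, and `checkDisk` are hypothetical and not part of art, and how the result would trigger a shutdown at an event boundary is left open.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical names throughout; none of this is an art API.
enum class DiskAction { None, Warn, Shutdown };

struct DiskWatchConfig {
  bool enabled = true;                // run-time on/off switch
  std::uintmax_t warnBelowBytes = 0;  // warn when free space drops below this
  std::uintmax_t stopBelowBytes = 0;  // request shutdown when below this
};

// Pure decision function: maps a free-space reading to an action.
inline DiskAction classify(std::uintmax_t freeBytes, DiskWatchConfig const& cfg)
{
  if (!cfg.enabled) return DiskAction::None;
  if (freeBytes < cfg.stopBelowBytes) return DiskAction::Shutdown;
  if (freeBytes < cfg.warnBelowBytes) return DiskAction::Warn;
  return DiskAction::None;
}

// Reads the space available on the filesystem holding `path` and classifies it.
// Assumption: filesystem free space approximates the remaining quota.
inline DiskAction checkDisk(std::filesystem::path const& path,
                            DiskWatchConfig const& cfg)
{
  std::error_code ec;
  auto const info = std::filesystem::space(path, ec);
  if (ec) return DiskAction::None;  // could not measure; take no action
  return classify(info.available, cfg);
}
```

Keeping `classify` separate from the filesystem query means the thresholds can be tested without touching a disk, and the periodic caller only needs to act on the returned `DiskAction`.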

It would be great if the tooling could automagically figure out if it was in a grid job and have different defaults than for a non-grid job.

At present I envisage the implementation as an art service, but I used the word "tooling" rather than "service" since I can imagine other implementations. (Test for the existence of some griddy environment variables? Test hostnames?)
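
One way the environment-variable test might look, as a sketch only. The helper names are hypothetical, and the marker variables `_CONDOR_SCRATCH_DIR` and `_CONDOR_JOB_AD` are an assumption about what an HTCondor-based worker node exports; the right list would have to be confirmed for each site.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper, not an art API: returns true if any of the named
// environment variables is set to a non-empty value.
inline bool anyEnvSet(std::vector<std::string> const& names)
{
  for (auto const& name : names) {
    char const* value = std::getenv(name.c_str());
    if (value != nullptr && *value != '\0') return true;
  }
  return false;
}

// Guess whether the process is running inside a grid job.
// Assumption: these marker variables are present on the worker node.
inline bool looksLikeGridJob()
{
  static std::vector<std::string> const markers{"_CONDOR_SCRATCH_DIR",
                                                "_CONDOR_JOB_AD"};
  return anyEnvSet(markers);
}
```

The result of `looksLikeGridJob()` would only select defaults; an explicit run-time setting should still override it.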

I can imagine that we also might want to trigger shutdowns based on: a memory use pattern that smells like a memory leak; total job time; remaining time in the batch queue; others?
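
For the memory-leak case, one crude heuristic is to flag a window of samples that never decreases and grows by more than some amount. The sketch below assumes the caller supplies the samples (for example the resident set size in bytes); `GrowthDetector`, its window size, and its threshold are all hypothetical, and a steadily growing but legitimate job would also trip it.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical leak heuristic, not an art API.
class GrowthDetector {
public:
  GrowthDetector(std::size_t window, std::uint64_t minGrowthBytes)
    : window_{window}, minGrowthBytes_{minGrowthBytes}
  {}

  // Record one memory sample. Returns true when the last `window` samples
  // never decreased and grew by at least `minGrowthBytes` overall.
  bool addSample(std::uint64_t bytes)
  {
    samples_.push_back(bytes);
    if (samples_.size() > window_) samples_.pop_front();
    if (window_ < 2 || samples_.size() < window_) return false;
    for (std::size_t i = 1; i < samples_.size(); ++i) {
      if (samples_[i] < samples_[i - 1]) return false;  // usage dropped
    }
    return samples_.back() - samples_.front() >= minGrowthBytes_;
  }

private:
  std::size_t window_;
  std::uint64_t minGrowthBytes_;
  std::deque<std::uint64_t> samples_;
};
```

Sampling once per event or once per N events would both work with this interface; the window length then sets how long a growth pattern must persist before it is reported.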

History

#1 Updated by Marc Paterno almost 7 years ago

  • Status changed from New to Feedback

Consensus is that the grid system should perform the monitoring, and signal art when a problem exists. We'll wait on this issue to see if the related problem can be solved outside of art.
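
If the grid system does the monitoring, the receiving side could be as small as a signal handler that sets a flag checked at each event boundary. This is a sketch under that assumption; the function names are hypothetical, it does not describe art's actual signal handling, and which signal the grid system would send is not specified here.

```cpp
#include <csignal>

namespace {
  // Set from the signal handler, read at event boundaries.
  volatile std::sig_atomic_t g_shutdownRequested = 0;
}

// The handler only sets a flag, which is safe to do from signal context.
extern "C" void onShutdownSignal(int)
{
  g_shutdownRequested = 1;
}

// Install the handler for the given signal number (for example SIGTERM).
inline void installShutdownHandler(int signum)
{
  std::signal(signum, onShutdownSignal);
}

// To be polled between events; true once the signal has arrived.
inline bool shutdownRequested()
{
  return g_shutdownRequested != 0;
}
```

The event loop would call `shutdownRequested()` after finishing each event and, if it returns true, close its output files and exit normally.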

#2 Updated by Christopher Green almost 6 years ago

  • Category set to Application
  • Start date deleted (08/30/2012)
  • Experiment Mu2e added

This issue might benefit from exposure to discussion within FIFE.

#3 Updated by Rob Kutschke almost 6 years ago

At last report (probably more than 6 months ago), the grid people said that they had no interest in this sort of thing. I will bring it up again.

#4 Updated by Kyle Knoepfel over 4 years ago

  • Target version set to 521

#5 Updated by Kyle Knoepfel over 1 year ago

  • Target version deleted (521)

