We would like art to supply the tooling to periodically check resource availability. This request is motivated by the following immediate problem: we need to recognize when we are filling up the available disk quota on local disk on a grid worker node. When a critical level is reached, we would like to trigger an orderly shutdown of art on the next available event boundary. We would like this to be run-time configurable: turn on/off, set thresholds for warnings, set thresholds for shutdown.
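To make the request concrete, here is a minimal sketch of the threshold check, not a proposed art interface. All names (`DiskThresholds`, `DiskStatus`, `classify`, `checkDisk`) and the default fractions are hypothetical. It assumes C++17 and uses `std::filesystem::space`, which reports free space on the whole filesystem rather than against a per-job quota, so a real quota check would probably need a different source of numbers.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical result of one periodic check.
enum class DiskStatus { Ok, Warning, Shutdown, Unknown };

// Hypothetical run-time configuration: on/off switch plus two thresholds,
// expressed as the used fraction of the filesystem (0.0 to 1.0).
struct DiskThresholds {
  bool enabled = true;
  double warnFraction = 0.80;      // assumed default, for illustration only
  double shutdownFraction = 0.95;  // assumed default, for illustration only
};

// Pure classification step, kept separate so it can be tested without a disk.
DiskStatus classify(std::uintmax_t capacity,
                    std::uintmax_t available,
                    DiskThresholds const& t)
{
  if (!t.enabled) {
    return DiskStatus::Ok;  // monitoring switched off: never warn or shut down
  }
  if (capacity == 0 || available > capacity) {
    return DiskStatus::Unknown;  // numbers make no sense; do not act on them
  }
  double const used =
    1.0 - static_cast<double>(available) / static_cast<double>(capacity);
  if (used >= t.shutdownFraction) {
    return DiskStatus::Shutdown;
  }
  if (used >= t.warnFraction) {
    return DiskStatus::Warning;
  }
  return DiskStatus::Ok;
}

// Query the filesystem that holds `dir` and classify the result.
DiskStatus checkDisk(std::filesystem::path const& dir, DiskThresholds const& t)
{
  std::error_code ec;
  auto const info = std::filesystem::space(dir, ec);
  if (ec) {
    return DiskStatus::Unknown;  // e.g. the path does not exist
  }
  return classify(info.capacity, info.available, t);
}
```

The idea would be to call something like `checkDisk` at event boundaries (or every N events) and, on `DiskStatus::Shutdown`, ask art to end the job in an orderly way. How that request is made is left open here.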
It would be great if the tooling could automagically figure out if it was in a grid job and have different defaults than for a non-grid job.
At present I envisage the implementation as an art service, but I used the word "tooling" rather than "service" since I can imagine other implementations. (To detect a grid job: test for the existence of some griddy environment variables? Test hostnames?)
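As an illustration of the environment-variable idea, here is a rough sketch, again with hypothetical names (`anyEnvSet`, `looksLikeGridJob`, `MonitorDefaults`, `defaultsFor`) and made-up default values. The choice of `_CONDOR_SCRATCH_DIR` as a marker is an assumption that the worker nodes run jobs under HTCondor; other batch systems would need other markers, and a hostname test is not shown.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// True if any of the named environment variables is set to a non-empty value.
bool anyEnvSet(std::vector<std::string> const& names)
{
  for (auto const& name : names) {
    char const* value = std::getenv(name.c_str());
    if (value != nullptr && *value != '\0') {
      return true;
    }
  }
  return false;
}

// Assumption: a job started by HTCondor has _CONDOR_SCRATCH_DIR set.
// The list of marker variables is a guess and would need site input.
bool looksLikeGridJob()
{
  return anyEnvSet({"_CONDOR_SCRATCH_DIR"});
}

// Hypothetical defaults; the numbers are placeholders, not recommendations.
struct MonitorDefaults {
  bool enabled;
  double warnFraction;
  double shutdownFraction;
};

// Different defaults for grid and non-grid jobs; explicit run-time
// configuration would still override whatever is chosen here.
MonitorDefaults defaultsFor(bool gridJob)
{
  if (gridJob) {
    return MonitorDefaults{true, 0.80, 0.95};
  }
  return MonitorDefaults{false, 0.90, 0.99};
}
```

With this shape, `defaultsFor(looksLikeGridJob())` would only supply starting values. Anything set explicitly in the job configuration should win.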
I can imagine that we also might want to trigger shutdowns based on: a memory use pattern that smells like a memory leak; total job time; remaining time in the batch queue; others?