Implement a procedure to prevent jobs from exceeding their resource requirements
This issue covers a procedure that monitors the resource usage of the experiment script.
The resources to monitor are: execution time, memory usage, and disk usage.
If the job reaches 99% of any required resource, the process needs to be killed to prevent the job from being held.
The experiment script will return a different exit code depending on which resource triggered the kill.
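The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the project: the exit code values, the `Resources` fields, and the function name `exceeded_exit_code` are all hypothetical, and how the usage numbers are collected is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical exit codes; the issue does not state the actual values.
EXIT_TIME = 101
EXIT_MEMORY = 102
EXIT_DISK = 103

THRESHOLD = 0.99  # act at 99% of the required resource, as described above


@dataclass
class Resources:
    lifetime_s: float  # wall-clock execution time, in seconds
    memory_mb: float   # memory usage, in MB
    disk_mb: float     # disk usage, in MB


def exceeded_exit_code(used: Resources, required: Resources,
                       threshold: float = THRESHOLD) -> Optional[int]:
    """Return the exit code of the first resource at or above the threshold.

    Returns None when every resource is still below threshold * required.
    """
    checks = [
        (used.lifetime_s, required.lifetime_s, EXIT_TIME),
        (used.memory_mb, required.memory_mb, EXIT_MEMORY),
        (used.disk_mb, required.disk_mb, EXIT_DISK),
    ]
    for value, limit, code in checks:
        if value >= threshold * limit:
            return code
    return None
```

A monitoring loop would call this periodically, and on a non-`None` result kill the experiment process and exit with the returned code.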
- Tracker changed from Task to Feature
- Assignee set to Michele Fattoruso
- Assignee changed from Michele Fattoruso to Vito Di Benedetto
- % Done changed from 0 to 40
The check on expected_lifetime is complete.
- Status changed from New to Work in progress
- Status changed from Work in progress to Resolved
- % Done changed from 40 to 100
The DAG structure used doesn't require all jobs to survive.
- Status changed from Resolved to Closed
- Start date deleted (
- Parent task deleted (