Held Status Confusion: Label job as completed or not?
At the following page: https://pomsgpvm01.fnal.gov/poms/triage_job?job_id=284315&tmin=2016-06-20%2019:34:00
Toward the bottom of the page, in the job history, POMS is confused about the status of the job, switching back and forth between "completed" and "held" a few dozen times. Since the job is quite old and has been held for a long time, there should be an easy way to clear this up.
#2 Updated by Marc Mengel over 4 years ago
After some discussion at the POMS meeting, we decided that we should terminate held jobs: the recovery mechanism is the right way to handle failures, and SAM projects are not going to re-deliver the files that caused the resource overrun without a recovery job anyway. This should take only a few lines of code, since we already know how to kill a job; we just have to call that code when we get an update marking a job Held...
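A rough sketch of what that change might look like, for discussion only. Everything named here is hypothetical and not taken from the POMS code: `handle_status_update`, the in-memory `jobs` dict standing in for the job table, and the `kill_job` callback standing in for whatever routine POMS already uses to kill a job.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the POMS job table: job_id -> last reported status.
JobTable = Dict[int, str]


def handle_status_update(
    jobs: JobTable,
    job_id: int,
    new_status: str,
    kill_job: Callable[[int], None],
) -> bool:
    """Record a status update and terminate the job if it is now held.

    Returns True if a kill was requested for this update, False otherwise.
    `kill_job` is an assumed callback wrapping the existing kill mechanism.
    """
    jobs[job_id] = new_status

    # Held jobs are terminated rather than left to linger; any failed work
    # is expected to be picked up by a recovery job instead.
    if new_status.lower() == "held":
        kill_job(job_id)
        return True
    return False
```

Passing the kill routine in as a callback keeps the sketch independent of how POMS actually talks to the batch system, and makes the behavior easy to check with a fake such as `killed.append`.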