Bug #22073

Jobs submitted but failed to run successfully, so no data files were generated, yet POMS marked them located

Added by Yuyi Guo over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date: 03/06/2019
Due date:
% Done: 100%
Estimated time:
First Occurred:
Scope: Internal
Experiment: -
Stakeholders:
Duration:

History

#1 Updated by Stephen White over 1 year ago

  • Assignee set to Marc Mengel

#2 Updated by Marc Mengel over 1 year ago

So this is sort of expected at the moment. When we stopped tracking individual jobs, we lost the 'Failed' status... The question is whether we can make our submissions agent report Failed rather than Completed if, say, the majority of jobs failed...

So "lens" claims to be able to tell us failed job counts -- it can give entries like:

{
  "id": "17869802@jobsub01.fnal.gov",
  "failed": 0,
  "completed": 52,
  "running": 0,
  "held": 0,
  "done": true
},
{
  "id": "18199322@jobsub02.fnal.gov",
  "failed": 129,
  "completed": 3967,
  "running": 33,
  "held": 0,
  "done": false
},

So if we see "done" == true and failed > 50% of completed (or of our completion threshold?) we could mark it Failed instead of Completed...

Okay, so that is 586d53a, need to test it out a bit...
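
For readers following along, here is a rough sketch of the rule described above, applied to entries shaped like the lens output. It is not the code in 586d53a. The function name `classify_submission`, the `failed_fraction` parameter, and the choice to return None for unfinished submissions are assumptions made for illustration, and it takes "failed > 50% of completed" literally.

```python
from typing import Mapping, Optional


def classify_submission(
    entry: Mapping[str, object],
    failed_fraction: float = 0.5,  # hypothetical knob; 0.5 matches the "50%" above
) -> Optional[str]:
    """Return "Failed" or "Completed" for a finished submission, else None."""
    # Only decide once lens reports the submission as done.
    if not entry["done"]:
        return None

    failed = int(entry["failed"])
    completed = int(entry["completed"])

    # Literal reading of the rule: failed > 50% of completed.
    if failed > failed_fraction * completed:
        return "Failed"
    return "Completed"


if __name__ == "__main__":
    sample = {
        "id": "17869802@jobsub01.fnal.gov",
        "failed": 0,
        "completed": 52,
        "running": 0,
        "held": 0,
        "done": True,
    }
    print(classify_submission(sample))
```

Comparing against the completion threshold instead, as floated above, would only change the right-hand side of the comparison.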

#3 Updated by Marc Mengel over 1 year ago

  • % Done changed from 0 to 70
  • Status changed from New to Work in progress

#4 Updated by Marc Mengel over 1 year ago

  • % Done changed from 70 to 90
  • Status changed from Work in progress to Resolved

Set up a job in development to have all the processes exit 1, and we marked the submission Failed instead of Completed! Yay!

#5 Updated by Marc Mengel over 1 year ago

  • % Done changed from 90 to 100

#6 Updated by Stephen White over 1 year ago

  • Status changed from Resolved to Closed

