Bug #19139

Failed jobs classified as located in dbrailsf_gen_poms_test campaign stage.

Added by Anna Mazzacane over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: High
Assignee:
Target version:
Start date: 02/26/2018
Due date:
% Done: 100%
Estimated time:
First Occurred:
Scope: Internal
Experiment: -
Stakeholders:
Duration:

Description

An SBND production operator started running a first campaign (dbrailsf_gen_poms_test) using sbndcode v06_69_00. It turns out that the fcl configuration for that version has a bug (an incorrect service is set up). All 10 jobs failed, as expected. However, those 10 jobs are reported as located. Located means that the output files are declared to SAM. Why can failed jobs be classified as located?

History

#1 Updated by Marc Mengel over 2 years ago

  • Status changed from Assigned to Work in progress
  • % Done changed from 0 to 90

We now have a Failed status for both jobs and tasks, which ends the workflow. At the moment, Failed is defined as described under Success in the wiki, and is encoded in the "failed_job" subroutine in JobsFiles.py.

Bulk of changes in 7fdd05ad and 1fb1b6cd with some of that code ripped back out in 1a1db283 and 82f92a15...

My main concern at the moment is that the cpu time threshold is too low -- we might use a few seconds of cpu time copying in scripts, etc. and still have a failed job... thus eb3f531.
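To illustrate the threshold concern, here is a minimal sketch. It is not the actual "failed_job" code from JobsFiles.py; the function name is_failed, the threshold values, and the 3-second overhead figure are all hypothetical and chosen only to show how a too-low cpu time cutoff could miss a job that did nothing but copy in scripts.

```python
# Hypothetical sketch only: NOT the real failed_job logic from JobsFiles.py.
# Assumption: a job counts as Failed when its cpu time is below a threshold.

def is_failed(cpu_time_s: float, threshold_s: float) -> bool:
    """Return True if the job used less cpu time than the threshold."""
    return cpu_time_s < threshold_s


# Assumed example: a job that only copied in scripts and then died,
# using a few seconds of cpu time without doing any real work.
overhead_only_job_cpu_s = 3.0

# Compare a low and a higher (both hypothetical) threshold for the same job.
for threshold_s in (1.0, 10.0):
    verdict = "Failed" if is_failed(overhead_only_job_cpu_s, threshold_s) else "not Failed"
    print(f"threshold={threshold_s}s -> {verdict}")
```

With the lower cutoff the overhead-only job slips past the check, while the higher cutoff catches it, which is the kind of case a raised threshold would be meant to cover.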

#2 Updated by Anna Mazzacane over 2 years ago

  • Status changed from Work in progress to Resolved
  • % Done changed from 90 to 100

#3 Updated by Anna Mazzacane over 2 years ago

  • Status changed from Resolved to Closed

