Bug #18194

Hold reason not being propagated to database...

Added by Marc Mengel almost 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date: 11/10/2017
Due date:
% Done: 100%
Estimated time:
First Occurred:
Scope: Internal
Experiment: -
Stakeholders:
Duration:

Description

Our jobsub_q_scraper agent thinks it's collecting a "hold reason" from the condor_q listing, but the value somehow doesn't end up in the reason_held field on the job, so it isn't shown on the triage page.

Secondarily, when we parse the joblog, there is a hold reason in it:

028 (1023308.000.000) 11/09 23:15:12 Job ad information event triggered.
...
HoldReason = "Error from slot1_22@fnpc7017.fnal.gov: Docker job has gone over memory limit of 4096 Mb" 

We should snag that, too.
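
A minimal sketch of picking that attribute out of a joblog, assuming the log is read line by line and that `HoldReason = "..."` sits on its own line after an event header, as in the excerpt above. The function name `extract_hold_reasons`, the regular expressions, and the "cluster.proc" job id format are illustrative assumptions, not the actual joblog parser code.

```python
import re

# Event header, e.g.
# "028 (1023308.000.000) 11/09 23:15:12 Job ad information event triggered."
EVENT_RE = re.compile(r"^\d{3} \((\d+)\.(\d+)\.\d+\) ")
# Attribute line, e.g. HoldReason = "Error from slot1_22@...: Docker job has gone over ..."
HOLD_RE = re.compile(r'^HoldReason = "(.*)"\s*$')


def extract_hold_reasons(lines):
    """Return {job_id: hold_reason} for every HoldReason line found.

    job_id is "cluster.proc" (an assumed format), taken from the most
    recent event header seen before the HoldReason line.
    """
    reasons = {}
    current_job = None
    for line in lines:
        line = line.rstrip("\n")
        m = EVENT_RE.match(line)
        if m:
            current_job = "%d.%d" % (int(m.group(1)), int(m.group(2)))
            continue
        m = HOLD_RE.match(line)
        if m and current_job is not None:
            # A later HoldReason for the same job overwrites the earlier one.
            reasons[current_job] = m.group(1)
    return reasons
```

A HoldReason line that shows up before any event header is skipped, since there is no job to attach it to.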

So:

  • identify where the naming problem occurs between jobsub_q -> bulk_update_job
  • add HoldReason parsing to joblog parser
  • make sure triage page lists reason_held
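
One way to chase the first item is to compare the keys of each record the scraper sends with the field names the database side expects, and report anything that would be silently dropped. The sketch below assumes records are plain dicts. `KNOWN_FIELDS`, `ALIASES`, and `normalize_record` are hypothetical names, and apart from reason_held the field names and alias spellings are guesses at what a mismatch might look like, not what the real code uses.

```python
# Hypothetical: fields the job table is assumed to accept.
KNOWN_FIELDS = {"jobsub_job_id", "status", "reason_held"}

# Hypothetical alternative spellings that should map onto reason_held.
ALIASES = {"HoldReason": "reason_held", "hold_reason": "reason_held"}


def normalize_record(record):
    """Rename aliased keys and list keys that match no known field."""
    normalized = {}
    unknown = []
    for key, value in record.items():
        name = ALIASES.get(key, key)
        if name in KNOWN_FIELDS:
            normalized[name] = value
        else:
            unknown.append(key)
    return normalized, unknown
```

Logging the `unknown` list on each update would show right away whether the hold reason is arriving under a name the job update doesn't recognize.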

Associated revisions

Revision 560c0bb2 (diff)
Added by Marc Mengel almost 3 years ago

hold reason bits for issue #18194

History

#1 Updated by Marc Mengel almost 3 years ago

  • Description updated (diff)

#2 Updated by Marc Mengel almost 3 years ago

  • % Done changed from 0 to 90

Made a few updates to actually get the info through...

#3 Updated by Marc Mengel almost 3 years ago

  • Status changed from New to Resolved
  • % Done changed from 90 to 100

#4 Updated by Anna Mazzacane over 2 years ago

  • Status changed from Resolved to Closed
