Bug #18194

Updated by Marc Mengel about 3 years ago


Our jobsub_q_scraper agent *thinks* it is collecting a "hold reason" from the condor_q listing, but the value somehow never ends up in the reason_held field on the job, so it isn't shown on the triage page.

Secondarily, the joblog we already parse also contains a hold reason:
<pre>
028 (1023308.000.000) 11/09 23:15:12 Job ad information event triggered.
...
HoldReason = "Error from slot1_22@fnpc7017.fnal.gov: Docker job has gone over memory limit of 4096 Mb"
</pre>
We should snag that, too.

So:

* identify where the naming problem occurs between jobsub_q -> bulk_update_job
* add HoldReason parsing to the joblog parser
* make sure the triage page lists reason_held
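
For the second item, a minimal sketch of what the HoldReason parsing might look like, based only on the joblog excerpt above. The function name @extract_hold_reasons@, the regexes, and the "cluster.proc" job id key are all assumptions made for illustration; the real joblog parser may track the current event differently.

```python
import re

# Event header, e.g. "028 (1023308.000.000) 11/09 23:15:12 Job ad information event triggered."
EVENT_HEADER_RE = re.compile(r'^\d{3}\s+\((\d+)\.(\d+)\.\d+\)')
# ClassAd-style attribute line, e.g. HoldReason = "some text"
HOLD_REASON_RE = re.compile(r'^\s*HoldReason\s*=\s*"(.*)"\s*$')


def extract_hold_reasons(lines):
    """Return {"cluster.proc": hold_reason} for the joblog lines given.

    Hypothetical helper: the job id format used as the key is an
    assumption, not necessarily what bulk_update_job expects.
    """
    reasons = {}
    current_job = None
    for line in lines:
        header = EVENT_HEADER_RE.match(line)
        if header:
            # Remember which job the following attribute lines belong to.
            current_job = "%d.%d" % (int(header.group(1)), int(header.group(2)))
            continue
        found = HOLD_REASON_RE.match(line)
        if found and current_job is not None:
            # A later HoldReason for the same job overwrites an earlier one.
            reasons[current_job] = found.group(1)
    return reasons
```

The result would then need to be passed along under whatever key the update path actually reads (reason_held on the job), which is exactly the naming question in the first item.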

