Feature #14248

Use elasticsearch condor events for better job data

Added by Marc Mengel almost 4 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
Start date:
10/24/2016
Due date:
% Done:

100%

Estimated time:
Scope:
Internal
Experiment:
-
Stakeholders:
Duration:

Description

We should use Elasticsearch to fetch job data in addition to (and possibly instead of)
parsing condor_q output. That would give us the actual final data for each job, so
completed jobs would no longer be missing cpu_time and similar fields.

Michal G left us a nice webservice/elasticsearch.py interface file that should let
us build a new agent for this.
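
As a rough illustration of the idea, the sketch below builds an Elasticsearch query body for completed jobs and pulls the job attributes out of a search response. It does not use the webservice/elasticsearch.py interface, whose API is not shown in this ticket; the helper names are hypothetical, the field names (`Jobstatus`, `JobsubJobID`, `RemoteUserCpu`, `RemoteWallClockTime`) are the ones listed in the history of this ticket, and it assumes `Jobstatus` carries the HTCondor JobStatus code, where 4 means completed.

```python
# Hypothetical sketch: none of these helpers come from webservice/elasticsearch.py.
COMPLETED = 4  # HTCondor JobStatus code for a completed job


def build_completed_jobs_query(size=100):
    """Return an Elasticsearch query-DSL body that selects completed jobs."""
    return {
        "size": size,
        # A term filter matches the exact status value without scoring.
        "query": {"bool": {"filter": [{"term": {"Jobstatus": COMPLETED}}]}},
        # Only fetch the fields we want to report (assumed field names).
        "_source": [
            "JobsubJobID",
            "Jobstatus",
            "RemoteUserCpu",
            "RemoteWallClockTime",
        ],
    }


def jobs_from_response(response):
    """Map JobsubJobID -> attribute dict from a standard search response."""
    jobs = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        job_id = source.get("JobsubJobID")
        if job_id is not None:  # skip documents without a job id
            jobs[job_id] = source
    return jobs
```

The query body would be passed to whatever search call the interface file exposes; keeping the body construction and the response parsing separate makes both easy to test without a live cluster.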

History

#1 Updated by Tanya Levshina almost 4 years ago

This task depends on Feature #14376. For some reason I don't have the means to add a parent -> child relationship.

#2 Updated by Joe Boyd almost 4 years ago

  • Assignee set to Joe Boyd

#3 Updated by Joe Boyd almost 4 years ago

Reporter agent script that collects job data from condor
event logs (from elasticsearch or directly from condor), providing:
-- actual termination (so we can turn off the 3 strikes code in the condor_q watcher)
-- final memory/cpu usage
-- job exit code (vs. the user executable exit code we have now)
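
The three items above could be extracted from a single job record along these lines. This is only a sketch under assumptions: the record is taken to be a plain dict of job attributes, `MemoryUsage` and `ExitCode` are standard HTCondor job attribute names but are not confirmed to be present in our event data, and the function name is hypothetical.

```python
COMPLETED = 4  # HTCondor JobStatus code for a completed job


def summarize_final_record(record):
    """Pick out the final-state fields a reporter agent would report.

    `record` is assumed to be a dict of job attributes; missing keys
    come back as None rather than raising.
    """
    return {
        # Actual termination, rather than inferring it from condor_q misses.
        "terminated": record.get("Jobstatus") == COMPLETED,
        # Final resource usage.
        "cpu_time": record.get("RemoteUserCpu"),
        "wall_time": record.get("RemoteWallClockTime"),
        "memory": record.get("MemoryUsage"),  # assumed attribute
        # Exit code of the job itself.
        "exit_code": record.get("ExitCode"),  # assumed attribute
    }
```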

#4 Updated by Anna Mazzacane almost 4 years ago

  • Status changed from New to Work in progress

#5 Updated by Joe Boyd almost 4 years ago

We are currently ingesting these attributes into the fifebatch-jobs table:

Env
Jobstatus
ClusterID
ProcID
MATCH_EXP_JOB_GLIDEIN_Site
NumRestarts
HoldReason
RemoteUserCpu
RemoteWallClockTime
JobsubJobID

We are NOT ingesting these from jobs:

RemoteHost
EnteredCurrentStatus
Args

We will have to ingest those as well to get full functionality.
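
A small check like the following could flag which attributes are absent from a given record. It is a minimal sketch: the two sets simply restate the lists above, and the function name is hypothetical.

```python
# Attributes currently ingested into fifebatch-jobs (from the list above).
INGESTED = {
    "Env", "Jobstatus", "ClusterID", "ProcID", "MATCH_EXP_JOB_GLIDEIN_Site",
    "NumRestarts", "HoldReason", "RemoteUserCpu", "RemoteWallClockTime",
    "JobsubJobID",
}
# Attributes we still need for full functionality.
NOT_YET_INGESTED = {"RemoteHost", "EnteredCurrentStatus", "Args"}


def missing_attributes(record, required=INGESTED | NOT_YET_INGESTED):
    """Return the sorted names of required attributes absent from record."""
    return sorted(set(required) - set(record))
```

Run against a record that holds only the currently ingested attributes, this returns `['Args', 'EnteredCurrentStatus', 'RemoteHost']`.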

#6 Updated by Joe Boyd over 3 years ago

  • % Done changed from 30 to 80

#7 Updated by Joe Boyd over 3 years ago

  • Target version changed from v1_1_0 to v2_0_0

#8 Updated by Joe Boyd over 3 years ago

  • % Done changed from 80 to 100

This is done and merged into develop. I want to check it again after deployment to production.

#9 Updated by Joe Boyd over 3 years ago

  • Status changed from Work in progress to Resolved

#10 Updated by Marc Mengel over 3 years ago

  • Status changed from Resolved to Closed

