Feature #11155

Collect job runtime history in HTCondor attributes

Added by Marco Mambelli almost 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 12/15/2015
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Many jobs start running on one glidein, are then preempted, evicted, or otherwise interrupted, and continue on a different glidein, possibly at a different site.
The list of sites where the job ran, and possibly the time spent at each site, should be collected to improve the understanding of the job and the sites and to allow for more accurate accounting.

HTCondor tracks where the job is running via these job attributes:
CurrentHosts - the number of hosts
AllRemoteHosts - all the nodes used by an MPI job (during the last execution)
RemoteHost - (not documented) the current host (when running)
LastRemoteHost - (not documented) the last host where the job ran

A longer history could be collected with a configuration such as:
SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS),RemoteHost
SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH = 10

With these settings, HTCondor will save:
MachineAttrRemoteHost0
MachineAttrRemoteHost1
...
MachineAttrRemoteHost9
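
As a rough illustration of how these attributes could be consumed, the sketch below rebuilds the ordered list of hosts from a job ClassAd represented as a plain Python dict. The function name remote_host_history and the sample job ad (host names, ClusterId) are hypothetical and only for illustration. It relies on the HTCondor convention that index 0 holds the most recent run and higher indices hold older runs.

```python
import re


def remote_host_history(job_ad, attr="RemoteHost"):
    """Return the recorded MachineAttr<attr><N> values, oldest run first.

    job_ad is assumed to be a dict-like job ClassAd (hypothetical input).
    """
    pattern = re.compile(r"^MachineAttr" + re.escape(attr) + r"(\d+)$")
    indexed = []
    for key, value in job_ad.items():
        match = pattern.match(key)
        if match and value is not None:
            indexed.append((int(match.group(1)), value))
    # Index 0 is the most recent run, so sort descending to get oldest first.
    indexed.sort(key=lambda pair: pair[0], reverse=True)
    return [value for _, value in indexed]


# Hypothetical job ad with three recorded runs (values are made up).
job_ad = {
    "ClusterId": 42,
    "MachineAttrRemoteHost0": "slot1@glidein_3@node7.site-b.example",
    "MachineAttrRemoteHost1": "slot1@glidein_2@node2.site-a.example",
    "MachineAttrRemoteHost2": "slot1@glidein_1@node1.site-a.example",
}

for host in remote_host_history(job_ad):
    print(host)
```

Note that this only recovers the sequence of hosts, not the time spent on each one; the per-site times mentioned in the description would need additional attributes.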

An open question is whether this is of enough interest for GlideinWMS or whether it should be delegated to accounting (e.g. Gratia).

History

#1 Updated by Parag Mhashilkar over 4 years ago

I think this is a great feature to request from the HTCondor developers. In the end, a job moves from one startd to another; glidein or no glidein does not change this fact.


