Feature #2541

Holding jobs from the glideins - collection

Added by Igor Sfiligoi over 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: High
Assignee: Douglas Strain
Category: Integration with Condor
Target version:
Start date: 03/11/2012
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Igor Sfiligoi on 3/1/12 sent this email: =======================================

Since 7.3.X, Condor lets you specify an expression that will hold a misbehaving job directly from the startd.
The knob to turn is
WANT_HOLD
http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#17419

It turns out to be a little more complicated than just setting that variable (even without the #2520 bug ;).

Here is the shortest script that does the job (not saying it is ideal... just what worked):
add_config_line "WANT_HOLD" "JobWantsHold==True"
add_config_line "PREEMPT" "JobWantsHold==True"
add_config_line "WANT_SUSPEND" "False"
add_config_line "WANT_SUSPEND_VANILLA" "False"
add_config_line "WANT_VACATE" "True"
add_config_line "PREEMPT_GRACE_TIME" '10000000*(($(PREEMPT))=!=True)'
add_condor_vars_line "WANT_HOLD" "C" "-" "+" "Y" "N" "-"
add_condor_vars_line "PREEMPT" "C" "-" "+" "N" "N" "-"
add_condor_vars_line "WANT_SUSPEND" "C" "-" "+" "N" "N" "-"
add_condor_vars_line "WANT_SUSPEND_VANILLA" "C" "-" "+" "N" "N" "-"
add_condor_vars_line "WANT_VACATE" "C" "-" "+" "N" "N" "-"
add_condor_vars_line "PREEMPT_GRACE_TIME" "C" "-" "+" "N" "N" "-"

I had to set:
PREEMPT==WANT_HOLD

We have to change
WANT_SUSPEND, WANT_SUSPEND_VANILLA and WANT_VACATE
from the values we currently set in the glidein config.

And we have to redefine the value of
MaxJobRetirementTime
although indirectly through
PREEMPT_GRACE_TIME
(see condor_config.*.include)


The above is of course too complicated for the average user, although I can live with it as long as needed.
Can we make the handling of holding jobs from the glideins simpler?

Possibly along the lines of
GLIDEIN_Start -> GLIDEIN_Hold
GLIDEIN_Entry_Start -> GLIDEIN_Entry_Hold
GLIDECLIENT_Start -> GLIDECLIENT_Hold
GLIDECLIENT_Group_Start -> GLIDECLIENT_Group_Hold
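
One possible reading of the proposal above, as a rough sketch only: each level contributes an optional hold expression, and the glidein combines them into a single WANT_HOLD value. The helper name `combine_hold_exprs`, the choice to OR the levels together, and the sample values are all assumptions made for illustration, not existing GlideinWMS behavior.

```shell
#!/bin/bash
# Hypothetical helper: OR together the hold expressions from each level.
# Empty arguments are skipped; with no expressions at all the result is False.
combine_hold_exprs() {
    local combined="" expr
    for expr in "$@"; do
        [ -z "$expr" ] && continue
        if [ -z "$combined" ]; then
            combined="($expr)"
        else
            combined="$combined || ($expr)"
        fi
    done
    echo "${combined:-False}"
}

# Illustrative values only; in a real glidein these would come from the
# factory and frontend configuration.
GLIDEIN_Hold="False"
GLIDEIN_Entry_Hold=""
GLIDECLIENT_Hold="JobWantsHold=?=True"
GLIDECLIENT_Group_Hold=""

want_hold=$(combine_hold_exprs "$GLIDEIN_Hold" "$GLIDEIN_Entry_Hold" \
    "$GLIDECLIENT_Hold" "$GLIDECLIENT_Group_Hold")
echo "WANT_HOLD = $want_hold"
```

OR-ing seems the natural choice here because any single level wanting a hold should be enough to hold the job, but that is a design assumption.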

Cheers,
Igor

PS: The hold use case I am really interested in is holding when the job uses too much memory.
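
For the memory use case, a hold expression might look roughly like the sketch below. It assumes the job's ImageSize attribute is reported in KiB (so dividing by 1024 gives MiB), and the 2000 MiB limit is arbitrary. The function name `memory_hold_expr` is hypothetical. The `=?= True` wrapper keeps the result from being undefined when ImageSize is not set.

```shell
#!/bin/bash
# Build a ClassAd expression that is True once the job image exceeds a limit.
# Assumption: ImageSize is in KiB, so ImageSize/1024 is the size in MiB.
# The outer =?= True turns an undefined comparison into False.
memory_hold_expr() {
    local limit_mb="$1"
    echo "((ImageSize/1024 > $limit_mb) =?= True)"
}

# 2000 MiB is an arbitrary illustrative limit.
hold_expr=$(memory_hold_expr 2000)

# Per the description, PREEMPT has to be set to the same expression as WANT_HOLD.
echo "WANT_HOLD = $hold_expr"
echo "PREEMPT = $hold_expr"
```

The other knobs from the script in the description (WANT_SUSPEND, WANT_VACATE, PREEMPT_GRACE_TIME) would presumably still be needed alongside these two.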


Related issues

Blocked by GlideinWMS - Feature #2542: Re-evaluate WANT_SUSPEND, WANT_SUSPEND_VANILLA and WANT_VACATE (Closed, 03/11/2012)

Blocked by GlideinWMS - Feature #2543: Defining WANT_HOLD via GLIDEIN_Hold & co. (Closed, 03/11/2012)

Blocked by GlideinWMS - Bug #2544: Our PREEMPT expression does not work with WANT_HOLD (Closed, 03/11/2012)

Blocked by GlideinWMS - Bug #2545: Our MaxJobRetirementTime expression does not work with WANT_HOLD (Closed, 03/11/2012)

History

#1 Updated by Burt Holzman over 8 years ago

  • Assignee set to Douglas Strain
  • Target version set to v3_1

Found this ticket (and its lonely children) -- Doug may have time to look at this.

#2 Updated by Burt Holzman over 8 years ago

  • Target version changed from v3_1 to v2_7_x

#3 Updated by Burt Holzman over 8 years ago

  • Priority changed from Normal to High

#4 Updated by Douglas Strain over 8 years ago

This (and all the related tickets from 2541-2545) is ready to be tested in branch_v2plus_2543

#5 Updated by Igor Sfiligoi over 8 years ago

Code changes look good.

#6 Updated by Douglas Strain over 8 years ago

Marking this ticket as resolved then. Code tested by me and reviewed by Igor. Documentation also updated.
Anyone else testing: note that the examples above should probably use "JobWantsHold=?=True", or else Condor will barf on undefined values.

Changes in branch_master_2543 and branch_v2plus_2543.
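
For anyone testing, here is a sketch of the script from the description with the `=?=` substitution applied. The two helper functions defined at the top are simplified stand-ins (an assumption, so the sketch runs outside a glidein); a real custom script would use the helpers provided by the glidein instead, and the temporary files are only there for the stand-ins to write to.

```shell
#!/bin/bash
# Stand-ins for the glideinWMS helpers so the sketch runs on its own.
# They only append lines; the real helpers are provided by the glidein.
glidein_config=$(mktemp)
condor_vars_file=$(mktemp)
add_config_line() { echo "$1 $2" >> "$glidein_config"; }
add_condor_vars_line() { echo "$@" >> "$condor_vars_file"; }

# =?= never evaluates to undefined, so jobs that do not define
# JobWantsHold are simply not held.
hold_expr='JobWantsHold=?=True'

add_config_line "WANT_HOLD" "$hold_expr"
add_config_line "PREEMPT" "$hold_expr"
add_config_line "WANT_SUSPEND" "False"
add_config_line "WANT_SUSPEND_VANILLA" "False"
add_config_line "WANT_VACATE" "True"
# Single quotes keep the shell from expanding $(PREEMPT); Condor expands it.
add_config_line "PREEMPT_GRACE_TIME" '10000000*(($(PREEMPT))=!=True)'

# Same condor_vars flags as in the description.
add_condor_vars_line "WANT_HOLD" "C" "-" "+" "Y" "N" "-"
for var in PREEMPT WANT_SUSPEND WANT_SUSPEND_VANILLA WANT_VACATE PREEMPT_GRACE_TIME; do
    add_condor_vars_line "$var" "C" "-" "+" "N" "N" "-"
done

cat "$glidein_config"
```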

#7 Updated by Douglas Strain over 8 years ago

  • Status changed from New to Resolved

#8 Updated by Parag Mhashilkar over 8 years ago

  • Target version changed from v2_7_x to v2_6

#9 Updated by Parag Mhashilkar about 8 years ago

  • Status changed from Resolved to Closed

