Feature #19791

Implement a retry policy in Condor to allow jobs that go held for excess memory usage to have their memory increased and released

Added by Shreyas Bhat over 1 year ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee: -
Category: JobSub Server RPM
Target version:
Start date: 04/24/2018
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Implement a retry policy in Condor to allow jobs that go held for excess memory usage to have their memory increased and released. Details copy/pasted from SNOW RITM0638375:

After discussing this with the HTCondor team, here is what you want in the Condor JDL. I haven't tested it, so someone (Dennis, Marc, or Shreyas) needs to test it.

# Supporting attributes that need to be added to the JDL are prefixed with a + sign
# OriginalMemory = <Value of --memory>
# GraceMemory = Increment request_memory by this amount if the job was put on hold

+OriginalMemory = 2000
+GraceMemory = 2000
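# Reference the job attributes directly: $(...) expands submit macros, and the + lines above define job attributes, not macros
+MaxAllowedMemory = OriginalMemory + GraceMemory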

request_memory = ifthenelse(isUndefined(MemoryUsage), OriginalMemory, MaxAllowedMemory)

periodic_release = (HoldReasonCode =?= 34) && (RequestMemory < MaxAllowedMemory)
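
Since the JDL above is untested, it can help to evaluate the two expressions by hand before submitting real jobs. The following is a minimal Python sketch of that evaluation, not HTCondor code: the function names are hypothetical, `None` stands in for an undefined ClassAd attribute, and the memory values are the ones from the ticket.

```python
from typing import Optional

ORIGINAL_MEMORY = 2000  # MB, the value of --memory (from the ticket)
GRACE_MEMORY = 2000     # MB, increment applied after a memory hold (from the ticket)
MAX_ALLOWED_MEMORY = ORIGINAL_MEMORY + GRACE_MEMORY

MEMORY_HOLD_CODE = 34   # HoldReasonCode tested by the ticket's periodic_release


def request_memory(memory_usage: Optional[int]) -> int:
    """Model ifthenelse(isUndefined(MemoryUsage), OriginalMemory, MaxAllowedMemory)."""
    return ORIGINAL_MEMORY if memory_usage is None else MAX_ALLOWED_MEMORY


def periodic_release(hold_reason_code: Optional[int], requested: int) -> bool:
    """Model (HoldReasonCode =?= 34) && (RequestMemory < MaxAllowedMemory)."""
    return hold_reason_code == MEMORY_HOLD_CODE and requested < MAX_ALLOWED_MEMORY


if __name__ == "__main__":
    # Job that has never reported memory usage: requests the original amount.
    print(request_memory(None))
    # Job that has reported usage: requests the increased amount.
    print(request_memory(2500))
    # Release decision for a memory hold at each request level.
    print(periodic_release(MEMORY_HOLD_CODE, request_memory(None)))
    print(periodic_release(MEMORY_HOLD_CODE, request_memory(2500)))
```

In this model, a job held with code 34 whose MemoryUsage is already defined evaluates RequestMemory to 4000, which is not below MaxAllowedMemory, so the release condition comes out false. Whether MemoryUsage is actually defined at that point in a real schedd is an assumption of the sketch, and is worth checking when the JDL is tested.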

History

#1 Updated by Shreyas Bhat over 1 year ago

  • Target version set to v1.2.8

#2 Updated by Dennis Box 12 months ago

  • Target version changed from v1.2.8 to v1.3

#3 Updated by Dennis Box 4 months ago

  • Target version changed from v1.3 to v1.3.3

