Feature #19791

Implement a retry policy in Condor to allow jobs that go held for excess memory usage to have their memory increased and released

Added by Shreyas Bhat over 2 years ago. Updated 20 days ago.

Status: Closed
Priority: Normal
Assignee: -
Category: JobSub Server RPM
Target version:
Start date: 04/24/2018
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Implement a retry policy in Condor to allow jobs that go held for excess memory usage to have their memory increased and released. Details copy/pasted from SNOW RITM0638375:

After discussing this with the HTCondor team, here is what you want in the Condor JDL. I haven't tested it, so someone (Dennis, Marc, or Shreyas) needs to test it.

# Supporting attributes that need to be added to the JDL are marked with a + sign
# OriginalMemory = <Value of --memory>
# GraceMemory = Increment request_memory by this amount if job was put on

+OriginalMemory = 2000
+GraceMemory = 2000
+MaxAllowedMemory = OriginalMemory + GraceMemory

request_memory = ifthenelse(isUndefined(MemoryUsage), OriginalMemory, MaxAllowedMemory)

periodic_release = (HoldReasonCode =?= 34) && (RequestMemory < MaxAllowedMemory)
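Since the snippet above is untested, it may help to trace the two expressions by hand before trying them on a real schedd. The following is a minimal Python sketch of that trace, not HTCondor code: the function names are hypothetical, an undefined attribute is modelled as `None`, and it assumes `RequestMemory` is re-evaluated from its expression every time the release test runs.

```python
from typing import Optional

# Values taken from the ticket's example attributes.
ORIGINAL_MEMORY = 2000
GRACE_MEMORY = 2000
MAX_ALLOWED_MEMORY = ORIGINAL_MEMORY + GRACE_MEMORY

MEMORY_HOLD_CODE = 34  # the HoldReasonCode tested in the ticket


def request_memory(memory_usage: Optional[int]) -> int:
    """Model of ifthenelse(isUndefined(MemoryUsage), OriginalMemory, MaxAllowedMemory)."""
    if memory_usage is None:  # MemoryUsage undefined: job has not reported usage yet
        return ORIGINAL_MEMORY
    return MAX_ALLOWED_MEMORY


def periodic_release(hold_reason_code: Optional[int],
                     memory_usage: Optional[int]) -> bool:
    """Model of (HoldReasonCode =?= 34) && (RequestMemory < MaxAllowedMemory)."""
    held_for_memory = hold_reason_code == MEMORY_HOLD_CODE
    can_still_grow = request_memory(memory_usage) < MAX_ALLOWED_MEMORY
    return held_for_memory and can_still_grow


if __name__ == "__main__":
    # Before the first run: usage is undefined, so the original request applies.
    print(request_memory(None))          # 2000
    # After the job has reported usage: the request jumps to the maximum.
    print(request_memory(2500))          # 4000
    # Held for memory with usage defined: the request already equals the maximum.
    print(periodic_release(34, 2500))    # False
    # Held for memory with usage still undefined.
    print(periodic_release(34, None))    # True
```

Under this simplified model, the release test is false once `MemoryUsage` is defined, because the request has already been raised to `MaxAllowedMemory` by then. Whether a real job ad behaves the same way is exactly the kind of thing the requested test should confirm.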

History

#1 Updated by Shreyas Bhat over 2 years ago

  • Target version set to v1.2.8

#2 Updated by Dennis Box about 2 years ago

  • Target version changed from v1.2.8 to v1.3

#3 Updated by Dennis Box over 1 year ago

  • Target version changed from v1.3 to v1.3.3

#4 Updated by Dennis Box 20 days ago

  • Status changed from New to Closed

duplicate of closed issue


