Bug #3555

DAEMON_SHUTDOWN doesn't work for multi-slot glideins

Added by Burt Holzman over 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Category: -
Target version:
Start date: 03/01/2013
Due date: 05/10/2013
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration: 71

Description

There is a use case (but not a strong one) for multislot glideins that are not based on partitionable slots.

DAEMON_SHUTDOWN as currently constructed doesn't do what we want for these glideins; we need something like the code below. It would be best to auto-generate this with a for loop in the startup shell script, since the number of slots can vary.

STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) TotalTimeUnclaimedIdle TotalTimeClaimedBusy Activity

DS1_TO_DIE = ((GLIDEIN_ToDie =!= UNDEFINED) && (CurrentTime > GLIDEIN_ToDie))
DS1_IDLE_MAX = ((Slot1_TotalTimeUnclaimedIdle =!= UNDEFINED) && \
        (GLIDEIN_Max_Idle =!= UNDEFINED) && \
        (Slot1_TotalTimeUnclaimedIdle > GLIDEIN_Max_Idle))
DS1_IDLE_RETIRE = ((GLIDEIN_ToRetire =!= UNDEFINED) && \
       (CurrentTime > GLIDEIN_ToRetire ))
DS1_IDLE_TAIL = ((Slot1_TotalTimeUnclaimedIdle=!= UNDEFINED) && \
        (Slot1_TotalTimeClaimedBusy=!= UNDEFINED) && \
        (GLIDEIN_Max_Tail=!= UNDEFINED) && \
        (Slot1_TotalTimeUnclaimedIdle > GLIDEIN_Max_Tail))
DS1_IDLE = ( (Activity == "Idle") && \
        ($(DS1_IDLE_MAX) || $(DS1_IDLE_RETIRE) || $(DS1_IDLE_TAIL)) )

DS1 = ($(DS1_TO_DIE) || \
         ($(DS1_IDLE) && ((PartitionableSlot =!= True) || (TotalSlots =?=1))))

DS2_TO_DIE = ((GLIDEIN_ToDie =!= UNDEFINED) && (CurrentTime > GLIDEIN_ToDie))
DS2_IDLE_MAX = ((Slot2_TotalTimeUnclaimedIdle =!= UNDEFINED) && \
        (GLIDEIN_Max_Idle =!= UNDEFINED) && \
        (Slot2_TotalTimeUnclaimedIdle > GLIDEIN_Max_Idle))
DS2_IDLE_RETIRE = ((GLIDEIN_ToRetire =!= UNDEFINED) && \
       (CurrentTime > GLIDEIN_ToRetire ))
DS2_IDLE_TAIL = ((Slot2_TotalTimeUnclaimedIdle=!= UNDEFINED) && \
        (Slot2_TotalTimeClaimedBusy=!= UNDEFINED) && \
        (GLIDEIN_Max_Tail=!= UNDEFINED) && \
        (Slot2_TotalTimeUnclaimedIdle > GLIDEIN_Max_Tail))
DS2_IDLE = ( (Activity == "Idle") && \
        ($(DS2_IDLE_MAX) || $(DS2_IDLE_RETIRE) || $(DS2_IDLE_TAIL)) )

DS2 = ($(DS2_TO_DIE) || \
         ($(DS2_IDLE) && ((PartitionableSlot =!= True) || (TotalSlots =?=1))))

STARTD.DAEMON_SHUTDOWN = ($(DS1) && $(DS2))
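
The per-slot blocks above differ only in the slot number, so a loop in the startup script could emit them. Below is a minimal bash sketch of that idea, not the code that ended up in condor_startup.sh: the function name gen_daemon_shutdown and its single argument (the slot count) are hypothetical, and the expressions are copied from the two-slot example as written.

```shell
#!/bin/bash
# Sketch only: gen_daemon_shutdown is a hypothetical name, not the function
# used in condor_startup.sh. Argument: number of fixed slots.
gen_daemon_shutdown() {
    local num_slots="$1"
    local combined=""
    local i

    # Single quotes keep $(STARTD_SLOT_EXPRS) literal for HTCondor.
    echo 'STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) TotalTimeUnclaimedIdle TotalTimeClaimedBusy Activity'
    echo

    for ((i = 1; i <= num_slots; i++)); do
        # Unquoted heredoc: ${i} is expanded by the shell, \$ keeps HTCondor
        # macros literal, and \\ emits the trailing continuation backslash.
        cat <<EOF
DS${i}_TO_DIE = ((GLIDEIN_ToDie =!= UNDEFINED) && (CurrentTime > GLIDEIN_ToDie))
DS${i}_IDLE_MAX = ((Slot${i}_TotalTimeUnclaimedIdle =!= UNDEFINED) && \\
        (GLIDEIN_Max_Idle =!= UNDEFINED) && \\
        (Slot${i}_TotalTimeUnclaimedIdle > GLIDEIN_Max_Idle))
DS${i}_IDLE_RETIRE = ((GLIDEIN_ToRetire =!= UNDEFINED) && \\
        (CurrentTime > GLIDEIN_ToRetire))
DS${i}_IDLE_TAIL = ((Slot${i}_TotalTimeUnclaimedIdle =!= UNDEFINED) && \\
        (Slot${i}_TotalTimeClaimedBusy =!= UNDEFINED) && \\
        (GLIDEIN_Max_Tail =!= UNDEFINED) && \\
        (Slot${i}_TotalTimeUnclaimedIdle > GLIDEIN_Max_Tail))
DS${i}_IDLE = ((Activity == "Idle") && \\
        (\$(DS${i}_IDLE_MAX) || \$(DS${i}_IDLE_RETIRE) || \$(DS${i}_IDLE_TAIL)))

DS${i} = (\$(DS${i}_TO_DIE) || \\
        (\$(DS${i}_IDLE) && ((PartitionableSlot =!= True) || (TotalSlots =?= 1))))

EOF
        # Shut the daemon down only when every slot agrees.
        if [ -z "$combined" ]; then
            combined="\$(DS${i})"
        else
            combined="${combined} && \$(DS${i})"
        fi
    done

    echo "STARTD.DAEMON_SHUTDOWN = (${combined})"
}
```

Calling `gen_daemon_shutdown 2` should print a configuration equivalent, up to whitespace, to the two-slot example above; the startup script would append that output to the config file it builds.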

History

#1 Updated by Burt Holzman over 7 years ago

Mats, I temporarily assigned this to you since you must have dealt with this behavior with multislot glideins.

#2 Updated by Burt Holzman over 7 years ago

  • Assignee changed from Mats Rynge to Anthony Tiradani
  • Priority changed from Normal to High

Actually reassigning to Tony -- this affects work in the cloud too!

#3 Updated by Burt Holzman over 7 years ago

  • Assignee changed from Anthony Tiradani to Mats Rynge

#4 Updated by Mats Rynge over 7 years ago

I have pushed up a branch_master_3555 branch with the following changes:

- Removed the whole node lock - this is no longer applicable as HTPC does not mean whole nodes anymore
- Switched slots_layout to take {'fixed', 'partitionable'}
- Added glidein_cpus_setup.sh, which will probe the system and determine the number of cpus if GLIDEIN_CPUS=0 or GLIDEIN_CPUS=auto
- Implemented the expression mentioned above with a few smaller fixes
- The expression is built dynamically in condor_startup.sh, based on how many cpus are specified
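
As an illustration of the GLIDEIN_CPUS=0 / GLIDEIN_CPUS=auto handling listed above, here is a minimal bash sketch of how such a probe could work. It is not the contents of glidein_cpus_setup.sh: the function name resolve_glidein_cpus, the use of getconf with a /proc/cpuinfo fallback, and the default of 1 when probing fails are all assumptions.

```shell
#!/bin/bash
# Sketch only: resolve_glidein_cpus is a hypothetical helper, not taken from
# glidein_cpus_setup.sh. Argument: the requested GLIDEIN_CPUS value.
resolve_glidein_cpus() {
    local requested="$1"
    local detected=""

    case "$requested" in
        0|auto)
            # Probe the system: ask getconf for the number of online cpus.
            detected="$(getconf _NPROCESSORS_ONLN 2>/dev/null)"
            # Fall back to counting processor entries in /proc/cpuinfo.
            if ! [[ "$detected" =~ ^[1-9][0-9]*$ ]] && [ -r /proc/cpuinfo ]; then
                detected="$(grep -c '^processor' /proc/cpuinfo)"
            fi
            # Assumption: use a single cpu if neither probe gave a number.
            if ! [[ "$detected" =~ ^[1-9][0-9]*$ ]]; then
                detected=1
            fi
            echo "$detected"
            ;;
        *)
            # An explicit count is passed through unchanged.
            echo "$requested"
            ;;
    esac
}
```

The resolved count could then drive how many per-slot expressions the startup script generates, for example `num_slots="$(resolve_glidein_cpus "$GLIDEIN_CPUS")"`.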

#5 Updated by Burt Holzman over 7 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Mats Rynge to Anthony Tiradani

Assigning to Tony to review

#6 Updated by Anthony Tiradani about 7 years ago

  • Assignee changed from Anthony Tiradani to Parag Mhashilkar

I believe that this looks good. Once merged and tested we will want to document the various slot permutations.

#7 Updated by Burt Holzman about 7 years ago

  • Due date set to 05/10/2013

Tony will test this next week (after Condor week).

#8 Updated by Parag Mhashilkar about 7 years ago

  • Assignee changed from Parag Mhashilkar to Anthony Tiradani

Assigning back to Tony since he will be testing this.

#9 Updated by Parag Mhashilkar almost 7 years ago

  • Status changed from Feedback to Closed

This has been merged back into master along with the docs. Closing the ticket.


