Modify DAEMON_SHUTDOWN to use idle timers that are relative to change in state
The current logic checks TotalTimeUnclaimedIdle < GLIDEIN_Max_Idle (and TotalTimeUnclaimedBusy < GLIDEIN_Max_Tail).
This works fine for non-partitionable slots, but for partitionable slots the parent slot does not change state when subslots are partitioned. (It helps that we don't enforce those shutdowns unless all the subslots have been returned to the parent, but the behavior still differs.)
The HTCondor team provided us with a way to do what we want in 8.1.3:
#1 Updated by Parag Mhashilkar about 6 years ago
- Subject changed from Modify DAEMON_SHUTDOWN to use idle timers that work with partitionable slots to Modify DAEMON_SHUTDOWN to use idle timers that are relative to change in state
We need to fix the idle time calculation to be relative to change in state. This affects both partitionable and non-partitionable slots.
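To illustrate the difference between cumulative and relative idle time, here is a minimal Python sketch. It is not the actual DAEMON_SHUTDOWN expression: the `SlotAd` class, both helper functions, and the numbers are hypothetical and only model the idea. It assumes a shutdown fires once the idle time reaches the limit, and that the slot records when it entered its current activity.

```python
from dataclasses import dataclass


@dataclass
class SlotAd:
    """Hypothetical, simplified view of a slot's attributes."""
    state: str                      # e.g. "Unclaimed" or "Claimed"
    activity: str                   # e.g. "Idle" or "Busy"
    entered_current_activity: int   # time (s) the current activity began
    total_time_unclaimed_idle: int  # cumulative seconds spent Unclaimed/Idle


def cumulative_idle_expired(ad: SlotAd, max_idle: int) -> bool:
    """Cumulative check: counts every idle period since startup."""
    return (ad.state == "Unclaimed" and ad.activity == "Idle"
            and ad.total_time_unclaimed_idle >= max_idle)


def relative_idle_expired(ad: SlotAd, now: int, max_idle: int) -> bool:
    """Relative check: counts only the time since the last change of activity."""
    return (ad.state == "Unclaimed" and ad.activity == "Idle"
            and now - ad.entered_current_activity >= max_idle)


# Timeline (seconds): idle 0-900, job 900-4500, idle again 4500-4900.
MAX_IDLE = 1200
now = 4900
ad = SlotAd(state="Unclaimed", activity="Idle",
            entered_current_activity=4500,
            total_time_unclaimed_idle=900 + 400)

print(cumulative_idle_expired(ad, MAX_IDLE))     # True: 1300 s idle in total
print(relative_idle_expired(ad, now, MAX_IDLE))  # False: only 400 s since the job ended
```

In this sketch the cumulative check would shut the slot down 400 seconds after a job finishes, even though the limit is 1200 seconds, because the earlier 900 seconds of idle time are still counted.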
On 3/14/2014 11:33 AM, Igor Sfiligoi wrote:
The slot goes from Unclaimed/Idle -> Claimed/Idle after a match. Only after the shadow has started (and possibly the files transferred, not sure about this part), will the slot finally go into Claimed/Busy. Notice that only 5s passed since it went into Claimed, so it is not unreasonable for the schedd to take a bit to be ready to get the job going.
I think the problem is due to the fact that Slot1_TotalTimeClaimedBusy is set to 0 the moment it enters Claimed State, thus triggering the "Tail" expression. But I cannot be 100% sure, as I don't have that information available. So changing State=="Busy" would fix this one particular case.
At the last gwms meeting the entire team agreed that changing the behavior so that we use relative rather than cumulative time was reasonable (in other words: we agree with you and will fix it).
#5 Updated by Igor Sfiligoi almost 6 years ago
The fix should play well with
I.e. the shutdown expression must be relative to the internal Condor time attributes.
#7 Updated by Marco Mambelli almost 6 years ago
I changed the names of the variables in the expression and separated some parts to be more expressive and clear.
I opened 2 tickets asking for documentation about 2 attributes that are not in the manual index:
I'm investigating how expressions are evaluated to see if I should add controls in the expressions.