Feature #25201

Make sure that Factory is not removing Glideins if not asked to do so

Added by Marco Mambelli 5 months ago. Updated about 2 months ago.

Status: New
Priority: High
Assignee:
Category: -
Target version:
Start date: 11/13/2020
Due date:
% Done: 0%
Estimated time:
Stakeholders: HEPCloud
Duration:

Description

This could be either a bug or a feature (a request to change the behavior).

Glidein removal is controlled by 2 knobs on the Frontend: idle_glideins_lifetime and glideins_removal (both described below).

If idle_glideins_lifetime=0 and glideins_removal=DISABLE, the Factory should not kill Glideins that it submitted to condor.

This may already be the current behavior. To verify:
  1. Use a test site where the Glidein will remain queued for a very long time (e.g. block the queue).
  2. Check what happens when those 2 values are set (trigger a Glidein, then remove the jobs from the queue).
  3. Check what happens if the Frontend is killed (the DE does not send classads if there are no jobs). The glideclient classad should disappear, but the Glidein should remain in the queue.

The behaviors in 1 and 2 should hold.
Is 3 really desired? If the Frontend and the user collector were taken offline, Glideins would be wasted. If 3 stands, there should be a warning against setting glideins_removal=DISABLE that spells out the consequences.

After deciding on 1, 2, and 3, code should be written if the current behavior differs.

The next element, idle_glideins_lifetime, determines how many seconds glideins will stay in the idle state in the Factory queue before they are automatically removed (through a periodic_remove expression).
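To make the periodic_remove mechanism concrete, here is a minimal Python sketch that builds an expression of the kind described above. The helper name idle_removal_expression is hypothetical, and the exact expression the Factory generates may differ. JobStatus == 1 (Idle), QDate (submission time), and time() are standard HTCondor ClassAd names. Treating a lifetime of 0 as "no idle limit" is an assumption consistent with this ticket, not confirmed behavior.

```python
from typing import Optional


def idle_removal_expression(idle_glideins_lifetime: int) -> Optional[str]:
    """Build a periodic_remove-style ClassAd expression (illustrative only).

    Assumption: a lifetime of 0 (or less) means "no idle limit", so no
    expression is produced and idle glideins are never removed this way.
    """
    if idle_glideins_lifetime <= 0:
        return None
    # JobStatus == 1 is Idle; QDate is the submission time in epoch seconds.
    return f"(JobStatus == 1) && ((time() - QDate) > {idle_glideins_lifetime})"


if __name__ == "__main__":
    print(idle_removal_expression(3600))
    print(idle_removal_expression(0))
```

With idle_glideins_lifetime=0 the sketch emits no expression at all, which is the half of the "Factory does not remove Glideins" requirement that this knob would cover.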

The glideins_removal element controls the removal of glideins when they are not used. Glideins can expire and die when not used, and the Frontend also has an automatic glidein removal mechanism; in addition, you can trigger an early removal or stop all removals. Early removal is controlled by 4 attributes (type, requests_tracking, margin, wait) and is disabled by default (type="NO").

type selects what to remove:
  • NO (default): no early removal
  • IDLE: only Idle glideins (not submitted)
  • WAIT: also glideins waiting in the remote queues
  • ALL: also running glideins (all glideins except the ones already staging output can be removed)
  • DISABLE: also disables the automatic removal mechanism of the Frontend (Glidein expiration is still in place)

When requests_tracking is True and the current need for Glideins drops below the available Glideins by margin, the excess Glideins are removed. If requests_tracking is False (default), the Frontend removes glideins when there are no pending job requests for this group. Either way, type still controls what to remove (and the default NO means no early removal). wait adds a delay: the Frontend waits for N cycles without requests (or below the margin) before triggering the removal.
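The interaction of the four glideins_removal attributes can be sketched as a small decision function. This is a model of the description above, not the Frontend's actual code. The names RemovalConfig, early_removal_targets, and the state labels "idle", "waiting", and "running" are invented for illustration. The strict comparison used for margin is an assumption, since the text does not say whether the boundary is inclusive.

```python
from dataclasses import dataclass

# Glidein states each removal type may target, per the description above.
# The state labels are invented for this sketch.
REMOVABLE_STATES = {
    "NO": set(),                              # default: no early removal
    "IDLE": {"idle"},                         # only Idle (not submitted)
    "WAIT": {"idle", "waiting"},              # also waiting in remote queues
    "ALL": {"idle", "waiting", "running"},    # also running glideins
    "DISABLE": set(),                         # no early or automatic removal
}


@dataclass
class RemovalConfig:
    type: str = "NO"
    requests_tracking: bool = False
    margin: int = 0
    wait: int = 0


def early_removal_targets(cfg, needed, available, quiet_cycles):
    """Return the glidein states eligible for early removal in this cycle.

    needed: glideins currently requested; available: glideins that exist;
    quiet_cycles: consecutive cycles in which the trigger condition held.
    """
    states = REMOVABLE_STATES[cfg.type]
    if not states:
        return set()
    if cfg.requests_tracking:
        # Assumption: strict comparison; the boundary case is not specified.
        triggered = needed < available - cfg.margin
    else:
        # Without tracking, removal triggers only when nothing is requested.
        triggered = needed == 0
    if not triggered or quiet_cycles < cfg.wait:
        return set()
    return set(states)


if __name__ == "__main__":
    cfg = RemovalConfig(type="WAIT", wait=2)
    print(early_removal_targets(cfg, needed=0, available=10, quiet_cycles=1))
    print(early_removal_targets(cfg, needed=0, available=10, quiet_cycles=2))
```

In this model, type="DISABLE" and type="NO" both yield no early removal. The difference the ticket cares about, whether the automatic removal mechanism also stays off, lives outside this function.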

History

#1 Updated by Marco Mambelli 4 months ago

  • Target version changed from v3_6_6 to v3_6_7

#2 Updated by Marco Mambelli about 2 months ago

  • Target version changed from v3_6_7 to v3_7_4
