Feature #10910

Review the choice of fixed vs partitionable slots

Added by Marco Mambelli about 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Start date: 11/11/2015
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Currently the choice of fixed vs partitionable slots is made in three places:
- in the factory configuration, using slots_layout="partitionable" in the config/submit section of the entry configuration
- in the frontend configuration, by setting the attribute SLOT_LAYOUT to partitionable (it should not be passed as a parameter)
- in the frontend configuration, by setting the attribute FORCE_PARTITIONABLE to True

GLIDEIN_CPUS is involved as well: the script creation/web_base/smart_partitionable.sh sets the layout back to fixed if GLIDEIN_CPUS is 1 or is unset and FORCE_PARTITIONABLE is not True.
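
The fallback described above can be sketched as a small decision function. This is only an illustration of one reading of the rule, not the code in smart_partitionable.sh: the function name and arguments are hypothetical, and it assumes the condition groups as "(GLIDEIN_CPUS is 1 or unset) and FORCE_PARTITIONABLE is not True".

```python
from typing import Optional


def effective_layout(requested: str,
                     glidein_cpus: Optional[str],
                     force_partitionable: bool) -> str:
    """Return the slot layout after the single-core fallback.

    Hypothetical helper: it mirrors the behavior described in the ticket,
    not the actual shell script.
    """
    if requested != "partitionable":
        # Nothing to revert: the layout was never partitionable.
        return "fixed"
    # Assumption: an empty value counts as "unset".
    single_core = glidein_cpus is None or glidein_cpus.strip() in ("", "1")
    if single_core and not force_partitionable:
        # One core (or unknown) and no override: fall back to fixed.
        return "fixed"
    return "partitionable"


if __name__ == "__main__":
    for cpus in (None, "1", "8"):
        for force in (False, True):
            print(cpus, force, effective_layout("partitionable", cpus, force))
```

Under this reading, FORCE_PARTITIONABLE only matters when GLIDEIN_CPUS is 1 or unset; with more cores the requested layout is kept either way.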

The exact interaction of the parameters should be documented.
Maybe the behavior of creation/web_base/smart_partitionable.sh should be changed to be more partitionable-friendly (even though the condor team just affirmed that partitionable slots are not production ready).

This came up while working on #10092: when adding extra resources to the main slot, the startd may fail because of an impossible layout when using fixed slots.

History

#1 Updated by Marco Mambelli almost 4 years ago

  • Assignee set to Marco Mambelli
  • Target version set to v3_2_14

Check code in v3/10092_3
Defining type "mainextra"

#2 Updated by Parag Mhashilkar over 3 years ago

  • Target version changed from v3_2_14 to v3_2_15

#3 Updated by Parag Mhashilkar over 3 years ago

  • Target version changed from v3_2_15 to v3_2_16

#4 Updated by Marco Mambelli over 3 years ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

#5 Updated by Marco Mambelli over 3 years ago

GLIDEIN_CPUS is the number of cores available to the glidein (which affects the number of cores made available to the jobs by the startd, NUM_CPUS). Things will remain the same if you specify a number. Some keywords are also accepted: node (all cores in the node, detected), slot (all cores in the WFM slot of the glidein, detected), auto (currently the same as node).

These are the proposed changes for the GLIDEIN_CPUS defaults (which affect the number of cores, NUM_CPUS):

          Current   Proposed change
Default   1         slot
auto      node      slot
The changes are meant to make GWMS work better on multicore systems where the glidein does not get the whole node.
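
A minimal sketch of how a GLIDEIN_CPUS value could be resolved to a core count, under both the current and the proposed defaults. The function name, its arguments and the `proposed` flag are hypothetical; the detected core counts are passed in rather than detected, since the ticket does not say how detection is done.

```python
from typing import Optional


def resolve_glidein_cpus(value: Optional[str],
                         node_cores: int,
                         slot_cores: int,
                         proposed: bool = False) -> int:
    """Map a GLIDEIN_CPUS setting to a number of cores.

    node_cores / slot_cores stand for the detected values (assumed inputs).
    proposed=False follows the current defaults, proposed=True the new ones.
    """
    if value is None or value.strip() == "":
        # Default: 1 today, "slot" with the proposed change.
        return slot_cores if proposed else 1
    key = value.strip().lower()
    if key == "node":
        return node_cores
    if key == "slot":
        return slot_cores
    if key == "auto":
        # auto: same as node today, same as slot with the proposed change.
        return slot_cores if proposed else node_cores
    cpus = int(key)  # an explicit number is used as given
    if cpus < 1:
        raise ValueError("GLIDEIN_CPUS must be a positive integer or a keyword")
    return cpus


if __name__ == "__main__":
    # A glidein holding a 4-core slot on a 16-core node.
    for setting in (None, "auto", "node", "slot", "8"):
        print(setting,
              resolve_glidein_cpus(setting, 16, 4),
              resolve_glidein_cpus(setting, 16, 4, proposed=True))
```

With the proposed defaults, a glidein that holds only part of a node no longer advertises either a single core (unset) or the whole node (auto).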

Slots of the glidein can be fixed (N slots of 1 core each) or partitionable (one slot with all the cores, which can be allocated dynamically).

These are the proposed changes:
- The default will be partitionable (it was fixed), i.e. if nothing is specified, one 8-core node will be kept as 1 partitionable slot with 8 cores instead of 8 1-core slots.
- If you define special resources (e.g. GPUs) and add them to the main slot, the main slot will be partitionable even if you selected fixed. This is done to avoid startd errors if the number of special resources does not match the number of cores.
- The forced conversion to fixed for partitionable slots with 1 core is removed.
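
The three proposed rules can be sketched as follows. This is an illustration under stated assumptions, not the GWMS implementation: both function names and the `extra_resources_in_main_slot` flag are hypothetical.

```python
from typing import List, Optional


def proposed_layout(requested: Optional[str],
                    extra_resources_in_main_slot: bool) -> str:
    """Pick the slot layout under the proposed rules (hypothetical helper)."""
    if extra_resources_in_main_slot:
        # Special resources in the main slot force partitionable, even if
        # fixed was selected, to avoid startd errors on a count mismatch.
        return "partitionable"
    if requested is None:
        # New default: partitionable (it used to be fixed).
        return "partitionable"
    # No forced conversion to fixed for 1 core: the request is kept.
    return requested


def slot_cores(layout: str, cpus: int) -> List[int]:
    """Cores per slot: one slot with all cores, or N slots of 1 core each."""
    return [cpus] if layout == "partitionable" else [1] * cpus


if __name__ == "__main__":
    layout = proposed_layout(None, extra_resources_in_main_slot=False)
    print(layout, slot_cores(layout, 8))
    print("fixed", slot_cores("fixed", 8))
```

For the 8-core node in the first bullet, the default now yields one 8-core slot, while an explicit fixed layout without special resources still yields eight 1-core slots.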

These changes move GWMS further towards partitionable slots, which have been in use for a while and allow better allocation of cores, memory and other resources.

#6 Updated by Parag Mhashilkar over 3 years ago

Merged to branch_v3_2. It should be merged to master once Marco addresses the issues with merging the changes related to #7186 from branch_v3_2 to master.

#7 Updated by Parag Mhashilkar over 3 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Parag Mhashilkar to Marco Mambelli

#8 Updated by Parag Mhashilkar over 3 years ago

  • Status changed from Resolved to Closed

