Feature #11854

Reduce GWMS glidein pressure sensitivity to pslot fragmentation - new glideins are not requested when there are unusable fragments idle

Added by Marco Mambelli over 4 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 03/01/2016
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

GWMS is very sensitive to fragmentation (job requirements vs. pslot sizes).
If I submit jobs asking for 3 cores and the pslots have 4 cores, 1 core per pslot remains idle. These leftover cores show up as idle glideins/cores, so no new glideins are requested.
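A minimal sketch of the arithmetic behind the 3-core/4-core case. This is not GWMS code: pack_pslot and n_pslots are hypothetical names used only for illustration, and it assumes every pslot is filled with identical jobs. The count of 70 pslots is chosen to line up with the 70 unusable cores in the example below.

```python
def pack_pslot(pslot_cores: int, job_cores: int) -> tuple[int, int]:
    """Return (jobs_that_fit, leftover_cores) for one partitionable slot.

    Hypothetical helper: assumes all jobs ask for the same number of cores.
    """
    if pslot_cores < 0 or job_cores <= 0:
        raise ValueError("pslot_cores must be >= 0 and job_cores must be > 0")
    return divmod(pslot_cores, job_cores)


# One 4-core pslot running 3-core jobs: one job fits, one core is stranded.
jobs, leftover = pack_pslot(pslot_cores=4, job_cores=3)
print(jobs, leftover)  # 1 1

# Across 70 such pslots the stranded cores add up, yet none can run a 3-core job.
n_pslots = 70
print(n_pslots * leftover)  # 70
```

Each stranded core is too small for a 3-core job, but it still looks like idle capacity to anything that only counts idle slots or idle cores.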

The full problem is multidimensional and complex: any resource (memory, disk, CPUs) can cause fragmentation. Clustering resources and jobs into buckets would allow better management, but could blow up complexity and computing time.

Using HTCondor auto-clustering may also be a direction for a solution (can machines be auto-clustered as well?).

Comparing idle cores instead of idle slots would generate a more accurate pressure. The data is already available, so this could be a first step in the right direction.
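A rough sketch of the difference between the two comparisons, using the numbers quoted for the second log line in the example below (43 idle jobs, 70 idle slots/cores, 129 cores requested, i.e. 3 cores per job). The Snapshot class and both functions are hypothetical and only illustrate the idea; they are not the frontend's actual pressure calculation.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical summary of one frontend iteration (illustration only)."""
    idle_jobs: int      # jobs waiting to run
    cores_per_job: int  # assumes all idle jobs request the same core count
    idle_slots: int     # idle slots seen by the frontend
    idle_cores: int     # idle cores summed over those slots


def slot_pressure(s: Snapshot) -> int:
    """Slot-based comparison: idle jobs vs. idle slots."""
    return max(0, s.idle_jobs - s.idle_slots)


def core_pressure(s: Snapshot) -> int:
    """Core-based comparison: cores requested by idle jobs vs. idle cores."""
    return max(0, s.idle_jobs * s.cores_per_job - s.idle_cores)


snap = Snapshot(idle_jobs=43, cores_per_job=3, idle_slots=70, idle_cores=70)
print(slot_pressure(snap))  # 0  -> 43 < 70, no new glideins requested
print(core_pressure(snap))  # 59 -> 129 requested cores vs. 70 idle cores
```

The core-based comparison still counts the 70 fragmented cores as usable, so it only reduces the problem. It does not address fragmentation itself, which is why it is a first step rather than a full solution.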

Very few jobs (<10%) are currently multicore, so the problem is not urgent.

An example follows.
OK, 151-49=102, respecting the 5 idle limit:
[2016-03-01 11:46:42,998] INFO: glideinFrontendElement:1746: 151( 151 102 0 151) 49( 49 10000) | 98 49 49 0 | 196 49 147 | 5 114k Up ITB_FC_CE3x4@gfactory_inst

More glideins could be requested, but none are because 43<70, even though those 70 are 70 unusable cores and the jobs are requesting 129 cores:
[2016-03-01 11:54:43,973] INFO: glideinFrontendElement:1746: 43( 43 0 0 43) 70( 70 10000) | 140 70 70 0 | 280 70 210 | 0 70001 Up ITB_FC_CE3x4@gfactory_inst

History

#1 Updated by Marco Mambelli over 4 years ago

  • Subject changed from Reduce GWMS glidein pressure sensitivity to pslot fragmentation to Reduce GWMS glidein pressure sensitivity to pslot fragmentation - new glideins are not requested when there are unusable fragments idle

