Feature #11854
Reduce GWMS glidein pressure sensitivity to pslot fragmentation - new glideins are not requested when there are unusable idle fragments
0%
Description
GWMS is very sensitive to fragmentation (jobs vs pslots).
If I submit jobs that each request 3 cores and the pslots have 4 cores, 1 core per pslot remains idle. The glideins then appear to have idle slots/cores, so no new glideins are requested.
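The arithmetic behind the 3-core/4-core case can be sketched as follows. This is only an illustration, assuming identical jobs packed into a single pslot; `leftover_cores` is a hypothetical helper, not a GWMS or HTCondor function.

```python
def leftover_cores(pslot_cores: int, job_cores: int) -> int:
    """Cores left idle after packing as many identical jobs as fit into one pslot."""
    if pslot_cores < 0 or job_cores <= 0:
        raise ValueError("pslot_cores must be >= 0 and job_cores must be > 0")
    # Integer division gives the number of jobs that fit; the remainder is the fragment.
    return pslot_cores % job_cores


# The case described in this ticket: 3-core jobs on 4-core pslots.
print(leftover_cores(4, 3))
```

The leftover fragment is smaller than any waiting job, so it counts as idle capacity while being unable to run anything.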
The full problem is multidimensional and complex: any resource (memory, disk, CPUs) can cause fragmentation. Clustering resources and jobs into buckets would allow better management but could greatly increase complexity and computing time.
Using HTCondor auto-clustering may also be a direction for a solution (can machines be auto-clustered as well?).
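A minimal sketch of the bucketing idea, assuming each job request is reduced to a (cpus, memory, disk) tuple. The field names and sample values are hypothetical and are not taken from GWMS or from HTCondor's auto-cluster implementation.

```python
from collections import Counter


def bucket_counts(requests):
    """Count requests per (cpus, memory_mb, disk_mb) bucket."""
    return Counter((r["cpus"], r["memory_mb"], r["disk_mb"]) for r in requests)


# Hypothetical idle jobs: two identical 3-core requests and one single-core request.
jobs = [
    {"cpus": 3, "memory_mb": 6000, "disk_mb": 10000},
    {"cpus": 3, "memory_mb": 6000, "disk_mb": 10000},
    {"cpus": 1, "memory_mb": 2000, "disk_mb": 10000},
]
print(bucket_counts(jobs))
```

The number of buckets grows with every distinct combination of resource values, which is where the complexity concern comes from.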
Comparing idle cores instead of idle slots would generate a more accurate pressure. The data is already available, so this could be a first step in the right direction.
Very few jobs (<10%) are currently multicore, so the problem is not urgent.
An example follows:
OK, 151-49=102, respecting the 5 idle limit:
[2016-03-01 11:46:42,998] INFO: glideinFrontendElement:1746: 151( 151 102 0 151) 49( 49 10000) | 98 49 49 0 | 196 49 147 | 5 114k Up ITB_FC_CE3x4@gfactory_inst
More glideins could be requested, but 43<70, even though those 70 idle slots are 70 unusable single cores and the jobs are requesting 129 cores:
[2016-03-01 11:54:43,973] INFO: glideinFrontendElement:1746: 43( 43 0 0 43) 70( 70 10000) | 140 70 70 0 | 280 70 210 | 0 70001 Up ITB_FC_CE3x4@gfactory_inst
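The second case can be checked numerically. The sketch below contrasts a slot-based comparison with the core-based comparison proposed in the description. It assumes 43 idle jobs of 3 cores each (43 x 3 = 129) and 70 idle slots of 1 core each; the function names are hypothetical and this is not the actual glideinFrontendElement logic.

```python
def slot_pressure(idle_jobs: int, idle_slots: int) -> int:
    """Unmet demand when idle jobs are compared with idle slots."""
    return max(0, idle_jobs - idle_slots)


def core_pressure(idle_jobs: int, cores_per_job: int, idle_cores: int) -> int:
    """Unmet demand, in cores, when requested cores are compared with idle cores."""
    return max(0, idle_jobs * cores_per_job - idle_cores)


# Assumed reading of the second log line: 43 idle jobs, 70 idle single-core fragments.
print(slot_pressure(43, 70))     # slot view: 43 < 70, so no unmet demand is seen
print(core_pressure(43, 3, 70))  # core view: 129 requested cores vs 70 idle cores
```

Under these assumptions the slot view reports 0 while the core view reports a deficit of 59 cores. The core view still treats the 70 single-core fragments as available, so it remains a first step rather than a full fix.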
History
#1 Updated by Marco Mambelli almost 5 years ago
- Subject changed from Reduce GWMS glidein pressure sensitivity to pslot fragmentation to Reduce GWMS glidein pressure sensitivity to pslot fragmentation - new glideins are not requested when there are unusable idle fragments