Bug #11580

Frontend group limits based on running cores rather than running glideins (condorg) jobs

Added by Parag Mhashilkar over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Parag Mhashilkar
Category:
-
Target version:
Start date:
02/02/2016
Due date:
% Done:

0%

Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

From the logs below, the number of running glideins is much smaller than the limit of 1000. Either we are looking at the wrong information when requesting glideins, or the log line is confusing. Based on Steve Timm's observation, more glideins went through when the limit of 1000 was increased to a higher number.

# Frontend group logs
[2016-02-01 16:09:41,940] INFO: Total matching idle 642 (old 642) running 1358 limit 1000
[2016-02-01 20:29:20,214] INFO:             Jobs in schedd queues                 |         Glideins        |       Cores       |    Request   
[2016-02-01 20:29:20,214] INFO: Idle (match  eff   old  uniq )  Run ( here  max ) | Total  Idle   Run  Fail | Total  Idle   Run | Idle MaxRun Down Factory
[2016-02-01 20:29:20,214] INFO:   589(  589   516   589   589)  1372(   73  1000) |   687    73   687     0 |   614    74   540 |     1    86 Up   FNAL_HEPCloud_1@gfactory_instance@gfactory_service@cmsgwms-factory.fnal.gov

# From frontend config, <group ...><config> section
   <running_glideins_per_entry max="1000" relative_to_queue="1.15"/>
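To illustrate why the choice of counter matters, the minimal Python sketch below computes the remaining headroom under the max="1000" limit for each of the three "Run" counters in the 20:29:20 log line. The headroom helper and the COUNTERS table are hypothetical and only show the arithmetic. This is not the frontend's actual request logic, and it ignores relative_to_queue.

```python
def headroom(limit: int, running: int) -> int:
    """Additional running units allowed before `limit` is reached (never negative)."""
    return max(0, limit - running)


# Hypothetical table: counters copied from the 20:29:20 frontend log line above.
LIMIT = 1000  # from <running_glideins_per_entry max="1000" .../>
COUNTERS = {
    "jobs": 1372,     # "Jobs in schedd queues" section, Run column
    "glideins": 687,  # "Glideins" section, Run column
    "cores": 540,     # "Cores" section, Run column
}

if __name__ == "__main__":
    for name, running in COUNTERS.items():
        # Same limit, different counter: the room left for new glideins differs widely.
        print(f"{name:9s} running={running:5d} headroom={headroom(LIMIT, running)}")
```

If running jobs are compared against the limit, no headroom remains. If glideins or cores are compared instead, several hundred more are allowed.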

History

#1 Updated by Marco Mambelli over 4 years ago

The counters have been reviewed and now behave correctly.
In the tests the max was respected.

However, there can still be cases where the number of running jobs rises above the maximum:
1. GLIDEIN_CPUS is not known (auto, slot) and the glideins end up on worker nodes statically partitioned into many 1-core slots.
2. pslot glideins are requested by 8-CPU (or 4-CPU) jobs on an entry. When those jobs end, the same slots are used by 1-CPU jobs. All the pslots are split into many dynamic slots, and the number of running jobs jumps.
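Case 2 can be illustrated with a small hypothetical model. Each pslot glidein has a fixed number of CPUs, and every running job occupies one dynamic slot. The class and function names below are invented for illustration and do not correspond to HTCondor or GlideinWMS code. The sketch also assumes that jobs are packed greedily and that all first-round jobs finish before the 1-CPU jobs start.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionableSlot:
    """Hypothetical model of one pslot glidein with a fixed number of CPUs."""
    cpus: int
    dynamic_slots: List[int] = field(default_factory=list)  # CPUs used by each running job

    def free_cpus(self) -> int:
        return self.cpus - sum(self.dynamic_slots)

    def start_job(self, request_cpus: int) -> bool:
        """Carve a dynamic slot for one job if enough CPUs are still free."""
        if request_cpus <= 0:
            raise ValueError("request_cpus must be positive")
        if request_cpus <= self.free_cpus():
            self.dynamic_slots.append(request_cpus)
            return True
        return False

    def drain(self) -> None:
        """All running jobs finish; their CPUs return to the pslot."""
        self.dynamic_slots.clear()


def running_jobs(glideins: List[PartitionableSlot]) -> int:
    # Every dynamic slot holds exactly one running job.
    return sum(len(g.dynamic_slots) for g in glideins)


def fill(glideins: List[PartitionableSlot], request_cpus: int) -> None:
    # Greedily start jobs of `request_cpus` CPUs until no glidein can fit another one.
    for g in glideins:
        while g.start_job(request_cpus):
            pass


def simulate(n_glideins: int, cpus: int, first_request: int, second_request: int) -> Tuple[int, int]:
    """Return running-job counts before and after the pslots are reused by smaller jobs."""
    glideins = [PartitionableSlot(cpus) for _ in range(n_glideins)]
    fill(glideins, first_request)
    before = running_jobs(glideins)
    for g in glideins:
        g.drain()
    fill(glideins, second_request)
    after = running_jobs(glideins)
    return before, after


if __name__ == "__main__":
    print(simulate(10, 8, 8, 1))
```

For example, simulate(10, 8, 8, 1) yields 10 running jobs in the first phase and 80 in the second, while the number of glideins stays at 10.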

#2 Updated by Parag Mhashilkar over 4 years ago

  • Target version changed from v3_2_14 to v3_2_13

#3 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from New to Resolved

Issues #11145, #11521, #11580, and #11645 are addressed in the branch v3/pslot-accounting-review.
Changes have been merged to branch_v3_2.

#4 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from Resolved to Closed
