Bug #6897

Partitionable glideins not accounted for correctly

Added by Mats Rynge over 5 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
High
Assignee:
Parag Mhashilkar
Category:
Frontend Monitoring
Target version:
Start date:
08/29/2014
Due date:
% Done:

0%

Estimated time:
First Occurred:
Occurs In:
Stakeholders:

OSG

Duration:

Description

Below is an issue that was discussed over email. Jobs running in the subslots of partitionable slots are not accounted for correctly. This is easy to see under the main_multicore group at http://osg-flock.grid.iu.edu/vofrontend/monitor/frontendStatus.html

We were trying to understand the occasional gap between "running
jobs" and "claimed glideins" shown at the frontend. Parag explained
it thus:

[...]

Does this mean we occasionally have large mismatches between what
jobs require and what sites have for us? Could we be providing more
information with pilots before claiming glideins?

In general Parag's explanation is correct, and something we had some
problems with early in the project. With few active users, having one of
them exclude or target a subset of sites will lead to unclaimed
resources. With more active users, like now, this kind of balances out.
If somebody excludes a site in their requirements, somebody else will
probably pick up those slots.

In this case, I think it is a GlideinWMS accounting bug. The reason I
think so is that if you go back to:

http://osg-flock.grid.iu.edu/vofrontend/monitor/frontendStatus.html

For Group=total, you will see the gap

For Group=main, no gap

For Group=main_fnal_nocerts, no gap

For Group=main_multicore, all of it is a gap

So it looks like GlideinWMS does not consider jobs running in the
dynamic subslots of the partitionable slots.
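To make the suspected accounting gap concrete, here is a minimal sketch. It is not GlideinWMS code and not Brian's patch. Slots are modeled as plain dicts: `SlotType` and `State` mirror HTCondor startd ClassAd attributes, while `group`, `glidein`, `SLOTS` and `gap_by_group` are hypothetical names made up for illustration. The sketch assumes, as was typical for HTCondor at the time, that a partitionable parent slot stays `Unclaimed` while its dynamic children are `Claimed`. Under that assumption, accounting that skips dynamic slots reports every multicore glidein as unclaimed, which matches the pattern above: no gap for main, all of it a gap for main_multicore.

```python
from collections import defaultdict

# Hypothetical slot records. "SlotType" and "State" mirror HTCondor ClassAd
# attributes; "group" and "glidein" are made-up keys for frontend bookkeeping.
SLOTS = [
    {"glidein": "g1", "group": "main", "SlotType": "Static", "State": "Claimed"},
    # Partitionable parent: assumed to stay Unclaimed while its children run jobs.
    {"glidein": "g2", "group": "main_multicore", "SlotType": "Partitionable", "State": "Unclaimed"},
    {"glidein": "g2", "group": "main_multicore", "SlotType": "Dynamic", "State": "Claimed"},
    {"glidein": "g2", "group": "main_multicore", "SlotType": "Dynamic", "State": "Claimed"},
    {"glidein": "g3", "group": "main_multicore", "SlotType": "Partitionable", "State": "Unclaimed"},
    {"glidein": "g3", "group": "main_multicore", "SlotType": "Dynamic", "State": "Claimed"},
]


def gap_by_group(slots, count_dynamic=True):
    """Return, per group, how many glideins the accounting does NOT see as claimed.

    A glidein counts as claimed if any slot it provides is in the Claimed state.
    With count_dynamic=False, dynamic subslots are skipped (the suspected bug).
    """
    total = defaultdict(set)    # group -> all glidein names seen
    claimed = defaultdict(set)  # group -> glidein names with a Claimed slot
    for slot in slots:
        total[slot["group"]].add(slot["glidein"])
        if slot["SlotType"] == "Dynamic" and not count_dynamic:
            continue  # naive accounting: ignore jobs running in dynamic subslots
        if slot["State"] == "Claimed":
            claimed[slot["group"]].add(slot["glidein"])
    return {group: len(total[group] - claimed[group]) for group in total}


print(gap_by_group(SLOTS, count_dynamic=False))  # {'main': 0, 'main_multicore': 2}
print(gap_by_group(SLOTS))                       # {'main': 0, 'main_multicore': 0}
```

Counting a glidein as claimed whenever any of its slots is claimed closes the gap without double-counting glideins that run several jobs at once.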


Related issues

Related to GlideinWMS - Bug #8570: Dynamically allocated slots not accounting running jobs (Closed, 05/04/2015)

History

#1 Updated by Parag Mhashilkar about 5 years ago

  • Assignee set to Parag Mhashilkar
  • Target version set to v3_2_9

Is this still an issue? It seems like a duplicate of #5239 & #5751.

#2 Updated by Mats Rynge about 5 years ago

Parag Mhashilkar wrote:

Is this still an issue?

Yes, it is still an issue. Follow the link and instructions and you will see what I mean.

#3 Updated by Parag Mhashilkar about 5 years ago

What version of GlideinWMS are you running? Querying the factory indicates you may be running a custom version:

"glideinWMS UNKNOWN"

#4 Updated by Parag Mhashilkar almost 5 years ago

Hi Mats, v3_2_4 and v3_2_7 resolve related issues. Can you please confirm that you are running at least v3_2_7 and still see this issue?

#5 Updated by Parag Mhashilkar almost 5 years ago

  • Stakeholders updated (diff)

#6 Updated by Parag Mhashilkar almost 5 years ago

  • Target version changed from v3_2_9 to v3_2_x

#8 Updated by Parag Mhashilkar almost 5 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli
  • Target version changed from v3_2_x to v3_2_10

#9 Updated by Parag Mhashilkar over 4 years ago

  • Priority changed from Normal to High

#10 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

Brian provided the patch.

#11 Updated by Parag Mhashilkar over 4 years ago

  • Related to Bug #8570: Dynamically allocated slots not accounting running jobs added

#12 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from Feedback to Resolved

Patch provided by Brian has been applied to v3/6897 and is currently run by the CMS frontend. Reviewed, looks ok. Merging.

#13 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from Resolved to Closed

