Bug #6897
Partitionable glideins not accounted for correctly
OSG
Description
Below is an issue that was discussed over email. Jobs running in partitionable subslots are not accounted for correctly, which is easy to see under the main_multicore group at http://osg-flock.grid.iu.edu/vofrontend/monitor/frontendStatus.html
We were trying to understand the occasional gap between "running
jobs" and "claimed glideins" shown at the frontend. Parag explained
it thus: [...]
Does this mean we occasionally have large mismatches between what
jobs require and what sites have for us? Could we be providing more
information with pilots before claiming glideins?

In general Parag's explanation is correct, and it is something we had some
problems with early in the project. With few active users, having one of
them exclude or target a subset of sites will lead to unclaimed
resources. With more active users, like now, this more or less balances out:
if somebody excludes a site in their requirements, somebody else will
probably pick up those slots.

In this case, I think it is a GlideinWMS accounting bug. The reason I
think so is that if you go back to:
http://osg-flock.grid.iu.edu/vofrontend/monitor/frontendStatus.html
For Group=total, you will see the gap
For Group=main, no gap
For Group=main_fnal_nocerts, no gap
For Group=main_multicore, all of it is a gap
So it looks like GlideinWMS does not consider jobs running in the
subslots of the dynamic slots.
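A minimal Python sketch of one way a gap like this can arise. It assumes (this is not taken from GlideinWMS or from the patch) that running jobs are attributed to glideins by matching each job's RemoteHost against the slot name the frontend tracks, and that dynamic sub-slots follow the HTCondor-style naming `slot1_N@...` for a partitionable slot `slot1@...`. Under exact-name matching, every job in a sub-slot is missed, so a group made only of partitionable glideins shows claimed glideins but no running jobs. The function names and slot names are hypothetical.

```python
import re

# Assumed naming convention: a dynamic slot carved out of the
# partitionable slot "slot1@machine" is named "slot1_<n>@machine".
_DYNAMIC_PREFIX = re.compile(r"^(slot\d+)_\d+@")


def parent_slot_name(remote_host: str) -> str:
    """Map a dynamic-slot name back to its partitionable parent slot name.

    Names without a "_<n>" suffix are returned unchanged.
    """
    return _DYNAMIC_PREFIX.sub(r"\1@", remote_host, count=1)


def running_per_glidein(glidein_names, job_remote_hosts, follow_dynamic=True):
    """Count running jobs per tracked glidein slot (hypothetical helper).

    glidein_names:    slot names the frontend tracks (here, the pslots).
    job_remote_hosts: RemoteHost values of the currently running jobs.
    follow_dynamic:   if False, emulate exact-name matching, which
                      silently drops jobs running in dynamic sub-slots.
    """
    counts = {name: 0 for name in glidein_names}
    for host in job_remote_hosts:
        key = parent_slot_name(host) if follow_dynamic else host
        if key in counts:
            counts[key] += 1
    return counts
```

For example, with glideins `slot1@glidein_1@wn01` and `slot1@glidein_2@wn02` and jobs running in `slot1_1@glidein_1@wn01`, `slot1_2@glidein_1@wn01` and `slot1_1@glidein_2@wn02`, exact matching reports zero running jobs for both glideins, while following the sub-slot back to its parent reports 2 and 1.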
Related issues
History
#1 Updated by Parag Mhashilkar about 6 years ago
- Assignee set to Parag Mhashilkar
- Target version set to v3_2_9
#2 Updated by Mats Rynge about 6 years ago
Parag Mhashilkar wrote:
Is this still an issue?
Yes, it is still an issue. Follow the link and instructions and you will see what I mean.
#3 Updated by Parag Mhashilkar about 6 years ago
What version of GlideinWMS are you running? Querying the factory indicates you may be running a custom version:
"glideinWMS UNKNOWN"
#4 Updated by Parag Mhashilkar almost 6 years ago
Hi Mats, v3_2_4 and v3_2_7 resolve related issues. Can you please confirm that you are running at least v3_2_7 and still see this issue?
#5 Updated by Parag Mhashilkar almost 6 years ago
- Stakeholders updated (diff)
#6 Updated by Parag Mhashilkar almost 6 years ago
- Target version changed from v3_2_9 to v3_2_x
#7 Updated by Parag Mhashilkar almost 6 years ago
On Mar 24, 2015, at 5:25 PM, Mats Rynge wrote:
Parag,
We are running gwms 3.2.8
First, notice the "gap" between running and claimed:
This is even more noticeable in the main_multicore group, which only contains partitionable-slot glideins. Note: no "running":
--
Mats Rynge
USC/ISI - Pegasus Team <http://pegasus.isi.edu>
#8 Updated by Parag Mhashilkar almost 6 years ago
- Assignee changed from Parag Mhashilkar to Marco Mambelli
- Target version changed from v3_2_x to v3_2_10
#9 Updated by Parag Mhashilkar over 5 years ago
- Priority changed from Normal to High
#10 Updated by Parag Mhashilkar over 5 years ago
- Status changed from New to Feedback
- Assignee changed from Marco Mambelli to Parag Mhashilkar
Brian provided the patch.
#11 Updated by Parag Mhashilkar over 5 years ago
- Related to Bug #8570: Dynamically allocated slots not accounting running jobs added
#12 Updated by Parag Mhashilkar over 5 years ago
- Status changed from Feedback to Resolved
Patch provided by Brian has been applied to v3/6897 and is currently run by the CMS frontend. Reviewed, looks ok. Merging.
#13 Updated by Parag Mhashilkar over 5 years ago
- Status changed from Resolved to Closed