Feature #7920

Log all curbs & limits hit to the user collector ads

Added by Brian Bockelman almost 5 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 01/13/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Stakeholders: CMS, OSG
Duration:

Description

Curbs and limits are among the most opaque behaviors in glideinWMS. More than ten times we have tried to debug a mysterious lack of glideins, only to find that the issue was not site-related; instead, it came from an unrelated global limit combined with a problem at another site.

We should be able to get the following into the glideresource ads:
- Per entry point, any factory-side curbs or limits applied (max running / idle jobs per frontend, for example)
- Per entry point, any frontend-side curbs or limits applied (max running / idle jobs per group, frontend, or globally, for example).
- Any schedd exclusions applied.

Since some limits are frontend-wide, it may make sense to have a "frontendresource" ad in the user collector, though this is not strictly necessary.


Subtasks

Feature #11418: Curbs and Limits for Factory (Closed, HyunWoo Kim)


Related issues

Has duplicate GlideinWMS - Feature #7826: Need additional debugging information in the glideresource classad (Closed, 02/10/2015)

History

#1 Updated by Parag Mhashilkar almost 5 years ago

  • Stakeholders updated

#2 Updated by Parag Mhashilkar over 4 years ago

  • Assignee set to HyunWoo Kim
  • Target version set to v3_2_12

#3 Updated by HyunWoo Kim over 4 years ago

I learned that the Frontend composes the glideresource classad from the contents of the glideclient and glidefactoryclient classads.
So I only need to add new entries to the Frontend's glideclient classad and the Factory's glidefactoryclient classad,
and these new entries can carry the curbs and limits the user wants to know about.
I have identified the sections of the relevant code where I can add new classad entries,
and I will now find out how to gather the relevant information on useful curbs and limits.

#4 Updated by HyunWoo Kim over 4 years ago

  • % Done changed from 0 to 50

In the Frontend code:
glideinFrontendInterface.py, class MultiAdvertizeWork:
the method createAdvertizeWorkFile should be modified to add new attributes for curbs and limits to the glideclient classad.

In the Factory code:
glideFactoryEntry.py, class Entry:
the method writeClassadsToFile and gfi.MultiAdvertizeGlideinClientMonitoring should be modified to add new attributes for curbs and limits to the glidefactoryclient classad.

In glideinFrontendInterface.py, class ResourceClassad has two relevant methods:
def setGlideClientMonitorInfo(self, monitorInfo): for the glideclient classad
def setGlideFactoryMonitorInfo(self, info): for the glidefactoryclient classad
The second method loops over all the attributes in the glidefactoryclient classad,
so our new attributes will be picked up by glideresource automatically.
The first method, however, copies attributes by name,
so we will have to modify it to explicitly list the new attributes to be copied.
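The difference between the two copy strategies can be sketched as follows. This is a minimal illustration, assuming each classad is a plain Python dict; the function names, the allow-list, and the NewClientLimitAttr/NewFactoryLimitAttr attribute names are hypothetical and are not the actual ResourceClassad code.

```python
from typing import Dict

# Hypothetical model: a classad is just a dict of attribute name -> value.
Ad = Dict[str, object]

# Hypothetical allow-list: only these glideclient attributes are copied by name.
GLIDECLIENT_COPIED_ATTRS = ("GlideClientMonitorJobsIdle", "GlideClientMonitorJobsRunning")


def set_glide_client_monitor_info(resource_ad: Ad, glideclient_ad: Ad) -> None:
    """Copy glideclient attributes by explicit name; anything else is dropped."""
    for name in GLIDECLIENT_COPIED_ATTRS:
        if name in glideclient_ad:
            resource_ad[name] = glideclient_ad[name]


def set_glide_factory_monitor_info(resource_ad: Ad, factory_ad: Ad) -> None:
    """Copy every glidefactoryclient attribute by looping over all keys."""
    for name, value in factory_ad.items():
        resource_ad[name] = value


resource: Ad = {}
set_glide_client_monitor_info(resource, {"GlideClientMonitorJobsIdle": 0, "NewClientLimitAttr": 1})
set_glide_factory_monitor_info(resource, {"NewFactoryLimitAttr": 1})
print(resource)
```

In this model, an attribute added only to the glideclient ad is silently dropped unless it is also added to the allow-list, which is why the first method needs an explicit change while the second does not.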

#5 Updated by HyunWoo Kim over 4 years ago

  • % Done changed from 50 to 60

I confirmed that when I added new test attributes to the glidefactoryclient classad, they were picked up automatically by the
glideresource classad.
I also understand how the glideclient classad is translated into the glideresource classad.
Now I just need to find the actual information to be advertised.
Parag just showed me the compute_glidein_min_idle method of class glideinFrontendElement,
which computes the limits that were hit.
I will have to figure out how to pass that information to the code that produces the glideclient and glidefactoryclient classads.
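One way to carry the "limits hit" information from the computation to the advertising code is to return it alongside the request count. The sketch below is a simplified, hypothetical model, not the actual compute_glidein_min_idle logic: the function name, its parameters, and the IdleGlideinsPerEntry key are illustrative assumptions, and the rule that any triggered limit stops new requests is a simplification.

```python
from typing import Dict, Tuple

# (count, limit) for each limit that was reached; keys are attribute-name stems.
LimitsHit = Dict[str, Tuple[int, int]]


def compute_request_with_limits(
    wanted: int,
    total_glideins: int,
    idle_glideins: int,
    max_running: int,
    max_idle: int,
) -> Tuple[int, LimitsHit]:
    """Return (glideins to request, limits that were hit).

    Hypothetical simplified model: if any limit is reached, request nothing;
    otherwise request what is wanted, capped by the remaining headroom.
    """
    limits_hit: LimitsHit = {}
    if total_glideins >= max_running:
        limits_hit["TotalGlideinsPerEntry"] = (total_glideins, max_running)
    if idle_glideins >= max_idle:
        limits_hit["IdleGlideinsPerEntry"] = (idle_glideins, max_idle)
    if limits_hit:
        return 0, limits_hit
    headroom = min(max_running - total_glideins, max_idle - idle_glideins)
    return min(wanted, headroom), limits_hit


# The caller can hand `hit` to whatever code builds the glideclient ad.
request, hit = compute_request_with_limits(
    wanted=5, total_glideins=4, idle_glideins=0, max_running=4, max_idle=100
)
print(request, hit)
```

Returning the triggered limits as data, rather than only logging them, keeps the computation and the classad publishing decoupled.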

#6 Updated by HyunWoo Kim over 4 years ago

  • % Done changed from 60 to 80

I have implemented my current approach for this ticket in my local GWMS instance and tested it.
I am now creating a new git branch for this ticket so that other developers can review it.

#7 Updated by HyunWoo Kim over 4 years ago

I am currently finalizing what information should be provided and how to display it.
The basic variables to be displayed are:
- the number of user jobs (running or idle),
- the number of glideins (or startds) that are running user jobs or waiting (idle) for user jobs,
- and the limits (and curbs) that affect the number of glideins in each category (per entry, per group, per Frontend, or globally).

The following is a candidate solution that shows the Frontend side.
I am still working on the Factory-side information.

  1. Information on Jobs
    Total Idle Jobs in the Queue (matched or unmatched) = 0
    Total Idle Jobs in the Queue matched for this Entry = 0
    Total Running Jobs in the Queue matched for this Entry = 2
  2. Information on Entry-centric Glideins or Startds
    Total (Idle+Running) Glideins/Startds per Entry (and its limit) = 2(10000)
    All Idle Glideins/Startds per Entry (and its limit) = 0(100)
  3. Information on Frontend-centric Glideins or Startds
    Total (Idle+Running) Glideins/Startds per Group (and its limit) = 4(100000)
    All Idle Glideins/Startds per Group (and its limit) = 0(1000)
    Total (Idle+Running) Glideins/Startds per Frontend (and its limit) = 4(100000)
    All Idle Glideins/Startds per Frontend (and its limit) = 0(1000)
    Total (Idle+Running) Glideins/Startds Global (and its limit) = 4(100000)
    All Idle Glideins/Startds Global (and its limit) = 0(1000)
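A report like the one above can be generated from a small table of counts and limits per scope. A minimal sketch, assuming the counts are already available as plain integers; glidein_report and the tuple layout are hypothetical names for illustration, and the sample values are taken from the report above.

```python
from typing import Dict, List, Tuple

# Per scope: (total glideins, total limit, idle glideins, idle limit).
ScopeCounts = Tuple[int, int, int, int]


def glidein_report(scopes: Dict[str, ScopeCounts]) -> List[str]:
    """Render two report lines (total and idle) for each scope, in insertion order."""
    lines: List[str] = []
    for scope, (total, total_limit, idle, idle_limit) in scopes.items():
        lines.append(
            "Total (Idle+Running) Glideins/Startds per %s (and its limit) = %i(%i)"
            % (scope, total, total_limit)
        )
        lines.append(
            "All Idle Glideins/Startds per %s (and its limit) = %i(%i)"
            % (scope, idle, idle_limit)
        )
    return lines


report = glidein_report({
    "Entry": (2, 10000, 0, 100),
    "Group": (4, 100000, 0, 1000),
})
for line in report:
    print(line)
```

Keeping the counts in one table makes it easy to add further scopes (Frontend, global) without duplicating the formatting code.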

#8 Updated by HyunWoo Kim over 4 years ago

  • % Done changed from 80 to 90

The code above has been modified to advertise only those limits and curbs that have been exceeded.
The changes have been pushed to the main repository for review.
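Advertising only the triggered limits amounts to a filter over (count, limit) pairs. In this hypothetical helper, which is not the committed code, a limit counts as triggered once the count has reached it (>=), matching the comparison used elsewhere in this ticket. The sample pairs reuse the attribute names and counts that appear in this ticket.

```python
from typing import Dict, Tuple


def triggered_only(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Keep only the entries whose count has reached or passed its limit."""
    return {
        name: (count, limit)
        for name, (count, limit) in counts.items()
        if count >= limit
    }


observed = {
    "TotalGlideinsPerEntry": (2, 10000),
    "TotalGlideinsPerGroup": (4, 4),
    "TotalGlideinsPerFrontend": (4, 2),
    "TotalGlideinsGlobal": (4, 3),
}
to_advertise = triggered_only(observed)
print(sorted(to_advertise))
```

Filtering before publishing keeps the glideresource ad small: attributes appear only when a limit is actually constraining the requests.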

#9 Updated by HyunWoo Kim over 4 years ago

I did some more testing to see how the new code works.

In /etc/gwms-frontend/frontend.xml, I modified the following three max values:

<frontend advertise_delay="
  <config>
    <running_glideins_total curb="90000" max="2"/>           <!-- per frontend -->
    <running_glideins_total_global curb="90000" max="3"/>    <!-- global -->

  <groups>
    <group name="main" enabled="True">
      <config>
        <running_glideins_total curb="90000" max="4"/>       <!-- per group -->

My internal log shows
Total(Idlling+Running) Glideins/Startds per Group(and its limit) = 4(4)
Total(Idlling+Running) Glideins/Startds per Frone(and its limit) = 4(2)
Total(Idlling+Running) Glideins/Startds per Globa(and its limit) = 4(3)

[root@fermicloud159 frontend]# condor_status -any -l -constraint 'MyType=="glideresource"' | grep Dummy
TotalGlideinsPerGroup_HasBeenTriggered_NotRequestingMoreGlideins = "Dummy"
TotalGlideinsPerFrontend_HasBeenTriggered_NotRequestingMoreGlideins = "Dummy"
TotalGlideinsGlobal_HasBeenTriggered_NotRequestingMoreGlideins = "Dummy"

At this point there were 4 running glideins, and no more glideins were requested because all three of the limits above
had been reached or exceeded (4 against limits of 4, 2, and 3).
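The outcome of this test can be checked offline by reading the three max values from the configuration and comparing them with the running count. A minimal sketch using Python's standard xml.etree.ElementTree; the XML string is a trimmed, hypothetical stand-in for frontend.xml that contains only the three elements modified above, and read_max_limits is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed frontend.xml: only the three limits changed in the test.
FRONTEND_XML = """
<frontend>
  <config>
    <running_glideins_total curb="90000" max="2"/>
    <running_glideins_total_global curb="90000" max="3"/>
  </config>
  <groups>
    <group name="main" enabled="True">
      <config>
        <running_glideins_total curb="90000" max="4"/>
      </config>
    </group>
  </groups>
</frontend>
"""


def _max_attr(parent: ET.Element, path: str) -> int:
    """Return the integer 'max' attribute of the element found at path."""
    elem = parent.find(path)
    if elem is None:
        raise KeyError(path)
    return int(elem.get("max", "0"))


def read_max_limits(xml_text: str, group: str) -> dict:
    root = ET.fromstring(xml_text.strip())
    group_cfg = root.find("./groups/group[@name='%s']/config" % group)
    if group_cfg is None:
        raise KeyError(group)
    return {
        "PerFrontend": _max_attr(root, "./config/running_glideins_total"),
        "Global": _max_attr(root, "./config/running_glideins_total_global"),
        "PerGroup": _max_attr(group_cfg, "running_glideins_total"),
    }


limits = read_max_limits(FRONTEND_XML, "main")
running = 4  # running glideins observed in the test
hit = sorted(name for name, limit in limits.items() if running >= limit)
print(limits, hit)
```

With 4 running glideins, all three limits are reached or exceeded, which agrees with the three attributes reported by condor_status.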

So we can get useful information from the glideresource classad attributes.

I will talk with Parag tomorrow and move the ticket to "Feedback" status if this is satisfactory.

I will create a sub-ticket for similar code on the Factory side.

#10 Updated by HyunWoo Kim over 4 years ago

  • Tracker changed from Bug to Feature
  • Status changed from New to Feedback
  • Assignee changed from HyunWoo Kim to Parag Mhashilkar

I think it is time to move this ticket to Feedback status.

#11 Updated by Parag Mhashilkar over 4 years ago

  • Assignee changed from Parag Mhashilkar to HyunWoo Kim

Setting the value to "Dummy" is not useful and is not good practice. You can easily make this classad attribute more useful by assigning it a meaningful value.

Here is an example:

    if (count_status['Total']      >= self.max_running):
        which_limits_triggered['TotalGlideinsPerEntry'] = 1

Change this to

    if (count_status['Total'] >= self.max_running):
        limits_triggered['TotalGlideinsPerEntry'] = 'count=%i, limit=%i' % (count_status['Total'], self.max_running)

Other comments

  • Shorten the classad attribute names
    • Example: foo*_HasBeenTriggered_NotRequestingMoreGlideins -> GlideResource_Limit_foo*
    • Example: Curb*_HasBeenTriggered_GlideinsRequestsReduced -> GlideResource_Curb_*. For the values of curbed attributes you can use 'count=%i, limit=%i, curbed='
  • Remove extra spaces from the comparisons in the if statements
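The naming and value suggestions above can be sketched as a helper that turns (count, limit) pairs into short-named string attributes. This is a hypothetical illustration, not the merged code; because the comment does not spell out what follows curbed=, the sketch uses only the count and limit fields for curbs as well.

```python
from typing import Dict, Tuple


def advertised_attrs(
    limits_hit: Dict[str, Tuple[int, int]],
    curbs_hit: Dict[str, Tuple[int, int]],
) -> Dict[str, str]:
    """Build short-named classad attributes with informative string values."""
    attrs: Dict[str, str] = {}
    for name, (count, limit) in limits_hit.items():
        attrs["GlideResource_Limit_%s" % name] = "count=%i, limit=%i" % (count, limit)
    for name, (count, curb) in curbs_hit.items():
        # The 'curbed=' field from the review comment is omitted (its value is unspecified).
        attrs["GlideResource_Curb_%s" % name] = "count=%i, limit=%i" % (count, curb)
    return attrs


attrs = advertised_attrs(
    {"TotalGlideinsPerGroup": (4, 4), "TotalGlideinsPerFrontend": (4, 2)},
    {},
)
print(attrs)
```

A value such as "count=4, limit=2" tells the reader of the glideresource ad both which limit fired and by how much, which "Dummy" could not.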

#12 Updated by Parag Mhashilkar about 4 years ago

  • Stakeholders updated

#13 Updated by HyunWoo Kim about 4 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

I have merged this branch into branch_v3_2.
I am closing this ticket.

#14 Updated by Parag Mhashilkar almost 4 years ago

  • Status changed from Resolved to Closed

#15 Updated by Parag Mhashilkar over 3 years ago

  • Related to Feature #7826: Need additional debugging information in the glideresource classad added

#16 Updated by Parag Mhashilkar over 3 years ago

  • Related to deleted (Feature #7826: Need additional debugging information in the glideresource classad)

#17 Updated by Parag Mhashilkar over 3 years ago

  • Has duplicate Feature #7826: Need additional debugging information in the glideresource classad added

