Feature #6309

Improve frontend config usability by extracting policy expressions into their own files

Added by Parag Mhashilkar over 5 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Parag Mhashilkar
Category:
-
Target version:
Start date:
05/20/2014
Due date:
% Done:

0%

Estimated time:
Stakeholders:

CMS, OSG, HEPCloud

Duration:

Description

Summary from email thread on glideinwms mailing list that started Jan 15, 2014

Igor's original proposal

Hi all.

Here is an idea on how to improve the readability of the FE config files.
Please let me know if you like it or not.

So, the problem is that it is currently very hard to read any nontrivial expression in the FE config.
A major reason is the flat nature of the XML string.

For example, see this line used by CMS:
  <match comment="Limit by time and size constraints, but only if not running (for monitoring purposes)" match_expr='((job["JobStatus"]==2) or (((not glidein["attrs"].has_key("GLIDEIN_Max_Walltime")) or (((not job.has_key("LastVacateTime")) and ((not job.has_key("NormMaxWallTimeMins")) or ((job["NormMaxWallTimeMins"]+10)&lt;((glidein["attrs"]["GLIDEIN_Max_Walltime"]-glidein["attrs"]["GLIDEIN_Retire_Time_Spread"])/60)))) or ((job.has_key("LastVacateTime")) and ((not job.has_key("MaxWallTimeMins")) or ((job["MaxWallTimeMins"]+10)&lt;((glidein["attrs"]["GLIDEIN_Max_Walltime"]-glidein["attrs"]["GLIDEIN_Retire_Time_Spread"])/60)))))) and ((not job.has_key("ImageSize")) or (job["ImageSize"]&lt;=(glidein["attrs"]["GLIDEIN_MaxMemMBs"]*1024))) and (((not job.has_key("NumJobStarts")) or (job["NumJobStarts"]&lt;5)) or (job.has_key("LastVacateTime") and ((job["ServerTime"]-job["LastVacateTime"])&gt;3600)))))and((not glidein["attrs"].has_key("GLIDEIN_Job_Min_Time"))or(job.has_key("NormMaxWallTimeMins")and((job["NormMaxWallTimeMins"]*60)&gt;glidein["attrs"]["GLIDEIN_Job_Min_Time"])))' start_expr='ifthenelse(LastVacateTime=?=UNDEFINED,ifthenelse(NormMaxWallTimeMins=!=UNDEFINED,(NormMaxWallTimeMins*60)&lt;(GLIDEIN_ToDie-MyCurrentTime),(8*3600)&lt;(GLIDEIN_ToDie-MyCurrentTime)),ifthenelse(MaxWallTimeMins=!=UNDEFINED,(MaxWallTimeMins*60)&lt;(GLIDEIN_ToDie-MyCurrentTime),(16*3600)&lt;(GLIDEIN_ToDie-MyCurrentTime)))&amp;&amp;(ImageSize&lt;=(GLIDEIN_MaxMemMBs*1024))&amp;&amp;(RequestMemory&lt;=GLIDEIN_MaxMemMBs)&amp;&amp;(JOB_Is_ITB =!= TRUE)&amp;&amp;(DESIRES_HTPC=!=True)&amp;&amp;(CMS_ALLOW_OVERFLOW=!=False)&amp;&amp;(Owner=!="sfiligoi")&amp;&amp;(Owner=!="uscms013")&amp;&amp;(Owner=!="uscms5110")&amp;&amp;(Owner=!="uscmsPool3228")&amp;&amp;(Owner=!="uscms2182")'>

My proposal is to move the expressions out of the XML attribute, and into a text section.
Since text sections can be multi-line, this would allow for proper indentation, making the expressions much more readable.

Here is the same config as above, but in a user-friendly multi line form:
  <match comment="Limit by time and size constraints, but only if not running (for monitoring purposes)">
    <match_expr>
(
(job["JobStatus"]==2) or
(
  (
   (not glidein["attrs"].has_key("GLIDEIN_Max_Walltime")) or
   (
    ((not job.has_key("LastVacateTime")) and
     (
      (not job.has_key("NormMaxWallTimeMins")) or
      ((job["NormMaxWallTimeMins"]+10)&lt;((glidein["attrs"]["GLIDEIN_Max_Walltime"]-glidein["attrs"]["GLIDEIN_Retire_Time_Spread"])/60))
     )) or
     ((job.has_key("LastVacateTime")) and
      (
       (not job.has_key("MaxWallTimeMins")) or
    ((job["MaxWallTimeMins"]+10)&lt;((glidein["attrs"]["GLIDEIN_Max_Walltime"]-glidein["attrs"]["GLIDEIN_Retire_Time_Spread"])/60)))
      )
   )
  ) and
  (
   (not job.has_key("ImageSize")) or
   (job["ImageSize"]&lt;=(glidein["attrs"]["GLIDEIN_MaxMemMBs"]*1024))
  ) and
  (
   ((not job.has_key("NumJobStarts")) or
    (job["NumJobStarts"]&lt;5)
   ) or
   (job.has_key("LastVacateTime") and
    ((job["ServerTime"]-job["LastVacateTime"])&gt;3600)
   )
  )
)
)
and
(
(not glidein["attrs"].has_key("GLIDEIN_Job_Min_Time"))or
(job.has_key("NormMaxWallTimeMins") and
 ((job["NormMaxWallTimeMins"]*60)&gt;glidein["attrs"]["GLIDEIN_Job_Min_Time"]))
)
    </match_expr>
    <start_expr>
ifthenelse(LastVacateTime=?=UNDEFINED,
       ifthenelse(NormMaxWallTimeMins=!=UNDEFINED,
              (NormMaxWallTimeMins*60)&lt;(GLIDEIN_ToDie-MyCurrentTime),
              (8*3600)&lt;(GLIDEIN_ToDie-MyCurrentTime)),
       ifthenelse(MaxWallTimeMins=!=UNDEFINED,
              (MaxWallTimeMins*60)&lt;(GLIDEIN_ToDie-MyCurrentTime),
              (16*3600)&lt;(GLIDEIN_ToDie-MyCurrentTime)))&amp;&amp;
(ImageSize&lt;=(GLIDEIN_MaxMemMBs*1024))&amp;&amp;
(RequestMemory&lt;=GLIDEIN_MaxMemMBs)&amp;&amp;
(JOB_Is_ITB =!= TRUE)&amp;&amp;
(DESIRES_HTPC=!=True)&amp;&amp;
(CMS_ALLOW_OVERFLOW=!=False)&amp;&amp;
(Owner=!="sfiligoi")&amp;&amp;(Owner=!="uscms013")&amp;&amp;(Owner=!="uscms5110")&amp;&amp;(Owner=!="uscmsPool3228")&amp;&amp;(Owner=!="uscms2182")
    </start_expr>
  </match>

I am volunteering to implement it ASAP, if we decide to go for it.

Cheers,
 Igor
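
As a minimal sketch of how such a multi-line text section could be read back and evaluated: the fragment below is a shortened, hypothetical config, not the CMS one above, and `load_match_expr` is an illustrative name rather than anything in glideinWMS. It uses `in` instead of `has_key`, which no longer exists in Python 3, and it wraps the text in parentheses so that line breaks outside any bracket (like the bare `and` in the example above) still parse. That wrapping is an assumption of this sketch, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened config fragment in the proposed multi-line form.
CONFIG = """
<match comment="Limit by memory, but only if not running">
  <match_expr>
(job["JobStatus"]==2)
or
(
  ("ImageSize" not in job) or
  (job["ImageSize"]&lt;=(glidein["attrs"]["GLIDEIN_MaxMemMBs"]*1024))
)
  </match_expr>
</match>
"""


def load_match_expr(xml_text):
    """Return the <match_expr> text compiled for use with eval()."""
    node = ET.fromstring(xml_text).find("match_expr")
    # ElementTree has already decoded &lt; and similar entities here.
    # The surrounding parentheses let the expression span several lines
    # even where a line break falls outside any bracket.
    return compile("(\n" + node.text + "\n)", "<match_expr>", "eval")


match_code = load_match_expr(CONFIG)
glidein = {"attrs": {"GLIDEIN_MaxMemMBs": 2048}}
small_job = {"JobStatus": 1, "ImageSize": 1024 * 1024}
big_job = {"JobStatus": 1, "ImageSize": 4096 * 1024}

print(eval(match_code, {"job": small_job, "glidein": glidein}))
print(eval(match_code, {"job": big_job, "glidein": glidein}))
```

The indentation inside the element is purely cosmetic in this scheme, because everything ends up inside one pair of parentheses.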

Parag's Proposal

If a policy file is defined in <match>, all we do is perform eval() on the respective values.
We are overloading the semantics of query_expr for job and factory, which I don't think is clean, but it is a starting point. Happy to hear alternate proposals. Users can also specify the match_attrs in the policy file, so those can move out of the config file as well. Their role is solely to support the policy; by themselves they don't have much significance. There is no functionality change for scoping, as the end result is the same. This extracts all the relevant policy logic (except glexec) out of the config.

Existing Policy section in frontend.xml

         <match match_expr='glidein["attrs"]["GLIDEIN_Site"] in job["DESIRED_Sites"].split(",")' start_expr="True">
            <factory query_expr="(GLIDEIN_Site=!=UNDEFINED)">
               <match_attrs>
                  <match_attr name="GLIDEIN_Site" type="string"/>
               </match_attrs>
               <collectors>
               </collectors>
            </factory>
            <job query_expr="(DESIRED_Sites=!=UNDEFINED)">
               <match_attrs>
                  <match_attr name="DESIRED_Sites" type="string"/>
               </match_attrs>
               <schedds>
               </schedds>
            </job>
         </match>

Policy section of frontend.xml as per the proposal

         <match policy='/path/to/policy.py' match_expr='policy.match(job, glidein)'>
             <factory query_expr="policy.factory_query_expr">
                 <match_attrs/>
             </factory>
             <job query_expr="policy.job_query_expr">
                 <match_attrs/>
             </job>
         </match>

Policy definition file (policy.py)

#!/usr/bin/env python

def match(job, glidein):
    """ 
    Implements policy to match jobs to entries where glideins
    will be requested
    """ 
    return glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(",")

factory_query_expr = "(GLIDEIN_Site=!=UNDEFINED)" 

job_query_expr = "(DESIRED_Sites=!=UNDEFINED)" 

factory_match_attrs = {
        'GLIDEIN_Site': 'string'
}
job_match_attrs = {
        'DESIRED_Sites': 'string'
}
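
The eval() step described in this proposal could look roughly like the sketch below. It is not the frontend's actual code: `load_policy` and `evaluate` are hypothetical helper names, the policy source is written to a temporary file only so the example is self-contained, and the site names are made up.

```python
import importlib.util
import os
import tempfile

# Stand-in for /path/to/policy.py; same content as the policy file above.
POLICY_SOURCE = '''
def match(job, glidein):
    return glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(",")

factory_query_expr = "(GLIDEIN_Site=!=UNDEFINED)"
job_query_expr = "(DESIRED_Sites=!=UNDEFINED)"
'''


def load_policy(path):
    """Import the file at `path` as a module named `policy`."""
    spec = importlib.util.spec_from_file_location("policy", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def evaluate(expr, policy, **context):
    """eval() a config attribute value with `policy` in scope."""
    return eval(expr, {"policy": policy, **context})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "policy.py")
    with open(path, "w") as handle:
        handle.write(POLICY_SOURCE)
    policy = load_policy(path)

# Hypothetical job and glidein dicts.
job = {"DESIRED_Sites": "SiteA,SiteB"}
glidein = {"attrs": {"GLIDEIN_Site": "SiteB"}}

matched = evaluate("policy.match(job, glidein)", policy, job=job, glidein=glidein)
factory_constraint = evaluate("policy.factory_query_expr", policy)
print(matched, factory_constraint)
```

Here the same `evaluate` helper handles both the boolean match_expr and the string-valued query_expr, which is the overloading of semantics mentioned above.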

Files

my_autocluster.py (6.07 KB): Example python-based autoclustering and matchmaking algorithm. Brian Bockelman, 11/19/2014 09:37 AM

Related issues

Blocks GlideinWMS - Milestone #4991: Factory/frontend configurability (Closed, 11/21/2013)

History

#1 Updated by Parag Mhashilkar over 5 years ago

Brain dump ...

Later discussions through email almost converged on using a policy file to implement the policies, but left a few questions unanswered.

Policy File Semantics Proposal 1:

The policy file is specified in the config file and the frontend assumes a strict convention about its semantics, i.e. the policy file must define the following to be considered a valid policy:

  • match
  • factory_query_expr
  • job_query_expr
  • factory_match_attrs
  • job_match_attrs

However, this does not play well with the $ and $$ expansions introduced as part of #5345. Parsing the policy file for $ and $$ replacement would be ugly.
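
The strict convention of Proposal 1 amounts to a check along these lines. This is only a sketch: `missing_policy_names` is a hypothetical helper, and `types.SimpleNamespace` stands in for an imported policy module.

```python
import types

# Names a policy file must define under Proposal 1.
REQUIRED_NAMES = (
    "match",
    "factory_query_expr",
    "job_query_expr",
    "factory_match_attrs",
    "job_match_attrs",
)


def missing_policy_names(policy):
    """Return the required names that `policy` does not define."""
    return [name for name in REQUIRED_NAMES if not hasattr(policy, name)]


# Stand-ins for imported policy modules.
complete = types.SimpleNamespace(
    match=lambda job, glidein: True,
    factory_query_expr="(GLIDEIN_Site=!=UNDEFINED)",
    job_query_expr="(DESIRED_Sites=!=UNDEFINED)",
    factory_match_attrs={"GLIDEIN_Site": "string"},
    job_match_attrs={"DESIRED_Sites": "string"},
)
partial = types.SimpleNamespace(match=lambda job, glidein: True)

print(missing_policy_names(complete))
print(missing_policy_names(partial))
```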

Policy File Semantics Proposal 2:

We keep the semantics more free-flowing and let the user perform explicit calls.

To allow $ and $$ expansions to work, the match_expr and query_expr values in the frontend XML make a Python call to the actual variable or function from the policy file. Both expressions can be either a variable or a function call that evaluates to a string value.

frontend.xml

         <match policy='/path/to/policy.py' match_expr='policy.match(job, glidein)'>
             <factory query_expr="policy.factory_query_expr($(VAR1))">
                 <match_attrs/>
             </factory>
             <job query_expr="policy.job_query_expr">
                 <match_attrs/>
             </job>
         </match>

policy.py

#!/usr/bin/env python

def match(job, glidein):
    """ 
    Implements policy to match jobs to entries where glideins
    will be requested
    """ 
    return glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(",")

def factory_query_expr(required_val):
    return "(GLIDEIN_Site=!=UNDEFINED)&&(CUSTOM_VARIABLE=!=%s)" % required_val 

job_query_expr = "(DESIRED_Sites=!=UNDEFINED)" 

factory_match_attrs = {
        'GLIDEIN_Site': 'string'
}
job_match_attrs = {
        'DESIRED_Sites': 'string'
}
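
To illustrate how the expansion and the Python call might interact under Proposal 2, here is a sketch. How the frontend really expands and quotes `$(VAR1)` is not stated in this ticket, so `expand_dollar_vars` and its use of `repr()` are assumptions made purely for illustration, and only the `$(NAME)` form is handled.

```python
import re
import types


def expand_dollar_vars(expr, variables):
    """Replace each $(NAME) with the Python literal of variables[NAME].

    Hypothetical stand-in for the frontend's own expansion step.
    """
    return re.sub(r"\$\((\w+)\)", lambda m: repr(variables[m.group(1)]), expr)


def factory_query_expr(required_val):
    # Same function as in the policy.py above.
    return "(GLIDEIN_Site=!=UNDEFINED)&&(CUSTOM_VARIABLE=!=%s)" % required_val


# Stand-in for the imported policy module.
policy = types.SimpleNamespace(factory_query_expr=factory_query_expr)

raw = "policy.factory_query_expr($(VAR1))"
expanded = expand_dollar_vars(raw, {"VAR1": 5})
constraint = eval(expanded, {"policy": policy})
print(expanded)
print(constraint)
```

With a string value instead of a number, the `%s` in `factory_query_expr` would place the value into the ClassAd expression without quotes, so the policy function would have to add them itself.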

#2 Updated by Parag Mhashilkar over 5 years ago

  • Target version changed from v3_2_6 to v3_2_7

#3 Updated by Parag Mhashilkar over 5 years ago

Updated latest set of changes to v3/6309-try1 and pushed to origin

#4 Updated by Parag Mhashilkar over 5 years ago

  • Target version changed from v3_2_7 to v3_2_8

#5 Updated by Brian Bockelman about 5 years ago

I would like to propose an alternate direction for this ticket. I'll comment a few times (capturing my comments from prior emails but not captured on the ticket).

First, concerns about the above:
- We go from a declarative language to an imperative language. The entire ecosystem of HTCondor and glideinWMS is based on a declarative language (ClassAds) - this would be the one place where policy is expressed as a general-purpose programming language.
- How do you prevent folks from implementing side-effects? How do we do auto-clustering?
- You still have to write the same expressions / policies twice -- once in ClassAds and once in python. This is fraught with danger. Using the example from below, can you easily spot the very important difference between the two following?
- (job.has_key("DESIRED_Sites") and (glidein["attrs"]["GLIDEIN_CMSSite"] in job["DESIRED_Sites"].split(",")))
- isUndefined(job.DESIRED_Sites) || stringListMember(glidein.GLIDEIN_CMSSite, job.DESIRED_Sites)
I really worry about going back / forth between two different languages. I'm pretty sure the examples you give have quoting / escaping issues.
The python module specified still mixes both python and classad expressions!
- We'd be unable to apply simple static analysis techniques to the match_expr. In my proposal below, we can look at the expression and determine how to auto-cluster and what attributes are needed. I worry about relying on frontend admins to specify types and attribute names.
- The #1 complaint about glideinWMS is the difficulty of debugging what it is doing. In my proposal below, we can advertise the match expression used in each group and duplicate the matchmaking logic client-side in the long-desired "glidein --better-analyze".
- Ultimately, we are still performing matchmaking - what does a more complex python function buy us? What can we implement with python that can't be done in classads?
- If we are going to dump expression-based languages, why not also dump matchmaking too?

In particular, I really worry about the fact we can no longer do static analysis - we could never provide a so-called "gwms_q -better-analyze". Going to ClassAds-based matchmaking (a proposal I'll post next) would allow us to eliminate factory_match_attrs and job_match_attrs altogether - a happy configuration simplification.
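
One way to read the difference between the two expressions quoted in the concerns above is to run both as Python. The second function only emulates the ClassAd expression (a plain comma split stands in for `stringListMember`), and the job and glidein dicts are hypothetical.

```python
def python_expr(job, glidein):
    """The Python form: a job without DESIRED_Sites never matches."""
    return ("DESIRED_Sites" in job) and (
        glidein["attrs"]["GLIDEIN_CMSSite"] in job["DESIRED_Sites"].split(",")
    )


def classad_like_expr(job, glidein):
    """Emulates isUndefined(job.DESIRED_Sites) || stringListMember(...)."""
    if "DESIRED_Sites" not in job:
        return True  # undefined DESIRED_Sites matches every site
    members = [site.strip() for site in job["DESIRED_Sites"].split(",")]
    return glidein["attrs"]["GLIDEIN_CMSSite"] in members


glidein = {"attrs": {"GLIDEIN_CMSSite": "SiteA"}}
job_without_sites = {}
job_with_sites = {"DESIRED_Sites": "SiteA,SiteB"}

print(python_expr(job_without_sites, glidein), classad_like_expr(job_without_sites, glidein))
print(python_expr(job_with_sites, glidein), classad_like_expr(job_with_sites, glidein))
```

The two forms agree when DESIRED_Sites is set and disagree when it is missing, which is easy to overlook when the same policy has to be kept in sync across two languages.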

#6 Updated by Brian Bockelman about 5 years ago

Ok, here's what I propose we do:
- We add reasonable default requirements to the glidein description. The following would be appropriate for the requirements: memory, disk, CPUs, VO, retirement time, DESIRED_Gatekeepers. (Anything not VO-specific)
- As an improvement, we can consider automatically appending the match's start_expr to the requirements when evaluating matches. This might (finally!) get us one step closer to fixing the usual "match expression matches but the glidein doesn't" problem.
- We toss out the match_expr attribute. We throw out match_attr
- We add a new XML tag, "match_expr", which is a ClassAd (allowing multi-line expressions!) that specifies the matching logic. The Requirements line of the ClassAd is evaluated - but other attributes can be used in order to break up large expressions such as the one that started the ticket.
- Through ClassAd chaining, we can also include a "library" of commonly-used expressions. This way, if we want DESIRED_Sites-based matching, we could simply add Requirements = library.DESIRED_SITES_MATCHING && ...
- It's a big hammer (one I'd strongly discourage frontends from using), but we can currently invoke python functions from classads. So, the above logic can actually be implemented in this scheme, again at the cost of losing important static analysis.
- The match expression has two contexts - "glidein" and "job", matching the glidein and job respectively. Thus,
(job.has_key("DESIRED_Sites") and (glidein["attrs"]["GLIDEIN_CMSSite"] in job["DESIRED_Sites"].split(",")))
becomes
isUndefined(job.DESIRED_Sites) || stringListMember(glidein.GLIDEIN_CMSSite, job.DESIRED_Sites)
- The matchmaking algorithm becomes:
1) The frontend looks at the match expression and determines all references. Reasonable default attributes are added (GLIDEIN_CPUS, Name)
2) Then, we query the gfactory according to the factory entry constraint. We save all the entries, and extend the references list with the external references from the entry's requirements.
- So, if the entry had a requirement "... && Color =?= "blue"", we'll automatically add the reference "Color" to the attributes we query from the schedds.
3) We query the schedd with the appropriate query constraint, projecting along the desired attributes.
4) Perform auto-clustering on the resulting jobs.
5) Perform matchmaking between the glideins and jobs.
6) With the results of matchmaking, calculate various required numbers for determining the requested glideins from each entry point.

In the future (8.3.3 or 8.3.2), step (4) will be performed by HTCondor itself. Attached to this update is an example implementation of the proposed matchmaking algorithm.
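
Steps 4 to 6 of the algorithm above can be sketched with plain dicts as follows. This is independent of the attached my_autocluster.py and makes no claim to mirror it: the function names, entry names, and site names are all hypothetical, and a Python callable stands in for ClassAd matchmaking.

```python
from collections import defaultdict


def autocluster(jobs, attrs):
    """Step 4: group jobs whose referenced attributes are identical."""
    clusters = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(name) for name in attrs)
        clusters[key].append(job)
    return clusters


def count_matches(entries, jobs, attrs, match):
    """Steps 5 and 6: match one job per cluster, then count jobs per entry."""
    counts = {entry["Name"]: 0 for entry in entries}
    for members in autocluster(jobs, attrs).values():
        representative = members[0]
        for entry in entries:
            if match(representative, entry):
                counts[entry["Name"]] += len(members)
    return counts


def match(job, glidein):
    """Stand-in for the ClassAd match expression."""
    sites = job.get("DESIRED_Sites")
    return sites is None or glidein["GLIDEIN_Site"] in sites.split(",")


entries = [
    {"Name": "entry_a", "GLIDEIN_Site": "SiteA"},
    {"Name": "entry_b", "GLIDEIN_Site": "SiteB"},
]
jobs = [
    {"DESIRED_Sites": "SiteA"},
    {"DESIRED_Sites": "SiteA"},
    {"DESIRED_Sites": "SiteA,SiteB"},
    {},
]

counts = count_matches(entries, jobs, ["DESIRED_Sites"], match)
print(counts)
```

Because jobs in a cluster share every referenced attribute, the match expression only has to be evaluated once per cluster and entry rather than once per job.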

#7 Updated by Parag Mhashilkar about 5 years ago

  • Target version changed from v3_2_8 to v3_2_9

#8 Updated by Parag Mhashilkar almost 5 years ago

  • Stakeholders updated (diff)

#9 Updated by Parag Mhashilkar almost 5 years ago

  • Target version changed from v3_2_9 to v3_2_x

#10 Updated by Parag Mhashilkar almost 5 years ago

  • Target version changed from v3_2_x to v3_2_10

#11 Updated by Parag Mhashilkar over 4 years ago

  • Stakeholders updated (diff)

#12 Updated by Parag Mhashilkar over 4 years ago

  • Target version changed from v3_2_10 to v3_3

#13 Updated by Parag Mhashilkar over 4 years ago

I have been going back and forth between my proposal and Brian's proposal. Brian's suggestion is what we would like to have in the long run, but HTCondor does not provide us with all the hooks and classad options when it comes to executing complex policies.

For example: VF would like to execute a policy that controls glidein requests to the cloud based on available funds, current usage, etc. All this requires making calls to individual AWS services.

Also, changing the code base to python bindings is a significant change and we want to tackle it separately. Since VF needs this on a short time scale, my initial proposal seems to be the better option (https://cdcvs.fnal.gov/redmine/issues/6309#note-1). This gives the Frontend admins the freedom to come up with complex policies, the way they want. We can continue working towards a better solution once we have identified enough use cases and confirmed that we can handle them using just HTCondor classads.

#14 Updated by Parag Mhashilkar over 4 years ago

  • Stakeholders updated (diff)

#15 Updated by Parag Mhashilkar over 4 years ago

A policy plugin can be implemented using the following guidelines:

  • factory_query_expr, job_query_expr, factory_match_attrs, job_match_attrs, match(job, glidein) are optional in the policy file
  • query expressions and match attributes follow the scoping convention
    • query expressions from the frontend's and the group's scope are ANDed. If they are also defined in an optional policy file, they get ANDed to those defined in the config
    • the final dict of match attributes for job/factory is the combination of the respective dicts. For example: Final job_match_attrs = job_match_attrs_from_frontend + job_match_attrs_from_group + job_match_attrs_from_frontend_policy + job_match_attrs_from_group_policy
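
The scoping convention in these guidelines can be sketched as below. The helper names are hypothetical, and two details are assumptions the ticket does not settle: an empty combination falls back to `TRUE`, and when the same attribute is defined at several scopes the later dict wins.

```python
def combine_query_exprs(*exprs):
    """AND together every query expression that is actually defined."""
    present = [expr for expr in exprs if expr]
    if not present:
        return "TRUE"  # assumption: no constraint at any scope
    return "&&".join("(%s)" % expr for expr in present)


def combine_match_attrs(*attr_dicts):
    """Merge match attribute dicts; later scopes override earlier ones."""
    merged = {}
    for attrs in attr_dicts:
        merged.update(attrs or {})
    return merged


job_query = combine_query_exprs(
    "Owner=!=UNDEFINED",                        # frontend scope (made up)
    None,                                       # group scope defines nothing
    "DESIRED_Sites_Policy_Main=!=UNDEFINED",    # from a policy file
)
job_attrs = combine_match_attrs(
    {"Owner": {"type": "string"}},
    None,
    {"DESIRED_Sites_Policy_Main": {"type": "string", "comment": "From policy"}},
)
print(job_query)
print(sorted(job_attrs))
```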

# Sample Policy File: policy.py

def job_expr():
     return '(DESIRED_Sites_Policy_Main=!=UNDEFINED)'

# Return value of policy.factory_query_expr, policy.job_query_expr and start_expr should be a string but you can be creative as shown for job_query_expr
# start_expr is not supported yet

factory_query_expr = '(GLIDEIN_Site_Policy_Main=!=UNDEFINED)'
job_query_expr = job_expr()

start_expr = '"policy_main_start_str"=="policy_main_start_str"'

# factory_match_attrs and job_match_attrs are dicts of dicts and have the following format

factory_match_attrs = {
    'GLIDEIN_Site_Policy_Main': {'type': 'string', 'comment': 'From policy'}
}

job_match_attrs = {
    'DESIRED_Sites_Policy_Main': {'type': 'string', 'comment': 'From policy'}
}

# match(job, glidein) accepts the job and glidein dicts and returns True if an autocluster of jobs should request glideins, False otherwise
# glidein["attrs"] is a dict of the glidein/entry's classad attributes
# job is a dict of the job's classad attributes
# For example for the above factory_match_attrs and job_match_attrs, match(job, glidein) is called with
# glidein = {
#      'attrs': {
#          'GLIDEIN_Site_Policy_Main': '<Value of GLIDEIN_Site_Policy_Main in glidefactory classad>'
#      }
# }
# job = {
#    'DESIRED_Sites_Policy_Main': '<Value of DESIRED_Sites_Policy_Main in jobs classad>'
# }
# If there are more attributes configured in the config file for frontend, group or frontend's policy file, they are also added to the glidein and job's dict

def match(job, glidein):
    return (glidein['attrs'].get('GLIDEIN_Site_Policy_Main') in job["DESIRED_Sites_Policy_Main"].split(","))

#16 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Parag Mhashilkar to Marco Mambelli

Please review the branch v3/6309. Meanwhile I will update the docs

#17 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

Merged to master

#18 Updated by Parag Mhashilkar almost 4 years ago

  • Status changed from Resolved to Closed

#19 Updated by Parag Mhashilkar over 3 years ago


