Bug #5241

Frontend credential selection plugin ProxyUserMapWRecycling seems to be broken

Added by Parag Mhashilkar over 5 years ago. Updated 15 days ago.

Status: New
Priority: Low
Category: Frontend
Start date: 01/22/2014
% Done: 0%
Stakeholders: CMS


Description

On Jan 22, 2014, at 2:34 PM, Igor Sfiligoi wrote:

BTW: We were using ProxyUserMapWRecycling.

I switched to ProxyAll now, and things seem to work as expected there.
So this is less urgent than I originally thought it was.

But the bug is real and should be addressed.
Let me know if I should create a ticket for it, or if you are doing it.

Thanks,
Igor

On 01/22/2014 12:06 PM, Igor Sfiligoi wrote:
Hi Parag and Burt.

I think we have significant problems with the v3 FE for CMS AnaOps.
Looks like it is setting max_run to a ridiculously low number.

I think the bug is in
git blame glideinFrontendPlugins.py
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 394) # Out of the max_run glideins,
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 395) # Allocate proportionally out of the total jobs
3da1255c (Parag Mhashilkar 2013-05-30 14:23:22 -0500 396) if (params_obj is not None):
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 397) this_max=self.num_user_jobs[user]*params_obj.max_run_glideins/self.total_jobs
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 398) this_idle=self.num_user_jobs[user]*params_obj.min_nr_glideins/self.total_jobs
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 399) if (this_max<=0):
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 400) this_max=1
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 401) if (this_idle<=0):
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 402) this_idle=1
d5fb79af (Doug Strain 2012-01-11 17:08:44 -0600 403) cel['proxy'].add_usage_details(this_idle,this_max)

I.e., since CMS has a large number of users but only 10 pilot proxies, allocating max_run proportionally per user, as above, does not make sense.
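To illustrate how the proportional split quoted above could yield a very low MaxRun, here is a minimal Python sketch. It is a hypothetical model, not the GlideinWMS code: it assumes that the recycling plugin maps users to proxies round-robin, that each add_usage_details() call overwrites (rather than adds to) a proxy's previous request, and that the frontend's total request is the sum over proxies. The function names and numbers are illustrative.

```python
# Hypothetical model of the proportional split quoted above; NOT GlideinWMS code.
# Assumptions: users are mapped to proxies round-robin (recycling), each
# add_usage_details() call overwrites the proxy's previous request, and the
# frontend's total MaxRun request is the sum over proxies.


def user_share(user_jobs, total_jobs, max_run_glideins):
    """One user's proportional share of max_run_glideins, floored at 1."""
    # '//' mirrors Python 2 integer '/' on ints in the quoted code.
    share = user_jobs * max_run_glideins // total_jobs
    return share if share > 0 else 1


def total_max_run(jobs_by_user, num_proxies, max_run_glideins, accumulate=False):
    """Total MaxRun request across proxies under the assumptions above."""
    total_jobs = sum(jobs_by_user.values())
    per_proxy = {}
    for i, user in enumerate(sorted(jobs_by_user)):
        proxy = i % num_proxies  # many users recycle the same few proxies
        share = user_share(jobs_by_user[user], total_jobs, max_run_glideins)
        if accumulate:
            per_proxy[proxy] = per_proxy.get(proxy, 0) + share
        else:
            per_proxy[proxy] = share  # the last user mapped to a proxy wins
    return sum(per_proxy.values())


if __name__ == "__main__":
    # Illustrative numbers: 100 users with 10 jobs each, 10 proxies.
    jobs = {"user%03d" % i: 10 for i in range(100)}
    print(total_max_run(jobs, 10, 10000))                   # 1000
    print(total_max_run(jobs, 10, 10000, accumulate=True))  # 10000
```

Under these assumptions, overwriting leaves each proxy with a single user's share, so the total request shrinks to roughly num_proxies/num_users of max_run_glideins, while accumulating per proxy recovers the full value.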

Can you please have a quick look at it and either confirm or tell me I am completely off?
I have never had a look at that code before.

Thanks,
Igor

-------- Original Message --------
Subject: Re: [Cms-wms-support] Upgrading the AnaOps FE to v3
Date: Wed, 22 Jan 2014 11:50:53 -0800
From: Igor Sfiligoi
To: Jeff Dost

Uhm... looks like there are two values now:
[1148] frontend@glidein-frontend ~$ condor_status -any 'CMS_T2_US_Florida_iogw1@v3_0@SDSC@UCSD-v6_0.main' -l |grep -i request |grep -v Expr
GlideClientMonitorGlideinsRequestIdle = 20
GlideClientMonitorGlideinsRequestMaxRun = 744
GlideFactoryMonitorRequestedIdle = 10
GlideFactoryMonitorRequestedMaxGlideins = 90

And one makes sense, while the other doesn't!
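When comparing these values across several entries, it can help to parse the long-form output of the condor_status command above into a dictionary. The sketch below is a hypothetical helper (not part of HTCondor or GlideinWMS). It assumes each relevant line has the "Attribute = value" form seen above and applies the same filter as the grep pipeline in the command. It only reads the values and does not interpret what each attribute means.

```python
# Hypothetical helper (not part of HTCondor or GlideinWMS): parse the
# "Attribute = value" lines of condor_status -l output, keeping only the
# lines that "grep -i request | grep -v Expr" would keep.


def request_attrs(long_output):
    """Return {attribute: value} for request-related ClassAd lines."""
    attrs = {}
    for line in long_output.splitlines():
        if "request" not in line.lower() or "Expr" in line:
            continue  # same filter as the grep pipeline
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an "Attribute = value" line
        value = value.strip()
        try:
            attrs[name.strip()] = int(value)
        except ValueError:
            attrs[name.strip()] = value  # leave non-integer values as text
    return attrs


sample = """\
GlideClientMonitorGlideinsRequestIdle = 20
GlideClientMonitorGlideinsRequestMaxRun = 744
GlideFactoryMonitorRequestedIdle = 10
GlideFactoryMonitorRequestedMaxGlideins = 90
"""

attrs = request_attrs(sample)
# The two max-related values from the output above.
print(attrs["GlideClientMonitorGlideinsRequestMaxRun"],
      attrs["GlideFactoryMonitorRequestedMaxGlideins"])  # 744 90
```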

Digging deeper now.

Igor

Here is the FE log:
[2014-01-22 11:42:49,337] INFO: Jobs in schedd queues | Glideins | Request
[2014-01-22 11:42:49,338] INFO: Idle (match eff old uniq ) Run ( here max ) | Total Idle Run | Idle MaxRun Down Factory
[2014-01-22 11:42:49,418] INFO: 634(16799 632 627 0) 4533( 74 10000) | 76 2 74 | 20 744 Up CMS_T2_US_Florida_iogw1@v3_0@SDSC@gfactory-1.t2.ucsd.edu

On 01/22/2014 11:39 AM, Jeff Dost wrote:
OK, I found more info.

I see the following in the factory info log:

[2014-01-22 11:28:18,705] INFO: Additional idle glideins not needed, have met request max_glideins limits 4, not submitting

And look here: after the upgrade, Max requested (the red line) dropped from roughly the number of running glideins all the way down to 4!

http://gfactory-1.t2.ucsd.edu/osg_gfactory/factoryStatus.html?entry=CMS_T2_US_UCSD_gw6&frontend=UCSDCMS_cmspilot&infoGroup=running&elements=StatusRunning,ReqMaxGlideins,ClientGlideRunning,ClientGlideIdle,&rra=0&window_min=0&window_max=0&timezone=-8
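The factory log message quoted above implies a gate on new submissions once the frontend's max_glideins request is met. The sketch below is a hypothetical simplification of that behavior, not the actual factory code; the function name and parameters are illustrative. It assumes the factory compares its existing idle plus running glideins for the request against max_glideins, and otherwise tops idle glideins up toward the requested idle count.

```python
# Hypothetical simplification of the check implied by the factory log message
# "Additional idle glideins not needed, have met request max_glideins limits".
# Not the actual GlideinWMS factory code.


def idle_glideins_to_submit(req_idle, req_max_glideins, idle, running):
    """Number of new idle glideins to submit under the assumed rules."""
    total = idle + running
    if total >= req_max_glideins:
        return 0  # max_glideins limit met: submit nothing
    # Top up idle toward req_idle without exceeding req_max_glideins.
    wanted = max(req_idle - idle, 0)
    return min(wanted, req_max_glideins - total)


# Illustrative numbers: with max requested at 4, nothing new is submitted.
print(idle_glideins_to_submit(req_idle=20, req_max_glideins=4, idle=0, running=50))  # 0
```

Under this model, a MaxRun request that is too low silently stops the factory from submitting, which matches the symptom in the plot even though the factory itself behaves as designed.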

Igor, correct me if I am wrong, but as I understand it, the frontend generates this "max requested" value, so I believe the problem is on the frontend side.

Jeff

History

#1 Updated by Marco Mambelli over 5 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli

#2 Updated by Burt Holzman over 5 years ago

  • Priority changed from Normal to Low

#3 Updated by Parag Mhashilkar over 5 years ago

  • Target version changed from v3_2_4 to v3_2_5

#4 Updated by Parag Mhashilkar over 5 years ago

  • Target version changed from v3_2_5 to v3_2_6

#5 Updated by Burt Holzman over 5 years ago

  • Target version changed from v3_2_6 to v3_2_x

#6 Updated by Parag Mhashilkar almost 5 years ago

  • Target version changed from v3_2_x to v3_2_9

#7 Updated by Parag Mhashilkar over 4 years ago

  • Stakeholders updated

#8 Updated by Brian Bockelman over 4 years ago

  • Stakeholders updated

I'm not sure why this is marked as CMS & OSG.

OSG has never used this plugin. CMS quit using this plugin a few months ago.

Removing the OSG stakeholder.

#9 Updated by Parag Mhashilkar over 4 years ago

  • Target version changed from v3_2_9 to v3_2_x

#10 Updated by Parag Mhashilkar about 4 years ago

  • Target version changed from v3_2_x to v3_2_13

#11 Updated by Parag Mhashilkar over 3 years ago

  • Target version changed from v3_2_13 to v3_2_14

#12 Updated by Parag Mhashilkar over 3 years ago

  • Target version changed from v3_2_14 to v3_2_15

#13 Updated by Parag Mhashilkar over 3 years ago

  • Target version changed from v3_2_15 to v3_2_16

#14 Updated by Parag Mhashilkar about 3 years ago

  • Target version changed from v3_2_16 to v3_x

#15 Updated by Parag Mhashilkar about 3 years ago

  • Target version changed from v3_x to v3_2_x

#16 Updated by Marco Mambelli over 1 year ago

  • Target version changed from v3_2_x to v3_4_x

#17 Updated by Marco Mambelli about 1 year ago

  • Target version changed from v3_4_x to v3_5_x

#18 Updated by Marco Mambelli 15 days ago

  • Target version changed from v3_5_x to v3_6_x

