Frontend credential selection plugin ProxyUserMapWRecycling seems to be broken
On Jan 22, 2014, at 2:34 PM, Igor Sfiligoi wrote:
BTW: We were using ProxyUserMapWRecycling.
I switched to ProxyAll now, and things seem to work as expected there.
So this is less urgent than I originally thought it was.
But the bug is real, and should be addressed.
Let me know if I should create a ticket for it, or if you are doing it.
On 01/22/2014 12:06 PM, Igor Sfiligoi wrote:
Hi Parag and Burt.
I think we have significant problems with the v3 FE for CMS AnaOps.
Looks like it is setting max_run to a ridiculously low number.
I think the bug is in
git blame glideinFrontendPlugins.py
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 394) # Out of the max_run glideins,
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 395) # Allocate proportionally out of the total jobs
3da1255c (Parag Mhashilkar 2013-05-30 14:23:22 -0500 396) if (params_obj is not None):
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 397)     this_max=self.num_user_jobs[user]*params_obj.max_run_glideins/self.total_jobs
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 398)     this_idle=self.num_user_jobs[user]*params_obj.min_nr_glideins/self.total_jobs
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 399)     if (this_max<=0):
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 400)         this_max=1
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 401)     if (this_idle<=0):
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 402)         this_idle=1
d5fb79af (Doug Strain      2012-01-11 17:08:44 -0600 403)     cel['proxy'].add_usage_details(this_idle,this_max)
I.e. since CMS has a large number of users but only 10 pilot proxies, the above logic does not make sense.
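To make the suspected arithmetic concrete, here is a minimal sketch of the quoted calculation. It is not the plugin code: the function names, the 200-user workload, and the idea that only the first 10 users end up with a pilot credential are all assumptions for illustration. The limits 744 and 20 are the MaxRun and Idle values the frontend reports further down in this thread. The quoted code is Python 2, where `/` on integers floors, so the sketch uses `//`.

```python
def proportional_share(user_jobs, total_jobs, limit):
    """Integer share of `limit`, floored, with a minimum of 1 (as in the quoted lines)."""
    share = user_jobs * limit // total_jobs  # Python 2 int `/` floors, hence `//`
    return share if share > 0 else 1


def allocate(num_user_jobs, max_run_glideins, min_nr_glideins, users_with_proxy):
    """Return {user: (this_idle, this_max)} for the users that get a credential."""
    # Assumption: total_jobs counts the jobs of ALL users, not only those with a proxy.
    total_jobs = sum(num_user_jobs.values())
    alloc = {}
    for user in users_with_proxy:
        jobs = num_user_jobs[user]
        this_max = proportional_share(jobs, total_jobs, max_run_glideins)
        this_idle = proportional_share(jobs, total_jobs, min_nr_glideins)
        alloc[user] = (this_idle, this_max)
    return alloc


# Hypothetical workload: 200 users with 25 jobs each, only 10 of them get a proxy.
num_user_jobs = {"user%03d" % i: 25 for i in range(200)}
users_with_proxy = sorted(num_user_jobs)[:10]

alloc = allocate(num_user_jobs, 744, 20, users_with_proxy)
total_idle = sum(idle for idle, _ in alloc.values())
total_max = sum(run for _, run in alloc.values())
print(total_idle, total_max)  # prints: 10 30
```

Under these assumptions each credential carries only its own user's slice of the limit, so the sum across the 10 credentials is 30 rather than 744. Whether the real plugin behaves this way is exactly what needs confirming.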
Can you please have a quick look at it and either confirm or tell me I am completely off?
I have never had a look at that code before.
-------- Original Message --------
Subject: Re: [Cms-wms-support] Upgrading the AnaOps FE to v3
Date: Wed, 22 Jan 2014 11:50:53 -0800
From: Igor Sfiligoi <email@example.com>
To: Jeff Dost <firstname.lastname@example.org>
Uhm... looks like there are two values now:
 frontend@glidein-frontend ~$ condor_status -any 'CMS_T2_US_Florida_iogw1@v3_0@SDSC@UCSD-v6_0.main' -l |grep -i request |grep -v Expr
GlideClientMonitorGlideinsRequestIdle = 20
GlideClientMonitorGlideinsRequestMaxRun = 744
GlideFactoryMonitorRequestedIdle = 10
GlideFactoryMonitorRequestedMaxGlideins = 90
And, one makes sense, and the other doesn't!
Digging deeper now.
Here is the FE log
[2014-01-22 11:42:49,337] INFO: Jobs in schedd queues | Glideins | Request
[2014-01-22 11:42:49,338] INFO: Idle (match eff old uniq ) Run ( here max ) | Total Idle Run | Idle MaxRun Down Factory
[2014-01-22 11:42:49,418] INFO: 634(16799 632 627 0) 4533( 74 10000) | 76 2 74 | 20 744 Up CMS_T2_US_Florida_iogw1@v3_0@SDSC@gfactory-1.t2.ucsd.edu
On 01/22/2014 11:39 AM, Jeff Dost wrote:
Ok, I found more info,
I see the following in the factory info log:
[2014-01-22 11:28:18,705] INFO: Additional idle glideins not needed, have met request max_glideins limits 4, not submitting
And look here: after the upgrade, "Max requested" (the red line) dropped from roughly #running all the way down to 4!
Igor, correct me if I am wrong, but if I understand correctly, the frontend should be generating this "max requested" value, so I believe it is a problem on the frontend side.