Inaccurate running pilot jobs number in glideresource classads
From: Brian Bockelman
Subject: GlideFactoryMonitor* entries in glideresource ads
Date: February 4, 2015 at 8:57:49 PM CST
To: Parag Mhashilkar
Cc: Jeff Dost, JAMES LETTS
I’ve been trying to make sense of the GlideFactoryMonitor* keys in the glideresource ads. Basically, for each CMS entry I look at in the OSG, the number of running jobs is a huge over-estimate.
After looking at the factory monitoring, the local CEs, and the frontend monitoring, I’ve figured it out: the GlideFactoryMonitor* values are totals across all frontends, even though they are published in a per-group ad. So, for example,
condor_status -any -pool vocms097.cern.ch -l CMS_T2_US_Nebraska_Red_gw2@gfactory_instance@SDSC@CMSG-v1_0.main | sort
GlideClientMonitorGlideinsRunning = 516
GlideClientMonitorJobsRunningHere = 509
GlideFactoryMonitorStatusRunning = 748
The GlideClient* numbers are roughly correct - they are the numbers of pilot and payload jobs running from the SDSC factory in the ‘main’ group of the CMSG frontend. However, GlideFactoryMonitor* is the number of all running pilots in the SDSC factory for this entry across all VOs. Hence, using the glideresource ads, it’s impossible to reconcile the three views for CMS (running pilots, running payloads, and running HTCondor-G jobs).
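To make the mismatch concrete, here is a minimal sketch that parses `Key = Value` lines of the kind printed by `condor_status -l` and compares the factory-side count with the frontend-side pilot count. The helper name `parse_classad_text` is made up for illustration and is not part of glideinWMS or HTCondor; the three values are the ones quoted above.

```python
def parse_classad_text(text):
    """Parse 'Key = Value' lines (long-format ad output) into a dict.

    Hypothetical helper for illustration only; integer values are
    converted, everything else is kept as a string.
    """
    ad = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines or anything without '='
        key, value = key.strip(), value.strip()
        try:
            ad[key] = int(value)
        except ValueError:
            ad[key] = value.strip('"')
    return ad


# Values copied from the glideresource ad quoted above.
SAMPLE = """\
GlideClientMonitorGlideinsRunning = 516
GlideClientMonitorJobsRunningHere = 509
GlideFactoryMonitorStatusRunning = 748
"""

ad = parse_classad_text(SAMPLE)

# Pilots the factory reports for the entry minus pilots this frontend
# group sees. If both attributes described the same group, this would
# be close to zero.
excess = ad["GlideFactoryMonitorStatusRunning"] - ad["GlideClientMonitorGlideinsRunning"]
print("pilots not attributable to this group:", excess)
```

With the numbers above the difference is 232 pilots, which is consistent with the factory-side attribute counting pilots submitted on behalf of other frontends and groups.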
I’m pretty sure this is a bug (and an annoying one, as the user collector has no way of knowing the number of running jobs according to HTCondor-G). I propose a patch along the lines below.
--- a/frontend/glideinFrontendInterface.py
+++ b/frontend/glideinFrontendInterface.py
@@ -1284,15 +1284,9 @@ class ResourceClassad(classadSupport.Classad):
         @param info: Useful information from the glidefactoryclient classad
         """
 
-        # Required keys do not start with TotalClientMonitor but only
-        # start with Total. Substitute Total with GlideFactoryMonitor
-        # and put it in the classad
-
         for key in info.keys():
-            if not key.startswith('TotalClientMonitor'):
-                if key.startswith('Total'):
-                    ad_key = key.replace('Total', 'GlideFactoryMonitor', 1)
-                    self.adParams[ad_key] = info[key]
+            if key.startswith('Status') or key.startswith('Requested'):
+                self.adParams['GlideFactoryMonitor' + key] = info[key]
 
 
 class ResourceClassadAdvertiser(classadSupport.ClassadAdvertiser):
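The effect of the patch can be sketched outside the class as two standalone functions: the current behaviour renames `Total*` keys (skipping `TotalClientMonitor*`), while the proposed behaviour prefixes `Status*` and `Requested*` keys. The function names and the sample dictionaries below are assumptions for illustration; in particular, the per-client values are invented and the exact key names found in real ads may differ.

```python
def current_mapping(info):
    """Current behaviour: Total* -> GlideFactoryMonitor*, skipping TotalClientMonitor*."""
    out = {}
    for key in info:
        if key.startswith("TotalClientMonitor"):
            continue
        if key.startswith("Total"):
            out[key.replace("Total", "GlideFactoryMonitor", 1)] = info[key]
    return out


def proposed_mapping(info):
    """Proposed behaviour: Status*/Requested* -> GlideFactoryMonitorStatus*/Requested*."""
    out = {}
    for key in info:
        if key.startswith(("Status", "Requested")):
            out["GlideFactoryMonitor" + key] = info[key]
    return out


# Hypothetical entry-wide totals (all frontends together).
entry_totals = {
    "TotalStatusRunning": 748,
    "TotalRequestedIdle": 10,
    "TotalClientMonitorGlideinsRunning": 516,
}

# Hypothetical per-client info (one frontend group only); values are made up.
per_client = {
    "StatusRunning": 520,
    "RequestedIdle": 4,
    "ClientMonitorGlideinsRunning": 516,
}

print(current_mapping(entry_totals))
print(proposed_mapping(per_client))
```

Both functions emit the same attribute names, so consumers of the glideresource ad would not need to change; only the source of the numbers moves from entry-wide totals to per-client values.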
#4 Updated by Parag Mhashilkar over 5 years ago
OK, so after more poking around, I found out that the factory-side Total monitoring info does not come from the glidefactoryclient classads at all, but from the glidefactory (i.e., entry) classads. I vaguely remember a discussion concluding that this info was enough, since the other relevant info came from the frontend via the glideclient classad.
We can get the info you need from the glidefactoryclient classads instead.