Factory monitoring broken - significantly scaled up
CMS AnaOps has noticed that the frontend (FE) monitoring of running glideins is way off;
to be specific, it is 10x lower than what the factory is reporting.
Since the CMS AnaOps FE uses 10 pilot proxies, we suspect that is the root cause, but we are not certain.
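To illustrate why 10 pilot proxies could plausibly produce a 10x discrepancy, here is a minimal sketch of one way such a miscount can arise. It is purely hypothetical: the function names, the dictionary layout and the proxy names are invented for illustration and are not taken from the glideinWMS code, and the actual cause was not confirmed at this point.

```python
# Hypothetical illustration only; none of these names come from glideinWMS.

def count_running_once(glideins):
    """Count every running glidein exactly once."""
    return sum(1 for g in glideins if g["state"] == "running")


def count_running_per_credential(glideins, credentials):
    """Buggy variant: repeats the full count once per credential
    instead of counting only the glideins that belong to it."""
    total = 0
    for _cred in credentials:
        total += sum(1 for g in glideins if g["state"] == "running")
    return total


# Assumed setup: 10 pilot proxies, 50 running glideins spread across them.
credentials = [f"proxy{i}" for i in range(10)]
glideins = [
    {"state": "running", "credential": credentials[i % len(credentials)]}
    for i in range(50)
]

correct = count_running_once(glideins)
inflated = count_running_per_credential(glideins, credentials)
print(correct, inflated, inflated // correct)
```

With this setup the inflated count is larger than the correct one by exactly the number of credentials, which is the kind of ratio reported above.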
#3 Updated by Igor Sfiligoi about 6 years ago
- Subject changed from Frontend monitoring broken - significantly scaled down to Factory monitoring broken - significantly scaled up
- Category changed from Frontend Monitoring to Factory Monitoring
Looking at the condor_status output, it looks like the factory numbers are the ones that are 10x what they should be!
#8 Updated by Marco Mambelli about 6 years ago
- Assignee changed from Marco Mambelli to Parag Mhashilkar
Reassigning it back to Parag; it's ready to be merged.
PS: I checked self.trust_domain in the Credential in frontend/glideinFrontendInterface.py. It's OK to use the string "None". It is compared with string values written in the XML config file, which may include "None" and "Any" besides the actual trust domain.
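A small sketch of why the string "None" is the safer choice for a string comparison. The helper and the "grid" trust domain name are hypothetical and are not the actual code in glideinFrontendInterface.py; the only assumption taken from the note above is that the config values are plain strings.

```python
# Hypothetical helper; not the real glideinWMS comparison code.

def matches_config(trust_domain, configured_value):
    """Plain string equality against a value read from the config file."""
    return trust_domain == configured_value


# Values as they might appear in an XML config: always strings.
# "grid" is an invented example of an actual trust domain name.
configured_values = ["None", "Any", "grid"]

# The string "None" matches the configured string "None".
string_default = "None"
string_matches = [matches_config(string_default, v) for v in configured_values]

# The Python None object never equals any configured string.
object_default = None
object_matches = [matches_config(object_default, v) for v in configured_values]

print(string_matches)
print(object_matches)
```

The point of the sketch is only that a value parsed from XML is never the Python None object, so a None default would silently fail to match a configured "None".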
#12 Updated by Igor Sfiligoi almost 6 years ago
- Status changed from Closed to Assigned
- Target version changed from v3_2_4 to v3_2_5
CMS just upgraded to 3_2_5 and half of the monitoring is now completely broken!
In particular, all glidein-related numbers are advertised as 0.
(PS: we also upgraded the factory to 3_2_5; it made no difference.)
GlideinMonitorGlideinsIdle = 0
GlideinMonitorGlideinsRunning = 0
GlideinMonitorGlideinsTotal = 0
GlideinMonitorRunningHere = 0
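For anyone wanting to check for this symptom automatically, a rough sketch follows. It assumes the attributes are available as plain "Name = value" text lines like the ones above; the function names are made up for this example and it does not query condor itself.

```python
# Hypothetical check; parses "Name = value" lines such as the ones quoted above.

def parse_attrs(text):
    """Return a dict of integer-valued attributes found in the text."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an attribute line
        try:
            attrs[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip non-integer values
    return attrs


def all_monitor_values_zero(attrs, prefix="GlideinMonitor"):
    """True if at least one matching attribute exists and all of them are 0."""
    monitored = [v for k, v in attrs.items() if k.startswith(prefix)]
    return bool(monitored) and all(v == 0 for v in monitored)


sample = """\
GlideinMonitorGlideinsIdle = 0
GlideinMonitorGlideinsRunning = 0
GlideinMonitorGlideinsTotal = 0
GlideinMonitorRunningHere = 0
"""

attrs = parse_attrs(sample)
print(all_monitor_values_zero(attrs))
```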
Reopening this ticket.