Bug #24334

Problem due to cached schedd value

Added by Marco Mambelli 2 months ago. Updated 2 months ago.

Status: New
Priority: High
Category: -
Target version:
Start date: 04/21/2020
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

The DN of the certificate changed, so the Frontend configuration was updated and an upgrade and reconfig were done. The Frontend still kept failing because it was using the saved (cached) value of the schedd.
See below for the stack trace.
Manually deleting the cache content fixed the problem.

Whenever there is a reconfig (or upgrade), the cache should be cleared.
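A minimal sketch of the requested behavior, assuming the cached schedd data is stored as files under a single directory. The function name `clear_cache_dir` and its `cache_dir` argument are hypothetical and not part of the glideinWMS code; the idea is only that reconfig/upgrade would call something like this before the Frontend restarts.

```python
import os
import shutil


def clear_cache_dir(cache_dir):
    """Remove every entry inside cache_dir, keeping the directory itself.

    Hypothetical helper: cache_dir stands for whatever directory the
    Frontend uses for its saved schedd data (an assumption, not taken
    from the glideinWMS sources).
    Returns the number of top-level entries removed.
    """
    if not os.path.isdir(cache_dir):
        return 0  # nothing cached yet, nothing to clear
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # real subdirectory: remove recursively
        else:
            os.remove(path)  # regular file or symlink
        removed += 1
    return removed
```

The directory itself is kept so that ownership and permissions set at install time are not lost; only its content is dropped, which matches the manual fix described above.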

[2020-04-21 18:07:36,756] ERROR: glideinFrontendLib:1148: Condor Error. Failed to talk to schedd:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/glideinwms/frontend/glideinFrontendLib.py", line 1144, in getCondorQConstrained
    condorq.load(full_constraint, format_list)
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/condorMonitor.py", line 561, in load
    self.stored_data = self.fetch(constraint, format_list)
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/condorMonitor.py", line 600, in fetch
    format_list=format_list)
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/condorMonitor.py", line 489, in fetch
    format_list=format_list)
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/condorMonitor.py", line 635, in fetch_using_bindings
    results = schedd.query(constraint, attrs)
QueryError: Error executing htcondor query to pool None with constraint ((JobStatus=?=1)||(JobStatus=?=2)) && (((JobUniverse==5)&&(GLIDEIN_Is_Monitor =!= TRUE)&&(JOB_Is_Monitor =!= TRUE)) && (True)) and format_list [('RequestCpus', 'i'), ('x509UserProxyFirstFQAN', 's'), ('x509UserProxyFQAN', 's'), ('x509userproxy', 's'), ('JobStatus', 'i'), ('EnteredCurrentStatus', 'i'), ('ServerTime', 'i'), ('RemoteHost', 's'), ('ClusterId', 'i'), ('ProcId', 'i')]: Error querying schedd fermicloud315.fnal.gov in pool default using python bindings: Failed to fetch ads from schedd, errmsg=SECMAN:2007:Failed to end classad message.. Env is {'LANG': 'en_US.UTF-8', 'X509_USER_PROXY': '/etc/gwms-frontend/fe_proxy', 'SHELL': '/bin/bash', 'SHLVL': '2', 'X509_CERT_DIR': '/etc/grid-security/certificates', 'PWD': '/', 'LOGNAME': 'frontend', 'USER': 'frontend', 'HOME': '/var/lib/gwms-frontend', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin', '_CONDOR_CERTIFICATE_MAPFILE': '/var/lib/gwms-frontend/vofrontend/group_main/group.mapfile', 'CONDOR_CONFIG': '/var/lib/gwms-frontend/vofrontend/frontend.condor_config', '_': '/usr/sbin/glideinFrontend'}
[2020-04-21 18:07:36,906] INFO: glideinFrontendElement:410: All children terminated

History

#1 Updated by Marco Mascheroni 2 months ago

Clearing the cache upon restart should already be in place: https://github.com/glideinWMS/glideinwms/commit/17dd5e21c9ae919e1e55e43effc24f68680b4647#diff-25c1b48a883815579655163d017926f8R455-R467

Not sure why this did not work.

#2 Updated by Marco Mascheroni 2 months ago

  • Assignee changed from Marco Mascheroni to Marco Mambelli

