Bug #6049

Upgrading 3.2.3 to 3.2.4 may have compatibility issues

Added by Parag Mhashilkar about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: Parag Mhashilkar
Category: -
Target version:
Start date: 04/28/2014
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

Hello glideinWMS support,

I would like to report two separate bugs that bit me when upgrading our test frontend.

First, over the weekend after upgrading the frontend to vanilla v3_2_4, we ran out of disk space. This was my fault, but once I had cleaned up the disk, I was unable to start the frontend back up. It failed with:
[2014-04-26 11:13:32,033] ERROR: glideinFrontend:353: Exception occurred trying to spawn:
Traceback (most recent call last):
  File "/home/frontend/glideinwms/frontend/glideinFrontend.py", line 349, in main
    restart_interval, restart_attempts)
  File "/home/frontend/glideinwms/frontend/glideinFrontend.py", line 252, in spawn
    failure_dict, restart_attempts)
  File "/home/frontend/glideinwms/frontend/glideinFrontend.py", line 146, in spawn_iteration
    failure_dict[group_name].add_failure()
  File "/home/frontend/glideinwms/frontend/glideinFrontend.py", line 55, in add_failure
    if when in None:
TypeError: iterable argument required

Poking around the code, it appears there is a typo in add_failure:
def add_failure(self, when=None):
    if when in None:
        when = time.time()

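(The code uses "in" where "is" was intended. The in operator requires an iterable on its right-hand side, and None is not iterable, so "when in None" raises TypeError on every call. The check should be "when is None".)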
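For illustration, here is a minimal sketch of the corrected method, assuming a simplified stand-in class. The class name FailureCounter and its failure_times list are hypothetical and not taken from glideinFrontend.py. The sketch also reproduces the original typo so the TypeError can be observed directly; the exact error message differs between Python 2 (as in the traceback above) and Python 3.

```python
import time


class FailureCounter:
    """Hypothetical stand-in for the per-group failure tracker in glideinFrontend.py."""

    def __init__(self):
        self.failure_times = []  # timestamps of recorded failures (assumed layout)

    def add_failure(self, when=None):
        # Fixed check: compare identity with None. The reported code used
        # "when in None", which raises TypeError because None is not iterable.
        if when is None:
            when = time.time()
        self.failure_times.append(when)


def buggy_add_failure_check(when=None):
    # Reproduces the reported typo; raises TypeError for any value of when.
    return when in None


if __name__ == "__main__":
    counter = FailureCounter()
    counter.add_failure()                  # timestamp filled in with time.time()
    counter.add_failure(when=1398528812.0)  # explicit timestamp (arbitrary example value)
    print(len(counter.failure_times))      # 2
    try:
        buggy_add_failure_check()
    except TypeError as exc:
        print("TypeError:", exc)
```

Because the faulty line sits on the failure path, it only triggers once a group actually fails, which is consistent with the frontend running normally until the disk filled up.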

The only way to recover my frontend was to roll back to v3_2_3, then upgrade again to v3_2_4.

Now, the second bug: after rolling back to v3_2_3, I ran the frontend while the GOC ITB factory was at v3_2_4. Unfortunately, it appears that v3_2_3 frontends cannot successfully send glidein requests to v3_2_4 factories!

On the factory side, I could see the global frontend classad with:
condor_status -any -const 'mytype=?="glideclientglobal"'

but this query returned nothing:
condor_status -any -const 'mytype=?="glideclient"'

It wasn't until I upgraded back to v3_2_4 that the glideclient ads reappeared in the factory collector.
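The two condor_status checks above can be wrapped in a small script to detect the symptom (global ad present, no glideclient ads). This is a sketch only: the helper names are hypothetical, and it assumes condor_status is on the PATH and that printing MyType with -format is an acceptable way to count ads. The run parameter exists so the logic can be exercised without a real collector.

```python
import subprocess
from typing import Callable, List, Optional


def condor_status_cmd(mytype: str, pool: Optional[str] = None) -> List[str]:
    """Build a condor_status command listing collector ads with the given MyType."""
    cmd = ["condor_status"]
    if pool:
        cmd += ["-pool", pool]  # query a remote collector, e.g. the factory's
    # -any returns ads of every type; =?= is the ClassAd "is identical to"
    # operator, so ads without a MyType attribute never match.
    cmd += ["-any", "-constraint", 'MyType =?= "%s"' % mytype]
    # Print one line per matching ad so the result is easy to count.
    cmd += ["-format", "%s\\n", "MyType"]
    return cmd


def count_ads(mytype: str, pool: Optional[str] = None,
              run: Callable = subprocess.run) -> int:
    """Run the query and count matching ads (one non-empty line each)."""
    result = run(condor_status_cmd(mytype, pool),
                 capture_output=True, text=True, check=True)
    return sum(1 for line in result.stdout.splitlines() if line.strip())


def requests_missing(pool: Optional[str] = None,
                     run: Callable = subprocess.run) -> bool:
    """True when the frontend's global ad is visible but no glideclient ads are."""
    return (count_ads("glideclientglobal", pool, run) > 0
            and count_ads("glideclient", pool, run) == 0)
```

Run on the factory host, requests_missing() would return True in the state described above; pass pool="<factory collector>" to query a remote collector instead.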

This was vanilla glideinWMS v3_2_3 and v3_2_4 on the frontend side; on GOC-ITB, we are currently running v3_2_4 plus this HTCondorCE patch:
https://cdcvs.fnal.gov/redmine/issues/5956

Jeff

History

#1 Updated by Parag Mhashilkar about 6 years ago

After spending some time debugging the issue on the ITB factories and the test frontend, it appears to be a bug in the v3.2.3 frontend and a few earlier versions. I did not do a detailed analysis since the 3.2.4 frontend works; 3.2.4 includes several code changes that may have resolved the issue.

The issue happens when the frontend cannot find credentials that can be used for a given entry (for example, an entry with a different trust_domain). In the ITB case, the cloud entries had trust_domain="Grizzly_HLT", but the frontend was not configured with any credentials that could be used with them. Instead of simply ignoring these entries, the frontend logged the message "No credentials found ...." and did not advertise glideclient classads for the valid entries either.

To fix the issue, either disable these entries in the factory or add matching credentials to the frontend. Since the production factories do not have entries with a mix of trust_domains, this should not affect them. Hopefully, by the time they do, all frontends will have been upgraded to 3.2.4+.
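The intended behavior (skip entries with no usable credential, but keep advertising for the rest) can be sketched as follows. This is not glideinWMS's actual credential-matching code: the Entry and Credential classes, the match_credentials function, the entry names, the "grid" trust domain and the proxy path are all hypothetical, chosen only to illustrate skip-and-continue versus the faulty behavior described above.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Tuple

log = logging.getLogger("frontend-sketch")


@dataclass
class Entry:
    name: str
    trust_domain: str


@dataclass
class Credential:
    path: str
    trust_domain: str


def match_credentials(entries: List[Entry],
                      credentials: List[Credential]) -> List[Tuple[str, str]]:
    """Pair each entry with a usable credential, skipping entries that have none."""
    by_domain: Dict[str, List[Credential]] = {}
    for cred in credentials:
        by_domain.setdefault(cred.trust_domain, []).append(cred)

    plan = []
    for entry in entries:
        usable = by_domain.get(entry.trust_domain)
        if not usable:
            # Skip only this entry and keep going, so requests for the
            # remaining valid entries are still advertised.
            log.warning("No credentials found for entry %s (trust_domain=%s); skipping",
                        entry.name, entry.trust_domain)
            continue
        plan.append((entry.name, usable[0].path))  # first match; real selection may differ
    return plan


if __name__ == "__main__":
    entries = [Entry("Grid_Entry_A", "grid"), Entry("Cloud_Entry_B", "Grizzly_HLT")]
    creds = [Credential("/tmp/frontend_proxy", "grid")]
    print(match_credentials(entries, creds))  # [('Grid_Entry_A', '/tmp/frontend_proxy')]
```

The design point is that a missing credential is a per-entry condition, so it is handled inside the loop with continue rather than aborting the whole advertising cycle.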

#2 Updated by Parag Mhashilkar about 6 years ago

  • Status changed from New to Resolved

Merged the relevant changes into branch_v3_2.

#3 Updated by Parag Mhashilkar about 6 years ago

  • Status changed from Resolved to Closed

