Bug #24160

Submission to GCE broken

Added by Marco Mambelli 7 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Urgent
Category:
-
Target version:
Start date:
03/10/2020
Due date:
% Done:

0%

Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

The submission to GCE is broken because the Factory is passing the simple x509 credential to condor instead of a key=value file.

Here is the email from Steve:

We are running hepcloud production with htcondor 8.9.5 and glideinwms factory 3.6.2.

We have observed the following:

The glidein_startup.sh script on the GCE VM is looking for two values in the instance metadata

        self.userdata_attributes = (
            'glideinwms_metadata',
            'glidein_credentials'
        )

It fails because it cannot find the second one, glidein_credentials.

Analysis of the call that is being made to Google shows that only the first one,
glideinwms_metadata, is being sent, and it is sent correctly.

Below is the job.condor file that is created

You can see that it is using both the gce_metadata and gce_metadata_file options
in the condor submit. 

The problem is that the gce_metadata_file is supposed to have
a key=value structure, and as submitted it does not. We are just appending the gzipped proxy directly as
the file name. Therefore it appears that the proxy is just getting ignored and not getting sent as part of the metadata.
This causes the glideinwms-pilot launcher on the VM to error out and fail.
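
To illustrate the mismatch, here is a minimal sketch of a key=value metadata file and of the check that fails when a key is absent. The function names, the gzip plus base64 encoding, and the one-pair-per-line layout are assumptions made for illustration (written for Python 3); they are not the GlideinWMS or HTCondor implementation. Only the two attribute names come from the pilot code quoted above.

```python
import base64
import gzip

# Attribute names quoted from the pilot code in this ticket.
REQUIRED_ATTRIBUTES = ("glideinwms_metadata", "glidein_credentials")


def build_metadata_file_content(proxy_bytes, key="glidein_credentials"):
    """Return one key=value line carrying the credential.

    Assumption: the credential is gzipped and base64-encoded so that it
    fits on a single line. The real encoding may differ.
    """
    encoded = base64.b64encode(gzip.compress(proxy_bytes)).decode("ascii")
    return "%s=%s\n" % (key, encoded)


def parse_key_value(text):
    """Parse key=value lines, splitting on the first '=' only.

    Splitting once keeps any '=' padding at the end of the base64 value.
    """
    parsed = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # not a key=value pair, nothing to record
        key, value = line.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def missing_attributes(metadata, required=REQUIRED_ATTRIBUTES):
    """List the required attribute names that are absent from metadata."""
    return [name for name in required if name not in metadata]
```

With only glideinwms_metadata present, missing_attributes reports glidein_credentials, which corresponds to the failure seen on the VM. Once the credential is written as a key=value line, it can be parsed back and decoded.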

A quick examination of the AWS job.condor shows that exactly the same thing is happening there.

Please advise.

Steve Timm

glideFactoryCredentials.py (17.3 KB) Marco Mambelli, 03/11/2020 11:48 AM

History

#1 Updated by Marco Mambelli 7 months ago

  • Priority changed from Normal to Urgent
  • Assignee set to Bruno Coimbra
  • Status changed from New to Feedback

In the single-user factory update, the change in the format of the compressed credentials made in creation/web_base/update_proxy.p had been overlooked.
The changes are in v36/24160.

#2 Updated by Marco Mambelli 7 months ago

  • Target version set to v3_6_2

#3 Updated by Marco Mambelli 7 months ago

  • File glideFactoryCredentials.py added

Patching instructions:
VERSIONS: GlideinWMS v3_6, v3_6_1
APPLICABLE TO: Factory
OS: RHEL6, RHEL7 and compatibles

FILES: glideFactoryCredentials.py (attached to this ticket)

EFFECT: Changes in the code are very limited. They affect only the compressed credential that is used for AWS and GCE, fixing the problem.

INSTRUCTIONS
  1. Stop the Factory
  2. In the python site-packages directory, glideinwms/factory subdirectory (/usr/lib/python2.6/site-packages/glideinwms/factory/ on RHEL6, /usr/lib/python2.7/site-packages/glideinwms/factory/ on RHEL7) do the following:
    1. Replace glideFactoryCredentials.py with the provided one
    2. Remove glideFactoryCredentials.pyc and glideFactoryCredentials.pyo
  3. Start the Factory
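
The file operations in step 2 could be scripted along these lines. This is a hedged sketch: the function name apply_patch and its arguments are hypothetical, and it covers only the file replacement and bytecode cleanup. Stopping and starting the Factory (steps 1 and 3) must still be done separately.

```python
import os
import shutil

MODULE_NAME = "glideFactoryCredentials.py"


def apply_patch(patched_file, factory_dir):
    """Install the patched module and remove stale compiled files.

    patched_file: path to the glideFactoryCredentials.py attached to this ticket.
    factory_dir:  the glideinwms/factory directory under site-packages.
    Returns the list of bytecode files that were removed.
    """
    target = os.path.join(factory_dir, MODULE_NAME)
    # Step 2.1: replace the module with the provided one.
    shutil.copy2(patched_file, target)

    # Step 2.2: remove the .pyc and .pyo files so the old bytecode is not reused.
    removed = []
    for suffix in ("c", "o"):
        stale = target + suffix
        if os.path.exists(stale):
            os.remove(stale)
            removed.append(stale)
    return removed
```

For example, on RHEL7 the call would be apply_patch("glideFactoryCredentials.py", "/usr/lib/python2.7/site-packages/glideinwms/factory/"), run with sufficient privileges to write to that directory.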

#4 Updated by Marco Mambelli 7 months ago

  • Assignee changed from Bruno Coimbra to Marco Mambelli
  • Status changed from Feedback to Resolved

#5 Updated by Marco Mambelli 7 months ago

#6 Updated by Steven Timm 7 months ago

The patch successfully got credentials to the google VMs and glideins were able to complete initiation, call back to the pool, match and run jobs.

#7 Updated by Marco Mambelli 6 months ago

HEPCloud

#8 Updated by Marco Mambelli 6 months ago

  • Status changed from Resolved to Closed

