Bug #4483

FE in HA mode does not advertise to all collectors

Added by Igor Sfiligoi over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: High
Assignee: Igor Sfiligoi
Category: Frontend
Target version:
Start date: 08/01/2013
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

When using the FE in HA mode (i.e. multiple primary collectors), the FE does not send the glideresource classads to all of them.
It just uses condor_advertise without the -pool option, resulting in a random pick between the available collectors.

History

#1 Updated by Parag Mhashilkar over 7 years ago

Are you sure your other collectors have the frontend's correct DN in their mapfile, and that this is not a security issue?

There is no random pick here. The -pool option is omitted in favor of collector_host in the frontend.condor_config.

I can't reproduce this; it works as expected when I try it. I also tried with two groups containing both primary and secondary collectors, with the same result. Let me know if I am missing something.

My frontend config section

   <collectors>
      <collector DN="/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=fermicloud373.fnal.gov" group="default" node="fermicloud373.fnal.gov:9618" secondary="False"/>
      <collector DN="/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=fermicloud373.fnal.gov" group="default" node="fermicloud373.fnal.gov:9619-9620" secondary="True"/>
      <collector DN="/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=fermicloud343.fnal.gov" group="group2" node="fermicloud343.fnal.gov:9618" secondary="False"/>
   </collectors>

Equivalent frontend.condor_config after reconfiguring the frontend

[frontend@fermicloud373 frontend_Frontend-branch_v2plus-v1_0]$ CONDOR_CONFIG=./frontend.condor_config condor_config_val collector_host
fermicloud373.fnal.gov:9618,fermicloud343.fnal.gov:9618
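For reference, a minimal Python sketch of reading that value and splitting it into individual collector addresses. It assumes the list is separated by commas and/or whitespace; the function names are hypothetical and not part of the frontend code.

```python
import os
import subprocess


def split_collector_host(value):
    """Split a collector_host value into individual collector addresses.

    Assumption: entries are separated by commas and/or whitespace.
    """
    return value.replace(",", " ").split()


def read_collector_host(config_path):
    """Query collector_host via condor_config_val, using the given config file.

    Mirrors the command shown above: CONDOR_CONFIG=<file> condor_config_val collector_host
    """
    env = dict(os.environ, CONDOR_CONFIG=config_path)
    output = subprocess.check_output(
        ["condor_config_val", "collector_host"], env=env, text=True
    )
    return split_collector_host(output)
```

With the value shown above, split_collector_host returns two entries, one per primary collector.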

And upon starting the frontend I see glideresource classads in both collectors

MyType             TargetType         Name

glideresource      None               ITB_INSTALL_TEST_3@v1_0@GlideinFactory-br
Collector          None               User_Pool@fermicloud373.fnal.gov
Scheduler          None               fermicloud373.fnal.gov
DaemonMaster       None               fermicloud373.fnal.gov
Negotiator         None               fermicloud373.fnal.gov
glideresource      None               ress_ITB_GRATIA_TEST_1@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_2@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_3@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_4@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_5@v1_0@GlideinFactor

MyType             TargetType         Name

glideresource      None               ITB_INSTALL_TEST_3@v1_0@GlideinFactory-br
Collector          None               User_Pool@fermicloud343.fnal.gov
Scheduler          None               fermicloud343.fnal.gov
DaemonMaster       None               fermicloud343.fnal.gov
Negotiator         None               fermicloud343.fnal.gov
glideresource      None               ress_ITB_GRATIA_TEST_1@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_2@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_3@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_4@v1_0@GlideinFactor
glideresource      None               ress_ITB_GRATIA_TEST_5@v1_0@GlideinFactor
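A comparison like the one above can also be done mechanically. The sketch below assumes listings in the three-column format shown (MyType, TargetType, Name) have already been captured as text, one per collector; the helper names are hypothetical.

```python
def glideresource_names(listing):
    """Return the set of Name values for glideresource rows in a listing."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have exactly three columns: MyType, TargetType, Name
        if len(fields) == 3 and fields[0] == "glideresource":
            names.add(fields[2])
    return names


def missing_ads(listings):
    """Map each collector to the glideresource names it lacks.

    listings: dict of collector address -> listing text.
    A name counts as missing if at least one other collector has it.
    Collectors that have every name are left out of the result.
    """
    per_collector = {c: glideresource_names(t) for c, t in listings.items()}
    all_names = set().union(*per_collector.values())
    return {
        collector: sorted(all_names - names)
        for collector, names in per_collector.items()
        if all_names - names
    }
```

An empty result means every collector holds the same set of glideresource classads, which is the situation reported here.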

#2 Updated by Igor Sfiligoi over 7 years ago

  • Assignee changed from Parag Mhashilkar to Igor Sfiligoi

It started to work the moment I hacked the code to explicitly call

condor_advertise -pool X

once for each collector.
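A minimal sketch of that per-collector approach, not the actual patch. The update command UPDATE_AD_GENERIC and the function names are assumptions made for illustration; only the -pool option comes from this ticket.

```python
import subprocess


def build_advertise_commands(collectors, ad_file, update_command="UPDATE_AD_GENERIC"):
    """Build one condor_advertise command line per collector.

    update_command is an assumed default; use whatever the frontend really sends.
    """
    return [
        ["condor_advertise", "-pool", collector, update_command, ad_file]
        for collector in collectors
    ]


def advertise_to_all(collectors, ad_file):
    """Run condor_advertise once per collector and collect any failures.

    Returns a list of (collector, stderr) pairs for commands that exited non-zero.
    """
    failures = []
    for cmd in build_advertise_commands(collectors, ad_file):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd[2], result.stderr.strip()))
    return failures
```

Keeping the command construction separate from execution makes it possible to inspect what would be sent to each collector without contacting any of them.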

However, if it works for you, then it calls for a deeper investigation.
Will try to reproduce it on the command line.

BTW: What version of Condor are you using? My FE has 7.8.7.

#3 Updated by Parag Mhashilkar over 7 years ago

Very close to yours

[frontend@fermicloud373 frontend_Frontend-branch_v2plus-v1_0]$ condor_version
$CondorVersion: 7.8.8 Mar 20 2013 BuildID: 110288 $
$CondorPlatform: x86_64_rhap_6.3 $

But looking at https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3404, I think it's safe to put the blame on this Condor bug.

#4 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from New to Closed

I think we have enough evidence that this is related to the Condor version. I am going to close this ticket. Feel free to reopen it if your investigations say otherwise.

#5 Updated by Igor Sfiligoi over 7 years ago

Just confirming that after upgrading to 8.0.1 the problem went away.
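For anyone checking their own installation, a small sketch that parses condor_version output in the format quoted earlier in this thread and compares it against a minimum. The default of 8.0.1 is simply the version reported as working here; it is not a confirmed statement of which release first contained the fix, and the function names are hypothetical.

```python
import re


def parse_condor_version(text):
    """Extract (major, minor, patch) from a '$CondorVersion: X.Y.Z ...' line."""
    match = re.search(r"\$CondorVersion:\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no $CondorVersion line found")
    return tuple(int(part) for part in match.groups())


def is_at_least(text, minimum=(8, 0, 1)):
    """True if the parsed version is at or above the given minimum.

    The default minimum is an assumption taken from this thread.
    """
    return parse_condor_version(text) >= minimum
```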
