Bug #4549

v3 factory glideins are not reporting to all collectors specified by the v2 frontend

Added by Krista Larson over 7 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Urgent
Assignee:
Parag Mhashilkar
Category:
-
Target version:
Start date:
08/16/2013
Due date:
% Done:

0%

Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

I have multiple collectors specified in my v2 frontend:
<collectors>
<collector DN="/DC=ch/DC=cern/OU=computers/CN=vocms97.cern.ch" group="default" node="vocms97.cern.ch" secondary="False"/>
<collector DN="/DC=ch/DC=cern/OU=computers/CN=vocms97.cern.ch" group="default" node="vocms97.cern.ch:9620-10019" secondary="True"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=cmssrv119.fnal.gov" group="fnal" node="cmssrv119.fnal.gov:9618" secondary="False"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=cmssrv119.fnal.gov" group="fnal" node="cmssrv119.fnal.gov:9620-10049" secondary="True"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=cmssrv119.fnal.gov" group="fnal" node="cmssrv119.fnal.gov:10051-10119" secondary="True"/>
</collectors>

When using a v2 factory glidein, I see both collectors in the glidein condor configs, and the glidein reports to both:
HEAD_NODE=vocms97.cern.ch:9828,cmssrv119.fnal.gov:9828
COLLECTOR_HOST = $(HEAD_NODE)
CCB_ADDRESS=vocms97.cern.ch:9828,cmssrv119.fnal.gov:9828

In a v3 factory glidein, only one collector is specified:
GLIDEIN_Collector cmssrv119.fnal.gov:9855
HEAD_NODE=cmssrv119.fnal.gov:9855
COLLECTOR_HOST = $(HEAD_NODE)
CCB_ADDRESS cmssrv119.fnal.gov:9855
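
To make the discrepancy easy to check, here is a minimal sketch that compares a glidein's HEAD_NODE value against the frontend collector groups and reports which groups have no collector. The function names (parse_node, groups_from_xml, missing_groups) are hypothetical helpers written for this illustration and are not part of glideinWMS. The sketch assumes that a node without an explicit port uses the HTCondor default collector port 9618.

```python
import xml.etree.ElementTree as ET

# Collector section copied from the v2 frontend configuration in this report.
FRONTEND_COLLECTORS = """<collectors>
<collector DN="/DC=ch/DC=cern/OU=computers/CN=vocms97.cern.ch" group="default" node="vocms97.cern.ch" secondary="False"/>
<collector DN="/DC=ch/DC=cern/OU=computers/CN=vocms97.cern.ch" group="default" node="vocms97.cern.ch:9620-10019" secondary="True"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=cmssrv119.fnal.gov" group="fnal" node="cmssrv119.fnal.gov:9618" secondary="False"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=cmssrv119.fnal.gov" group="fnal" node="cmssrv119.fnal.gov:9620-10049" secondary="True"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=cmssrv119.fnal.gov" group="fnal" node="cmssrv119.fnal.gov:10051-10119" secondary="True"/>
</collectors>"""

DEFAULT_PORT = 9618  # assumption: default HTCondor collector port when none is given


def parse_node(node):
    """Return (host, low_port, high_port) for 'host', 'host:port' or 'host:low-high'."""
    host, _, ports = node.partition(":")
    if not ports:
        return host, DEFAULT_PORT, DEFAULT_PORT
    low, _, high = ports.partition("-")
    return host, int(low), int(high or low)


def groups_from_xml(xml_text):
    """Map each collector group name to the list of (host, low, high) specs it contains."""
    groups = {}
    for collector in ET.fromstring(xml_text).iter("collector"):
        spec = parse_node(collector.get("node"))
        groups.setdefault(collector.get("group"), []).append(spec)
    return groups


def missing_groups(groups, head_node):
    """Return the sorted group names that no entry of HEAD_NODE belongs to."""
    covered = set()
    for entry in head_node.split(","):
        host, port, _ = parse_node(entry.strip())
        for group, specs in groups.items():
            if any(h == host and lo <= port <= hi for h, lo, hi in specs):
                covered.add(group)
    return sorted(set(groups) - covered)


groups = groups_from_xml(FRONTEND_COLLECTORS)
v2_missing = missing_groups(groups, "vocms97.cern.ch:9828,cmssrv119.fnal.gov:9828")
v3_missing = missing_groups(groups, "cmssrv119.fnal.gov:9855")
print("v2 glidein missing groups:", v2_missing)
print("v3 glidein missing groups:", v3_missing)
```

With the values quoted above, the v2 HEAD_NODE covers both groups, while the v3 HEAD_NODE leaves the "default" group (vocms97.cern.ch) without a collector.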

History

#1 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from New to Assigned
  • Assignee set to Parag Mhashilkar
  • Target version set to v3_2

#2 Updated by Igor Sfiligoi over 7 years ago

  • Priority changed from Normal to Urgent

CMS AnaOps relies on this functionality in the current v2 production factories.

So it is a showstopper for the deployment of v3.

#3 Updated by Parag Mhashilkar over 7 years ago

This bug seems to come from either the error classad reporting or the OSG test results reporting. As you can see below, main/collector_setup.sh is executed twice. On the first invocation it does the right thing: it reads the semicolon-separated collector groups and writes the selected collectors as a comma-separated list. On the second invocation, it takes this comma-separated list as its input and picks just one entry, which is what the script is expected to do with such input. I am now figuring out why this happens (possibly a variable name collision or scoping problem) and am getting close.

<OSGTestResult id="glidein_startup.sh" version="4.3.1">
    <OSGTestResults>
      <OSGTestResult id="main/setup_script.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/cat_consts.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/condor_platform_select.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/collector_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/create_temp_mapfile.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/setup_x509.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="client/cat_consts.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="client/check_blacklist.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="client_group/cat_consts.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="client_group/check_blacklist.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="entry/cat_consts.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="entry/check_blacklist.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/check_proxy.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/create_mapfile.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/validate_node.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/gcb_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/glexec_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/java_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/glidein_memory_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/collector_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="main/glidein_cpus_setup.sh" version="4.3.1">
      </OSGTestResult>
      <OSGTestResult id="condor_startup.sh" version="4.3.1">
      </OSGTestResult>
    </OSGTestResults>
</OSGTestResult>
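
The effect of running the selection step twice, as described in comment #3, can be illustrated with a small sketch. This is not the actual collector_setup.sh code. It assumes only what the comment states: groups are separated by semicolons, the members of a group are separated by commas, and one member is picked per group. The function name pick_collectors and the input string are hypothetical, and port-range handling is left out.

```python
import random


def pick_collectors(value, rng=random):
    """Pick one collector from each semicolon-separated group.

    Members inside a group are comma-separated. The picks are joined
    with commas, mirroring the behavior described in comment #3.
    """
    picked = []
    for group in value.split(";"):
        members = [m for m in group.split(",") if m]
        if members:
            picked.append(rng.choice(members))
    return ",".join(picked)


# First invocation: two groups in, one collector per group out.
first = pick_collectors("vocms97.cern.ch:9828;cmssrv119.fnal.gov:9855")

# Second invocation on the output of the first: the comma-separated list
# now looks like a single group, so only one collector survives.
second = pick_collectors(first)

print("after first run: ", first)
print("after second run:", second)
```

The step is not idempotent: its output format (commas between groups) is the same as its input format for members within a single group, so a second pass collapses the list to one collector.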

#4 Updated by Parag Mhashilkar over 7 years ago

  • Occurs In v3_1 added

So the cause was neither the error classad reporting nor the OSG test results reporting, but a bad merge introduced in v3_1_alpha3.

#5 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from Assigned to Closed

Fixed, tested, code reviewed, merged the changes to master.
