Bug #23768

Bug in multiple credential distribution

Added by Steven Timm 12 months ago. Updated 20 days ago.

Status: New
Priority: High
Category: -
Target version:
Start date: 12/15/2019
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders: HEPCloud
Duration:

Description

On system gpfrontend01 (currently running gwms 3.4.5) and the OSG production factory gfactory-2.opensciencegrid.org (also running 3.4.5), we see the following problem:

We have defined on the frontend a group, group_nova, with two different credential definitions (the whole definition of the group is attached below). One is of type grid_proxy only and the other is of type grid_proxy+project_id.

The factory query_expr of this group matches six entries in the factory, five of which require credential type grid_proxy and one (Nova_US_OSC_osg_condor) which requires grid_proxy+project_id.

What we observe is that when both credentials are enabled, we get this error:

2019-12-04 15:41:25,966] WARNING: glideinFrontendInterface:1002: No security credentials match for factory pool gfactory-2.opensciencegrid.org, not advertising request; if this is not intentional, check for typos frontend's credential trust_domain and type, vs factory's pool trust_domain and auth_method

This error occurs only when trying to advertise credentials to the Nova_US_OSC_osg_condor entry.
The other entries have been working well for months.
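
For reference, the matching we expected can be sketched in a few lines of Python. This is not the GlideinWMS code. The rule in `supports` is an assumption inferred from the behavior in this report (a grid_proxy+project_id credential is accepted by entries that only ask for grid_proxy), and the second entry name is hypothetical, standing in for the five grid_proxy entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    trust_domain: str
    type: str  # e.g. "grid_proxy" or "grid_proxy+project_id"


@dataclass(frozen=True)
class Entry:
    name: str
    trust_domain: str
    auth_method: str


def supports(cred, entry):
    """Assumed rule: same trust_domain, and every '+'-separated part of
    the entry's auth_method is present in the credential's type."""
    if cred.trust_domain != entry.trust_domain:
        return False
    return set(entry.auth_method.split("+")) <= set(cred.type.split("+"))


def matching_credentials(creds, entry):
    """Return the credentials that the assumed rule accepts for this entry."""
    return [c for c in creds if supports(c, entry)]


# Both credentials from the group definition, enabled together.
creds = [
    Credential("grid", "grid_proxy+project_id"),
    Credential("grid", "grid_proxy"),
]

osc = Entry("Nova_US_OSC_osg_condor", "grid", "grid_proxy+project_id")
# Hypothetical name for one of the five entries that require grid_proxy.
other = Entry("hypothetical_grid_proxy_entry", "grid", "grid_proxy")

osc_matches = matching_credentials(creds, osc)
other_matches = matching_credentials(creds, other)
```

Under this assumed rule the OSC entry still has one matching credential when both are enabled, which is why the "No security credentials match" warning is unexpected.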

If we comment out the grid_proxy-only credential, effectively sending a grid_proxy+project_id
credential to all six entries, then everything works, but at the cost of sending the extra credential to five
entries where it is not needed.


</group>
<group name="nova" enabled="True">
<config ignore_down_entries="">
<glideins_removal margin="0" requests_tracking="False" type="NO" wait="0"/>
<idle_glideins_lifetime max="0"/>
<idle_glideins_per_entry max="100" reserve="5"/>
<idle_vms_per_entry curb="100" max="110"/>
<idle_vms_total curb="200" max="1000"/>
<processing_workers matchmakers="3"/>
<running_glideins_per_entry max="10000" min="0" relative_to_queue="1.0"/>
<running_glideins_total curb="90000" max="100000"/>
</config>
<match match_expr='not job.has_key("RequestGPUs")' start_expr='stringListIMember("group_nova",TARGET.AccountingGroup,".") && isUndefined(TARGET.RequestGPUs) && isUndefined(TARGET.JobFactoryType)'>
<factory query_expr='stringListIMember("Nova", GLIDEIN_Supported_VOs) && (GLIDEIN_Site =!= "FNAL") && (FactoryType =?= "production")'>
<match_attrs>
</match_attrs>
<collectors>
</collectors>
</factory>
<job query_expr='stringListIMember("group_nova", AccountingGroup,".") && isUndefined(RequestGPUs) && isUndefined(JobFactoryType)'>
<match_attrs>
</match_attrs>
<schedds>
</schedds>
</job>
</match>
<security>
<credentials>
<credential absfname="/etc/gwms-frontend/proxies/nova_proxy" project_id="PES0665" security_class="frontend" trust_domain="grid" type="grid_proxy+project_id"/>
<!-- It should have worked to have both types of credential in this group but we are getting an error at the moment. Sending the grid_proxy+project_id to all nova resources until we figure it out
<credential absfname="/etc/gwms-frontend/proxies/nova_proxy" security_class="frontend" trust_domain="grid" type="grid_proxy"/>
-->
</credentials>
</security>
<attrs>
</attrs>
<files>
</files>
</group>
-------------------------------
<entry name="Nova_US_OSC_osg_condor" auth_method="grid_proxy+project_id" comment="Added for Nova 2017 ggus 33565--VG Note this site requires TMPDIR" enabled="True" gatekeeper="osg-ce.hpc.osc.edu osg-ce.hpc.osc.edu:9619" gridtype="condor" trust_domain="grid" verbosity="std" work_dir="TMPDIR">
<config>
<max_jobs>
<default_per_frontend glideins="5000" held="50" idle="100"/>
<!--per_entry comment="admins want 1k max at site 2013-11-06 Jeff" glideins="1000" held="100" idle="200"/-->
<per_entry comment="cap until Ken can check on unmatching" glideins="2" held="1" idle="1"/>
<per_frontends>
</per_frontends>
</max_jobs>
<release max_per_cycle="20" sleep="0.2"/>
<remove max_per_cycle="5" sleep="0.2"/>
<restrictions require_glidein_glexec_use="False" require_voms_proxy="False"/>
<submit cluster_size="10" max_per_cycle="10" sleep="2" slots_layout="fixed">
<submit_attrs>
<submit_attr name="+maxMemory" value="4096"/>
<submit_attr name="+maxWallTime" value="1440"/>
<submit_attr name="+xcount" value="1"/>
<submit_attr name="+queue" value="&quot;batch&quot;"/>
</submit_attrs>
</submit>
</config>
<allow_frontends>
</allow_frontends>
<attrs>
<attr name="GLEXEC_BIN" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="NONE"/>
<attr name="GLIDEIN_CPUS" const="True" glidein_publish="False" job_publish="True" parameter="True" publish="True" type="string" value="1"/>
<attr name="GLIDEIN_MaxMemMBs" const="True" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="int" value="4096"/>
<attr name="GLIDEIN_Country" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="US"/>
<attr name="GLIDEIN_Max_Walltime" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="int" value="72000"/>
<attr name="GLIDEIN_REQUIRED_OS" const="True" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="string" value="any"/>
<attr name="GLIDEIN_ResourceName" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="OSC_OSG"/>
<attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="OSC"/>
<attr name="GLIDEIN_Supported_VOs" comment="RESTRICTED SITE: All VOs must have an allocation provided by OSC to run here. Please contact either Doug Johnson () or Troy Baer () before adding VO to the list." const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="Nova"/>
<attr name="UPDATE_COLLECTOR_WITH_TCP" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="False" type="string" value="True"/>
</attrs>
<files>
</files>
<infosys_refs>
<infosys_ref ref="GlueCEUniqueID=osg.osc.edu:2119/jobmanager-pbs-batch,Mds-Vo-name=OSC_OSG,Mds-Vo-name=local,o=grid" server="is.grid.iu.edu" type="BDII"/>
</infosys_refs>
<monitorgroups>
</monitorgroups>
</entry>
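
Since the warning suggests checking for typos between the frontend credential's trust_domain and type and the factory entry's trust_domain and auth_method, a short script can pull those fields out for a side-by-side comparison. The sketch below is only an illustration: it uses trimmed copies of the fragments above (with both credentials enabled) embedded as strings, and the helper names are hypothetical, not part of GlideinWMS.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the frontend group above, with both credentials enabled.
GROUP_XML = """
<group name="nova" enabled="True">
  <security>
    <credentials>
      <credential absfname="/etc/gwms-frontend/proxies/nova_proxy" project_id="PES0665" security_class="frontend" trust_domain="grid" type="grid_proxy+project_id"/>
      <credential absfname="/etc/gwms-frontend/proxies/nova_proxy" security_class="frontend" trust_domain="grid" type="grid_proxy"/>
    </credentials>
  </security>
</group>
"""

# Trimmed copy of the factory entry above: only the attributes being compared.
ENTRY_XML = (
    '<entry name="Nova_US_OSC_osg_condor" '
    'auth_method="grid_proxy+project_id" trust_domain="grid"/>'
)


def credential_keys(group_xml):
    """Return (trust_domain, type) for each credential in the group."""
    root = ET.fromstring(group_xml)
    return [(c.get("trust_domain"), c.get("type")) for c in root.iter("credential")]


def entry_key(entry_xml):
    """Return (trust_domain, auth_method) for the factory entry."""
    entry = ET.fromstring(entry_xml)
    return (entry.get("trust_domain"), entry.get("auth_method"))


cred_keys = credential_keys(GROUP_XML)
ent_key = entry_key(ENTRY_XML)

# Credentials whose trust_domain and type equal the entry's requirement exactly.
exact = [key for key in cred_keys if key == ent_key]

print("frontend credentials:", cred_keys)
print("factory entry wants: ", ent_key)
print("exact matches:       ", exact)
```

For the values in this report, the first credential agrees with the entry on both fields, so a simple typo in trust_domain or type does not appear to explain the warning.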


Related issues

Related to GlideinWMS - Feature #24165: Refactor credential handling (New, 03/10/2020)

History

#1 Updated by Marco Mambelli 12 months ago

  • Target version set to v3_6_3
  • Assignee set to Marco Mambelli

#2 Updated by Marco Mambelli 7 months ago

  • Target version changed from v3_6_3 to v3_6_4

#3 Updated by Marco Mambelli 2 months ago

  • Target version changed from v3_6_4 to v3_6_5

#4 Updated by Marco Mambelli about 2 months ago

  • Target version changed from v3_6_5 to v3_6_6

#5 Updated by Marco Mambelli 20 days ago

  • Stakeholders updated (diff)

#6 Updated by Marco Mambelli 20 days ago

  • Priority changed from Normal to High

#7 Updated by Marco Mambelli 18 days ago
