Feature #15892

SUBSYS.LOCALNAME.* warning triggered by GWMS htcondor configuration

Added by Marco Mambelli over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: High
Category: Integration with Condor
Target version:
Start date: 03/16/2017
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

With htcondor 8.5 there is a warning when using the subsystem together with the local name.
This is triggered by files like /etc/condor/config.d/02_gwms_factory_schedds.config (see below)

The solution is to use only the local name.

[root@cmssrv280 spool]# condor_config_val | more
WARNING: the following appear to be obsolete SUBSYS.LOCALNAME.* overrides
 SCHEDD.SCHEDDGLIDEINS2.EXECUTE at /etc/condor/config.d/02_gwms_factory_schedds.config, line 91
 SCHEDD.SCHEDDGLIDEINS2.JOB_QUEUE_LOG at /etc/condor/config.d/02_gwms_factory_schedds.config, line 95
 SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR at /etc/condor/config.d/02_gwms_factory_schedds.config, line 90
 SCHEDD.SCHEDDGLIDEINS2.LOCK at /etc/condor/config.d/02_gwms_factory_schedds.config, line 92
 SCHEDD.SCHEDDGLIDEINS2.PROCD_ADDRESS at /etc/condor/config.d/02_gwms_factory_schedds.config, line 93
 SCHEDD.SCHEDDGLIDEINS2.SCHEDD_ADDRESS_FILE at /etc/condor/config.d/02_gwms_factory_schedds.config, line 96
…...
   Use of both SUBSYS. and LOCALNAME. prefixes at the same time is not needed and not supported.
   To override config for a class of daemons, or for a standard daemon use a SUBSYS. prefix.
   To override config for a specific member of a class of daemons, just use a LOCALNAME. prefix like this:
 SCHEDDGLIDEINS2.EXECUTE
 SCHEDDGLIDEINS2.JOB_QUEUE_LOG
 SCHEDDGLIDEINS2.LOCAL_DIR
 SCHEDDGLIDEINS2.LOCK
 SCHEDDGLIDEINS2.PROCD_ADDRESS
 SCHEDDGLIDEINS2.SCHEDD_ADDRESS_FILE
 SCHEDDGLIDEINS2.SCHEDD_DAEMON_AD_FILE
 SCHEDDGLIDEINS2.SCHEDD_EXPRS
 SCHEDDGLIDEINS2.SCHEDD_LOG
 SCHEDDGLIDEINS2.SCHEDD_NAME
 SCHEDDGLIDEINS2.SPOOL
….
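
The rename suggested by the warning can be sketched mechanically. The following is a minimal Python sketch, not part of the attached patches: the function name `drop_subsys_prefix` and the default subsystem list are assumptions made for illustration. It removes the `SUBSYS.` prefix from three-part names such as `SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR`, both on the left-hand side and inside `$(...)` references, and leaves two-part `SUBSYS.KNOB` names untouched. It does not address the change in `$(LOG)` expansion discussed later in this ticket.

```python
import re


def drop_subsys_prefix(text, subsystems=("SCHEDD",)):
    """Rewrite SUBSYS.LOCALNAME.KNOB to LOCALNAME.KNOB (illustrative helper).

    `subsystems` is an assumed list of subsystem prefixes to strip.
    Two-part names such as SCHEDD.KNOB are left unchanged.
    """
    alternatives = "|".join(re.escape(s) for s in subsystems)
    # SUBSYS. followed by exactly LOCALNAME.KNOB (two dot-separated words)
    pattern = re.compile(
        rf"\b(?:{alternatives})\.([A-Za-z0-9_]+\.[A-Za-z0-9_]+)\b"
    )
    return pattern.sub(r"\1", text)


old_line = (
    "SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR = "
    "$(LOCAL_SCHEDD_DIR)/$(SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME)"
)
print(drop_subsys_prefix(old_line))
```

Review the result by hand before installing it: the pattern is purely textual and would also rewrite a value that merely looks like a three-part knob name.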
frontend_schedd_patch.tar (9.5 KB) Marco Mambelli, 05/31/2017 09:38 AM
factory_patch.tar (12 KB) Marco Mambelli, 05/31/2017 09:38 AM

History

#1 Updated by Marco Mambelli over 3 years ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

I looked at all the condor config files (templates) and the utility that adds schedds, and bumped the HTCondor requirement to 8.4.0, per Zach's comment that this syntax is guaranteed in 8.4 and later; he's not sure about earlier versions.
Changes are in v3/15892

#2 Updated by Marco Mambelli over 3 years ago

  • Assignee changed from Parag Mhashilkar to Dennis Box

This will also be in v3.3.2

#3 Updated by Dennis Box over 3 years ago

Question: does the code in install/services/Condor.py ever get called to configure schedds, or is it obsolete?

The method condor_config_secondary_schedd_data(self), at about line 1076, contains:

self.condor_config_data[type] +=  """
%(…)s = $(SCHEDD)
%(upper_name)s_ARGS = -local-name %(…)s
SCHEDD.%(upper_name)s.SCHEDD_NAME = %(…)s
SCHEDD.%(upper_name)s.SCHEDD_LOG = $(LOG)/SchedLog.$(SCHEDD.%(upper_name)s.SCHEDD_NAME)
SCHEDD.%(upper_name)s.LOCAL_DIR = %(…)s/$(SCHEDD.%(upper_name)s.SCHEDD_NAME)
SCHEDD.%(upper_name)s.EXECUTE = $(SCHEDD.%(upper_name)s.LOCAL_DIR)/execute
SCHEDD.%(upper_name)s.LOCK = $(SCHEDD.%(upper_name)s.LOCAL_DIR)/lock
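
For comparison, here is a hedged sketch of what a template like the one quoted above could look like with only the local name as prefix. It is not the code from install/services/Condor.py: the function `render_secondary_schedd`, the keys `schedd_name` and `local_dir`, and the rule that derives the prefix from the schedd name are assumptions for illustration. Only `upper_name`, the knob names, and `$(LOCAL_SCHEDD_DIR)` come from this ticket.

```python
# Template using LOCALNAME.KNOB only (no SCHEDD. prefix).
# Keys other than upper_name are hypothetical.
TEMPLATE = """\
%(upper_name)s.SCHEDD_NAME = %(schedd_name)s
%(upper_name)s.SCHEDD_LOG = $(LOG)/SchedLog.$(%(upper_name)s.SCHEDD_NAME)
%(upper_name)s.LOCAL_DIR = %(local_dir)s/$(%(upper_name)s.SCHEDD_NAME)
%(upper_name)s.EXECUTE = $(%(upper_name)s.LOCAL_DIR)/execute
%(upper_name)s.LOCK = $(%(upper_name)s.LOCAL_DIR)/lock
"""


def render_secondary_schedd(schedd_name, local_dir="$(LOCAL_SCHEDD_DIR)"):
    """Render the config fragment for one secondary schedd.

    Assumption: the prefix is the schedd name upper-cased with
    underscores removed (schedd_glideins2 -> SCHEDDGLIDEINS2).
    """
    upper_name = schedd_name.upper().replace("_", "")
    return TEMPLATE % {
        "upper_name": upper_name,
        "schedd_name": schedd_name,
        "local_dir": local_dir,
    }


print(render_secondary_schedd("schedd_glideins2"))
```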

#4 Updated by Dennis Box over 3 years ago

The web documentation also needs to be updated in doc/components/condor.html

SCHEDD.SCHEDDGLIDEINS<b><font color="red">2</font></b>.SCHEDD_NAME = schedd_glideins<b><font color="red">2</font></b><br/>
SCHEDD.SCHEDDGLIDEINS<b><font color="red">2</font></b>.SCHEDD_LOG = $(LOG)/SchedLog.$(SCHEDD.SCHEDDGLIDEINS<b><font color="red">2</font></b>.SCHEDD_NAME)<br/>
SCHEDD.SCHEDDGLIDEINS<b><font color="red">2</font></b>.LOCAL_DIR = $(LOCAL_DIR)/$(SCHEDD.SCHEDDGLIDEINS<b><font color="red">2</font></b>.SCHEDD_NAME)<br/>

#5 Updated by Dennis Box over 3 years ago

Looks good to me after additional file changes.

#6 Updated by Marco Mambelli over 3 years ago

  • Status changed from Feedback to Assigned
  • Assignee changed from Dennis Box to Marco Mambelli

The LOCALNAME.* configuration is causing errors and the daemon (startd) is not starting.
I reverted the merge of this branch to undo the changes and am following up with the condor team for advice.

#7 Updated by Marco Mambelli over 3 years ago

To test use:

 condor_config_val   -raw   -host $(hostname -s) SCHEDD.SCHEDDGLIDEINS2.SCHEDD_LOG

But if you DON'T use -raw, you will get an answer that is not correct for SCHEDDGLIDEINS2 because it will be expanded with
SUBSYSTEM=TOOL
LOCALNAME=

To find out where SCHEDDGLIDEINS2 will put the log file, you need to look up and expand the value in the same way, using
SUBSYSTEM=SCHEDD
LOCALNAME=SCHEDDGLIDEINS2

There are two ways to get the substitutions. Either use
condor_config_val -schedd -name
to ask SCHEDDGLIDEINS2 for the value it will use.

Or use:

condor_config_val -subsystem SCHEDD -local-name SCHEDDGLIDEINS2 -host $(hostname -s) SCHEDDGLIDEINS2.SCHEDD_LOG

-subsystem and -local-name are like -host: they force condor_config_val to use the given values for SUBSYSTEM and LOCALNAME while parsing the config file, rather than the ones it would normally use.
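
To avoid retyping the options while testing, the command line can be assembled by a small helper. This is a sketch only: the function `config_val_command` is hypothetical, and it uses just the options that appear in this comment (-raw, -subsystem, -local-name, -host). It builds the argument list without running anything.

```python
import shlex


def config_val_command(knob, subsystem=None, local_name=None,
                       host=None, raw=False):
    """Build a condor_config_val argument list (illustrative helper)."""
    argv = ["condor_config_val"]
    if raw:
        argv.append("-raw")
    if subsystem:
        argv += ["-subsystem", subsystem]
    if local_name:
        argv += ["-local-name", local_name]
    if host:
        argv += ["-host", host]
    argv.append(knob)
    return argv


# Look up the knob the way the SCHEDDGLIDEINS2 daemon would see it.
# "myhost" stands in for the output of `hostname -s`.
cmd = config_val_command(
    "SCHEDDGLIDEINS2.SCHEDD_LOG",
    subsystem="SCHEDD",
    local_name="SCHEDDGLIDEINS2",
    host="myhost",
)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run on a machine where HTCondor is installed.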

#8 Updated by Marco Mambelli over 3 years ago

From TJ: there seems to be a (new) bug in condor_config_val.

Taking a closer look, there is a bug here that I'm sure is part of the confusion. 

LOG = $(LOCAL_DIR)/log/condor
# at: /etc/condor/condor_config, line 60
# expanded: /var/log/condor
# default: $(LOCAL_DIR)/log

In the dump from gli2host, condor_config_val is reporting the wrong expanded value for LOG, because LOCAL_DIR in this file is defined as

LOCAL_DIR = $(LOCAL_SCHEDD_DIR)/$(SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME)
# at: /etc/condor/config.d/02_gwms_factory_schedds.config, line 90
# expanded: /var/lib/condor/schedd_glideins2

This is a bug in condor_config_val.

And about changes in a different version that fixed a previous bug:

So the crux of the issue is this:

From the primary config we have LOCAL_DIR defined as:

 LOCAL_DIR = /var
  # at: /etc/condor/condor_config, line 26
  # expanded: /var
  # default: $(TILDE)

For the gli2 config we have LOCAL_DIR defined as

 LOCAL_DIR = $(LOCAL_SCHEDD_DIR)/$(SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME)
  # at: /etc/condor/config.d/02_gwms_factory_schedds.config, line 90
  # expanded: /var/lib/condor/schedd_glideins2

In all of the configs LOG is defined as

 LOG = $(LOCAL_DIR)/log/condor
  # at: /etc/condor/condor_config, line 60
  # expanded: /var/log/condor
  # default: $(LOCAL_DIR)/log

So when LOG is expanded by gli2  it gets expanded as 
/var/lib/condor/schedd_glideins2/log

But when it is expanded by other daemons, (or condor_config_val without the -local-name argument) it expands as
/var/log/condor

It was a long-standing bug prior to 8.5 that when a $() expansion was used in another knob definition, the SUBSYS or LOCALNAME overrides were ignored. In other words, prior to 8.5 the expansion of $(LOG) would ignore a SUBSYS or LOCALNAME override of LOCAL_DIR.

That bug was fixed in 8.5, which is why this config used to do something different.
(I hesitate to say it "worked" because the config was actually depending on a bug.)

If you want to override LOCAL_DIR for gli2, but want the log files to go into the same log directory as regular daemons, then you have to override LOG as well, or you have to change the configuration of SCHEDDGLIDEINS2.SCHEDD_LOG so that it doesn't refer to $(LOG).

-tj
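
The difference TJ describes can be reproduced with a toy model. The sketch below is not HTCondor's expansion algorithm: the function `expand`, its `nested_overrides` switch, and the simplified three-entry config are assumptions for illustration. It only shows how honoring or ignoring a LOCALNAME override inside a nested `$(...)` reference changes the value of LOG.

```python
import re

MACRO = re.compile(r"\$\(([^)]+)\)")


def expand(config, name, local_name="", nested_overrides=True,
           _nested=False):
    """Expand knob `name` as seen by the daemon `local_name` (toy model).

    nested_overrides=True  models the behavior described for 8.5:
        LOCALNAME overrides apply inside $(...) references too.
    nested_overrides=False models the older behavior: overrides apply
        only to the knob asked for, not to the knobs it references.
    """
    honor = nested_overrides or not _nested
    raw = None
    if honor and local_name:
        raw = config.get(f"{local_name}.{name}")
    if raw is None:
        raw = config[name]
    return MACRO.sub(
        lambda m: expand(config, m.group(1), local_name,
                         nested_overrides, True),
        raw,
    )


config = {
    "LOCAL_DIR": "/var",
    "LOG": "$(LOCAL_DIR)/log/condor",
    "SCHEDDGLIDEINS2.LOCAL_DIR": "/var/lib/condor/schedd_glideins2",
}

# Override honored inside $(LOCAL_DIR):
print(expand(config, "LOG", "SCHEDDGLIDEINS2", nested_overrides=True))
# -> /var/lib/condor/schedd_glideins2/log/condor

# Override ignored inside $(LOCAL_DIR):
print(expand(config, "LOG", "SCHEDDGLIDEINS2", nested_overrides=False))
# -> /var/log/condor
```

In this model, adding a "SCHEDDGLIDEINS2.LOG" entry to the config pins the log directory for that daemon regardless of its LOCAL_DIR override, which corresponds to the first remedy TJ suggests.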

#9 Updated by Marco Mambelli over 3 years ago

  • Status changed from Assigned to Work in progress

Waiting for feedback from the condor team.

#10 Updated by Marco Mambelli over 3 years ago

RPM spec files, install/services/init_schedd.sh, HTCondor config files and documentation are consistent.
[#16702] has been created because scripts install/glidecondor_createSecCol and install/glidecondor_createSecSched are inconsistent with the rest. These 2 were already inconsistent and can be fixed in a different ticket.

Changes are in v3/15892_2 and also resolve [#16435]

#11 Updated by Marco Mambelli over 3 years ago

  • Status changed from Work in progress to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

#12 Updated by Parag Mhashilkar over 3 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli
  • In the glideinwms.spec file, is there any reason to keep the other components at >=8.2.3? Or can we change all HTCondor dependencies in glideinwms.spec to >= v8.4? This may require changes to other subsystem configs too. Also, I am assuming you also changed the spec file in the OSG repo.
  • In 02_gwms_factory_schedds.config you say "# SPOOLSCHEDDGLIDEINS2_... is something not picked up, no need to keep it". I am assuming you mean SCHEDDGLIDEINS2_; in that case, why do we need both lines below?

SCHEDDGLIDEINS2_ARGS = -local-name scheddglideins2
SCHEDDGLIDEINS2.SCHEDD_NAME = schedd_glideins2

Also, I remember that SCHEDDJOBSX_SPOOL_DIR_STRING was used for some efficiency reasons in the past, so we just need to make sure that it's accounted for.

#13 Updated by Marco Mambelli over 3 years ago

There is no special reason not to change the other condor dependencies; I set all of them to 8.4.0.

There was a typo in the comment:
SCHEDDGLIDEINS2_SPOOL_DIR_STRING is not picked up (I tried changing it and the changes are ignored). The value of SPOOL_DIR_STRING comes from SPOOL_DIR_STRING="$(SPOOL)"
above in the file and is the desired one. I checked: it is used in lib/condorMonitor.py, where the current SPOOL_DIR_STRING works fine, and I fixed a bug that could happen in case LOCAL_DIR_STRING is defined (commit 8d93528e06e9ec0c3c9a243662e7778851e204ad)

I think that SCHEDDGLIDEINS2_ARGS is also ignored and can be removed (also because the name is schedd_glideins2, with "_") but I'd like to consult the condor team first.
I'm confident in the changes already made: they fixed some bugs and made the code work with 8.5 and 8.6.
There are some redundancies that could be cleaned up (removing some unused lines and simplifying others), but I prefer to postpone these changes to the next release, so that I make one change at a time and can double-check that I'm not missing something.
I wrote some TODO lines in the code and will open a ticket for 3.2.20.

#14 Updated by Marco Mambelli over 3 years ago

  • Status changed from Feedback to Resolved

#15 Updated by Marco Mambelli over 3 years ago

I'm attaching patches to the ticket with drop-in replacements for some condor config files. If you did not customize yours, you can simply replace them.
Factory
Frontend_schedd, only if you are using the secondary schedd (normally commented out)

If you use these patches, you will have to create the necessary directories by hand if you add schedds. The rest will be OK.

#16 Updated by Parag Mhashilkar over 3 years ago

  • Status changed from Resolved to Closed
