Feature #17824

condor_switchboard is being discontinued, we need a replacement

Added by Marco Mambelli almost 2 years ago. Updated about 1 year ago.

Status:
Closed
Priority:
High
Category:
-
Target version:
Start date:
10/04/2017
Due date:
% Done:

0%

Estimated time:
Stakeholders:
Duration:

Description

The Factory runs as the factory user. To improve security, logs and credentials are stored in directories owned by the different users that the condor users are mapped to.

When privilege separation is in effect, the condor_root_switchboard is used in all Factories to write log files (stdout/err and condor log for glideins) and credentials, because the owners are different and permission problems would otherwise occur. The cleanup of client log and proxy files uses it as well.
http://glideinwms.fnal.gov/doc.prd/factory/monitoring.html#monitoring_logs

- all Factories use privilege separation (all RPM installations, for production and testing). There are probably fewer than 10 production ones
- all supported credentials are saved using the switchboard: GSI, SSH keys, files with username/password. Log files are too
- jobs are submitted/owned as the actual UID the user (Frontend) is mapped to, so condor operations need the switchboard too

HTCondor plans not to support it anymore and removed the switchboard in 8.7.2.

We need something to replace this functionality while avoiding running the Factory as root.

Attachment: condor_root_switchboard.tar.gz (37.5 KB), Marco Mambelli, 04/25/2018 08:00 AM

History

#1 Updated by Marco Mambelli over 1 year ago

  • Target version changed from v3_2_x to v3_2_22

#2 Updated by Marco Mambelli over 1 year ago

  • Assignee set to Marco Mambelli

#3 Updated by Marco Mambelli over 1 year ago

  • Target version changed from v3_2_22 to v3_2_23

#4 Updated by Marco Mambelli over 1 year ago

HTCondor is working on a solution (an option to isolate schedds so they don't see each other's files). In the meantime we need to package and ship the switchboard ourselves. Find the tarball attached.

#5 Updated by Marco Mambelli over 1 year ago

  • Priority changed from Normal to High

#6 Updated by Marco Mambelli over 1 year ago

  • Target version changed from v3_2_23 to v3_4_0

#7 Updated by Marco Mambelli over 1 year ago

  • Assignee changed from Marco Mascheroni to Lorena Lobato Pardavila

#8 Updated by Lorena Lobato Pardavila over 1 year ago

  • Status changed from New to Work in progress

#9 Updated by Lorena Lobato Pardavila over 1 year ago

Documentation

Things to keep in mind

  • Need access to the library, to Koji, and to the SVN (located at https://vdt.cs.wisc.edu/svn)
  • After discussing the package name, we picked glideinwms-switchboard. The version and name of the package have to match at every step
  • BuildArch: x86_64

Workflow:

1. First had to make sure that the proxy was set for the user to connect
2. Checked that everything was correct with the osg-koji setup
osg-koji list-permissions --mine
3. Needed to create an osg directory (where the .spec file goes) and an upstream directory with the source.
4. Had to follow this part (and additional documentation): https://opensciencegrid.org/technology/software/koji-workflow/
5. Uploaded my package directory layout (the osg directory with the spec file and the upstream one with the upstream source) to the SVN (trunk directory)

llobato@fermicloud364:~/trunk/glideinwms-switchboard$ ll
total 128
drwxr-xr-x. 2 llobato cdadmin 2048 May 16 16:26 _build_results
drwxr-xr-x. 2 llobato cdadmin 2048 May 16 16:25 _final_srpm_contents
drwxr-xr-x. 2 llobato cdadmin 2048 May 16 19:37 osg
drwxr-xr-x. 2 llobato cdadmin 2048 May 16 16:10 upstream

6. Created the initial spec
7. Created the developer.tarball.source file with
glideinwms-switchboard/1.0.0/condor_root_switchboard.tar.gz

   As part of this step, had to untar the package and rename the directories (we had condor_root_switchboard instead of glideinwms-switchboard), then make the tarball again.
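
The untar, rename, and re-tar step can also be done in one pass. The following is only a sketch, assuming the tarball has a single top-level directory named condor_root_switchboard; the function name reroot_tarball and the file paths in the usage note are hypothetical, and hard-link targets are not rewritten.

```python
import tarfile


def reroot_tarball(src_path, dst_path, old_root, new_root):
    """Copy a .tar.gz, renaming the top-level directory old_root to new_root."""
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            if member.name == old_root or member.name.startswith(old_root + "/"):
                member.name = new_root + member.name[len(old_root):]
                # A stored pax "path" header would override the new name on write.
                member.pax_headers.pop("path", None)
            # Only regular files carry data; directories and links do not.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
```

For example, reroot_tarball("condor_root_switchboard.tar.gz", "new.tar.gz", "condor_root_switchboard", "glideinwms-switchboard") would write a copy whose entries all start with glideinwms-switchboard/.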
(Testing part)
8. Ran a mock build of the package
9. Built it in Koji with --scratch
10. Checked that the rpm was built on the Koji server, then downloaded it.
11. Installed the rpm on the test factory (fermicloud137) and checked that the binary was really installed in /opt/glideinwms-switchboard (as established in the spec). The original condor_root_switchboard is supposed to be in /usr/sbin/, so saved that file and moved the new one to /usr/sbin

[root@host]# yum localinstall <RPM>
root@fermicloud137:/cloud/login/llobato$ yum install glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm
Loaded plugins: langpacks, priorities
Examining glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm: glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64
Marking glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package glideinwms-switchboard.x86_64 0:1.0.0-1.osg34.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================================================
 Package                                         Arch                            Version                                    Repository                                                                 Size
============================================================================================================================================================================================================
Installing:
 glideinwms-switchboard                          x86_64                          1.0.0-1.osg34.el7                          /glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64                           50 k

Transaction Summary
============================================================================================================================================================================================================
Install  1 Package

Total size: 50 k
Installed size: 50 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64                                                                                                                                          1/1
  Verifying  : glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64                                                                                                                                          1/1

Installed:
  glideinwms-switchboard.x86_64 0:1.0.0-1.osg34.el7

Complete!

root@fermicloud137:/cloud/login/llobato$ ll /opt/
total 12
drwxr-xr-x. 2 root root 4096 May 16 23:06 glideinwms-switchboard
drwxr-xr-x. 7 root root 4096 Nov  1  2017 puppetlabs
drwxr-xr-x. 2 root root 4096 Sep  5  2017 rh
root@fermicloud137:/opt/glideinwms-switchboard$ ll
total 56
-rwxr-xr-x. 1 root root 51048 May 16 17:49 condor_root_switchboard

12. Submitted jobs, but glideins were not running because of:

Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard exec 0 9'. Details: Command '/usr/bin/../sbin/condor_root_switchboard exec 0 9' returned non-zero exit status 1: . Stdout:
[2018-05-17 10:33:20,299] WARNING: glideFactoryLib:639: condor_submit failed (user frontend): Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard exec 0 9'. Details: Command '/usr/bin/../sbin/condor_root_switchboard exec 0 9' returned non-zero exit status 1: . Stdout:
[2018-05-17 10:34:20,274] DEBUG: glideFactoryEntry:1023: Checking security credentials for client fermicloud364-fnal-gov_OSG_gWMSFrontend.main
[2018-05-17 10:34:20,277] DEBUG: glideFactoryLib:632: Submitting 33 glideins
[2018-05-17 10:34:20,311] ERROR: glideFactoryLib:1167: condor_submit failed (user frontend): Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard exec 0 9'. Details: Command '/usr/bin/../sbin/condor_root_switchboard exec 0 9' returned non-zero exit status 1: . Stdout:
[2018-05-17 10:34:20,312] WARNING: glideFactoryLib:639: condor_submit failed (user frontend): Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard exec 0 9'. Details: Command '/usr/bin/../sbin/condor_root_switchboard exec 0 9' returned non-zero exit status 1: . Stdout:

13. This is because condor_root_switchboard deals with file and directory permissions, and the new binary did not have the setuid and setgid bits enabled. Thus:

root@fermicloud137:/opt/glideinwms-switchboard$ chmod u+s /usr/sbin/condor_root_switchboard
root@fermicloud137:/opt/glideinwms-switchboard$ chmod g+s /usr/sbin/condor_root_switchboard
root@fermicloud137:/opt/glideinwms-switchboard$ ll /usr/sbin/condor_root_switchboard*
-rwsr-sr-x. 1 root root 51048 May 17 09:37 /usr/sbin/condor_root_switchboard
-rwsr-sr-x. 1 root root 54040 Mar 15 13:06 /usr/sbin/condor_root_switchboard_original
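
The permission problem from step 13 can also be detected from a script before submitting any glideins. This is a minimal sketch, assuming the binary lives in /usr/sbin as in the listing above; the function names missing_special_bits and check_switchboard are hypothetical and not part of GlideinWMS.

```python
import os
import stat


def missing_special_bits(mode):
    """Return the names of the setuid/setgid bits that are NOT set in mode."""
    missing = []
    if not mode & stat.S_ISUID:
        missing.append("setuid")
    if not mode & stat.S_ISGID:
        missing.append("setgid")
    return missing


def check_switchboard(path="/usr/sbin/condor_root_switchboard"):
    """Inspect the binary on disk; an empty list means both bits are set."""
    return missing_special_bits(os.stat(path).st_mode)
```

A freshly copied binary with mode 0755 would report both bits as missing, while the listing above (mode 6755) would report none.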

14. Immediately afterwards you could see in the logs that glideins were working

llobato@fermicloud364:/scratch/llobato$ condor_q

-- Schedd: fermicloud364.fnal.gov : <131.225.155.101:9615?... @ 05/17/18 10:46:00
OWNER   BATCH_NAME            SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
llobato CMD: jobpi_test.py   5/17 10:32      _      _    100    100 97.0-99

100 jobs; 0 completed, 0 removed, 100 idle, 0 running, 0 held, 0 suspended
llobato@fermicloud364:/scratch/llobato$ condor_q

-- Schedd: fermicloud364.fnal.gov : <131.225.155.101:9615?... @ 05/17/18 10:46:52
OWNER   BATCH_NAME            SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
llobato CMD: jobpi_test.py   5/17 10:32      1      1     98    100 97.1-99

99 jobs; 0 completed, 0 removed, 98 idle, 1 running, 0 held, 0 suspended

In the factory (llobato@fermicloud137:/var/log/gwms-factory/server/entry_ITB_FC_CE2$ tail -100f ITB_FC_CE2.err.log)

[2018-05-17 10:44:20,626] ERROR: glideFactoryLib:1167: condor_submit failed (user frontend): Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard exec 0 9'. Details: Command '/usr/bin/../sbin/condor_root_switchboard exec 0 9' returned non-zero exit status 1: . Stdout:
[2018-05-17 10:44:20,627] WARNING: glideFactoryLib:639: condor_submit failed (user frontend): Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard exec 0 9'. Details: Command '/usr/bin/../sbin/condor_root_switchboard exec 0 9' returned non-zero exit status 1: . Stdout:
[2018-05-17 10:45:20,583] DEBUG: glideFactoryEntry:1023: Checking security credentials for client fermicloud364-fnal-gov_OSG_gWMSFrontend.main
[2018-05-17 10:45:20,587] DEBUG: glideFactoryLib:632: Submitting 38 glideins
[2018-05-17 10:45:20,850] DEBUG: glideFactoryLib:1162: ['Submitting job(s)..........', '10 job(s) submitted to cluster 2404.']
[2018-05-17 10:45:21,189] DEBUG: glideFactoryLib:1162: ['Submitting job(s)..........', '10 job(s) submitted to cluster 2405.']
[2018-05-17 10:45:21,529] DEBUG: glideFactoryLib:1162: ['Submitting job(s)..........', '10 job(s) submitted to cluster 2406.']
[2018-05-17 10:45:21,861] DEBUG: glideFactoryLib:1162: ['Submitting job(s)........', '8 job(s) submitted to cluster 2407.']
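
To confirm the recovery without reading the log by eye, the entries can be tallied. The sketch below assumes the line layout seen in the excerpt above ([timestamp] LEVEL: module:line: message); the function name summarize is hypothetical. Only ERROR lines are counted as failures, because each failed submission is logged once as ERROR and once as WARNING.

```python
import re

# Layout taken from the log excerpt: [timestamp] LEVEL: module:line: message
LOG_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\] (?P<level>[A-Z]+): (?P<where>[^:]+:\d+): (?P<msg>.*)$"
)
SUBMITTED = re.compile(r"(\d+) job\(s\) submitted to cluster (\d+)\.")


def summarize(lines):
    """Return (failed condor_submit calls, glideins submitted) for the given log lines."""
    failures = 0
    submitted = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the expected layout
        msg = match.group("msg")
        if match.group("level") == "ERROR" and "condor_submit failed" in msg:
            failures += 1
        hit = SUBMITTED.search(msg)
        if hit:
            submitted += int(hit.group(1))
    return failures, submitted
```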

15. I have to take care of this part in the spec file, and also look into the replacement of the file name.

#10 Updated by Lorena Lobato Pardavila over 1 year ago

For the setgid and setuid bit, I have to use the %attr macro, which has the following format:

%attr(<mode>, <user>, <group>) <file>
  • The <mode> is represented in traditional numeric fashion.
  • The <user> is specified by the login name of the user. Numeric UIDs are not used, for reasons we'll explore in a moment.
  • The <group> is specified by the group's name, as entered in /etc/group. Numeric GIDs are not used, either.
  • <file> represents the file. Shell-style globbing is supported.

So I have specified in the %files section,

%attr(6755, root, root) /usr/sbin/condor_root_switchboard

or

chmod 6755 %{buildroot}/usr/sbin/%{BINARY_FILE}
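
As a sanity check that the numeric mode 6755 corresponds to the -rwsr-sr-x permissions set by hand on the test factory, the octal value can be decoded with Python's standard stat module. This is only an illustrative sketch; the function name attr_mode_to_ls is hypothetical.

```python
import stat


def attr_mode_to_ls(mode_text):
    """Convert an octal %attr mode such as "6755" to ls-style notation for a regular file."""
    return stat.filemode(stat.S_IFREG | int(mode_text, 8))


print(attr_mode_to_ls("6755"))  # -rwsr-sr-x
```

The leading 6 is the sum of setuid (4) and setgid (2); the remaining 755 is the usual rwxr-xr-x.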

Also, we have to keep in mind that:

8.7.1 includes the condor_root_switchboard binary.
8.7.2 does NOT include it.

Meaning that to be able to install the package, we have to add

Requires: condor >= 8.7.2
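
The version rule behind this Requires line can be sketched as a small check. It follows the assumption made in this ticket that every HTCondor release before 8.7.2 ships the binary; the function names are hypothetical, and the plain numeric comparison is a simplification of RPM's own version comparison (it ignores release tags such as -1.osg34.el7).

```python
FIRST_WITHOUT_SWITCHBOARD = (8, 7, 2)


def parse_version(text):
    """Turn a dotted version such as "8.6.10" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def condor_ships_switchboard(version):
    """True if this HTCondor version is assumed to still include condor_root_switchboard."""
    return parse_version(version) < FIRST_WITHOUT_SWITCHBOARD
```

Comparing integer tuples rather than strings matters here: as text, "8.6.10" would sort before "8.6.9".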

#11 Updated by Lorena Lobato Pardavila over 1 year ago

Tests were done.

1. 8.7.1 includes the condor_root_switchboard binary -> Thus we can assume that if the condor version is <= 8.7.1, the file is certainly already installed [1]

[1]
[root@fermicloud036 llobato]# yum localinstall glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm
Loaded plugins: langpacks, priorities
Examining glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm: glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64
Marking glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package glideinwms-switchboard.x86_64 0:1.0.0-1.osg34.el7 will be installed
--> Processing Dependency: condor >= 8.7.2 for package: glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64
1305 packages excluded due to repository priority protections
--> Finished Dependency Resolution
Error: Package: glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64 (/glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64)
Requires: condor >= 8.7.2
Installed: condor-8.6.10-1.osg34.el7.x86_64 (@osg-development)
condor = 8.6.10-1.osg34.el7
Available: condor-8.6.3-1.1.osg34.el7.x86_64 (osg)
condor = 8.6.3-1.1.osg34.el7
Available: condor-8.6.4-1.osg34.el7.x86_64 (osg)
condor = 8.6.4-1.osg34.el7
Available: condor-8.6.5-2.osg34.el7.x86_64 (osg)
condor = 8.6.5-2.osg34.el7
Available: condor-8.6.6-1.osg34.el7.x86_64 (osg)
condor = 8.6.6-1.osg34.el7
Available: condor-8.6.8-1.osg34.el7.x86_64 (osg)
condor = 8.6.8-1.osg34.el7
Available: condor-8.6.9-1.1.osg34.el7.x86_64 (osg)
condor = 8.6.9-1.1.osg34.el7
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest

2. 8.7.2 does NOT include it -> Thus, we have to install it. The package does that, but I double-checked: if the file doesn't exist, it will be installed. If it does, yum complains with the regular rpm message [2]. In any case, I added a rule [3] to help protect local modifications: with the %config(noreplace) directive, the package will not overwrite an existing file that has been modified.

[2]
root@fermicloud137:/cloud/login/llobato$ yum localinstall glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm
Loaded plugins: langpacks, priorities
Examining glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm: glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64
glideinwms-switchboard-1.0.0-1.osg34.el7.x86_64.rpm: does not update installed package.
Nothing to do

[3]
%config(noreplace) /usr/sbin/condor_root_switchboard

Spec file is updated in the SVN.

#12 Updated by Lorena Lobato Pardavila over 1 year ago

  • Status changed from Work in progress to Feedback
  • Assignee changed from Lorena Lobato Pardavila to Marco Mambelli

#13 Updated by Lorena Lobato Pardavila about 1 year ago

  • Status changed from Feedback to Resolved

The package was built: https://koji.chtc.wisc.edu/koji/buildinfo?buildID=11926

Waiting for Brian to promote it.

Some comments about the package have been added to the OSG documentation for the Factory rpm installation; waiting for them to be merged.

For the record:
For this version it's fine, but there is a conditional copy which should be removed in the future. The condition checks the build directory of the RPM, not the installation file system.

if [ ! -e %{buildroot}%{_sbindir}/%{BINARY_FILE} ]; then
   cp %{BINARY_FILE} "%{buildroot}%{_sbindir}/"
fi

For this version it is OK, no need to redo the package. But I will change it in the coming weeks, in case we need to change the content of the package.

#14 Updated by Marco Mambelli about 1 year ago

  • Assignee changed from Marco Mambelli to Lorena Lobato Pardavila

#15 Updated by Marco Mambelli about 1 year ago

  • Status changed from Resolved to Closed

