Feature #4586

Switch init script to use RHEL daemon function

Added by Brian Bockelman over 7 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: RPM - Frontend/Factory
Target version:
Start date: 10/20/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Stakeholders:
Duration:

Description

The init script currently invokes "su" manually. This is considered bad practice on RHEL; one should stick to the provided functions for launching daemons (which add functionality and allow one to use standard config variables in /etc/sysconfig). At a minimum, "su" should not be used; "runuser" is appropriate here.

Using the standard scripts also helps insulate the project from possible security concerns. For example, the lockfile is owned by the "frontend" user and "stopFrontend" is run as root; if someone compromises the frontend user account, then they could fiddle with the PID in the lockfile, causing root to send a SIGTERM to any system process (including init).
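
The lockfile risk described above can be illustrated with a minimal Python sketch. This is not code from the project: the names parse_pid, safe_to_signal and stop_from_lockfile are hypothetical, and looking up the process owner through /proc assumes a Linux host. The idea is that root validates a PID read from a file that an unprivileged user can write before sending any signal.

```python
import os
import signal


def parse_pid(text):
    """Return the PID stored in lockfile text, or None if it is not a positive integer."""
    text = text.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    pid = int(text)
    return pid if pid > 0 else None


def safe_to_signal(pid, process_uid, expected_uid):
    """Hypothetical policy: never signal init, and only signal the service user's processes."""
    if pid is None or pid <= 1:
        return False
    return process_uid == expected_uid


def stop_from_lockfile(lock_path, expected_uid):
    """Send SIGTERM only if the PID in the lockfile passes the checks above."""
    with open(lock_path) as handle:
        pid = parse_pid(handle.read())
    if pid is None:
        return False
    try:
        # Assumption: the owner of /proc/<pid> is the user running the process.
        process_uid = os.stat("/proc/%d" % pid).st_uid
    except OSError:
        return False  # no such process
    if not safe_to_signal(pid, process_uid, expected_uid):
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

There is still a window between the ownership check and the kill in which the PID could be reused, so this narrows the problem rather than removing it.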


Subtasks

Feature #14194: write frontend and factory init scripts for sl7 (Closed, HyunWoo Kim)

History

#1 Updated by Parag Mhashilkar over 7 years ago

  • Category set to RPM - Frontend/Factory
  • Status changed from New to Assigned
  • Assignee set to Parag Mhashilkar
  • Target version set to v3_2_x

#2 Updated by Parag Mhashilkar over 5 years ago

  • Assignee changed from Parag Mhashilkar to HyunWoo Kim

#3 Updated by HyunWoo Kim about 4 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from HyunWoo Kim to Marco Mambelli

On my frontend and factory instances, I modified /etc/init.d/gwms-frontend and /etc/init.d/gwms-factory
to use the daemon function instead of invoke_as_{frontend, factory}.

I tested this and it seems to work.

In the frontend, I changed only start() and stop(); I did not modify the others, such as reconfig() and upgrade(),
i.e. these functions still use invoke_as_frontend() because they do not launch daemons.
invoke_as_frontend already uses runuser if it exists.

Maybe I can ask the other developers for feedback.

I am assigning this to Marco Mambelli for feedback.

#4 Updated by Parag Mhashilkar about 4 years ago

  • Target version changed from v3_2_x to v3_2_16

#5 Updated by HyunWoo Kim about 4 years ago

  • Status changed from Feedback to Assigned
  • Assignee changed from Marco Mambelli to HyunWoo Kim

I talked with Parag during the gwms weekly meeting.
He suggests checking sl7 to see whether the daemon function is available there.
Another suggestion is that I take another look at the other functions, such as reconfig and upgrade, and see whether we want to use daemon there too.

I am assigning this back to me.

#6 Updated by Brian Bockelman about 4 years ago

For RHEL7 -

Init scripts should not be used at all. You'll need to write systemd unit files.

Systemd does not allow you to have custom verbs such as "reconfig"; you'll need to have a separate script for this.

#7 Updated by Parag Mhashilkar about 4 years ago

  • Target version changed from v3_2_16 to v3_2_17

#8 Updated by HyunWoo Kim about 4 years ago

I have a theory that I arrived at on Wednesday, October 5, 2016:

One obvious fact that I also learned is that the systemctl reload command is rejected
when the service (gwms-frontend) is not running.
I think we should NOT underestimate this fact, i.e. systemctl assumes that the underlying service
(gwms-frontend) is always running.
But as you all know, our upgrade or reconfig first kills the gwms-frontend service.
So, while the systemctl reload command is running, our upgrade runs and kills the gwms-frontend service,
and I believe this makes systemctl reload think that something has gone wrong.
systemctl then attempts to stop the service, which no longer exists in our case
because upgrade has already killed it at this point.

To test this theory, I ran an experiment:

Write the following file as /usr/sbin/hkbin

#!/usr/bin/env python

import sys
import subprocess
import time

if sys.argv[1] == 'start':
   print 'regular start is being called'
   sys.stdout.flush()
   subprocess.call( 'nohup /usr/sbin/hksleep < /dev/null > /tmp/tmp.out 2>/tmp/tmp.err & ', shell=True )

elif sys.argv[1] == 'stop':
   print 'regular stop is being called'
   sys.stdout.flush()

   subprocess.call( 'killall python /usr/sbin/hksleep', shell=True )

elif sys.argv[1] == 'reload':
   print 'reload stop is being called'
   sys.stdout.flush()
   subprocess.call( 'killall python /usr/sbin/hksleep', shell=True )

   print 'pretend that reload is doing something'
   sys.stdout.flush()

   print 'reload start is being called'
   sys.stdout.flush()
   subprocess.call( 'nohup /usr/sbin/hksleep < /dev/null > /tmp/tmp.out 2>/tmp/tmp.err & ', shell=True )

And also write /etc/systemd/system/hktest.service
and then do

systemctl start hktest.service
systemctl reload hktest.service

then you will see the following messages from /var/log/messages

Oct  5 16:35:23 fermicloud363 systemd: Starting HKTESTSL7...
Oct  5 16:35:23 fermicloud363 hkbin: regular start is being called
Oct  5 16:35:23 fermicloud363 systemd: Started HKTESTSL7.

Oct  5 16:35:30 fermicloud363 hkbin: reload stop is being called
Oct  5 16:35:30 fermicloud363 hkbin: /usr/sbin/hksleep: no process found
Oct  5 16:35:30 fermicloud363 hkbin: regular stop is being called   **
Oct  5 16:35:30 fermicloud363 hkbin: /usr/sbin/hksleep: no process found
Oct  5 16:35:30 fermicloud363 systemd: Reloaded HKTESTSL7.

The line marked ** is the evidence that supports my theory.

So, as long as our upgrade() kills the gwms service, systemctl will intervene.

So, if this argument is convincing enough for you two,
we should simply guide people to run /usr/sbin/gwms-frontend upgrade or reconfig directly
and tell them that systemctl reload gwms-frontend.service is not supported.

#9 Updated by HyunWoo Kim about 4 years ago

During today's gwms meeting, we decided to break this ticket into 2 separate tickets:
this one (4586), dedicated to SL7, and a new ticket for SL6, for which I have already found and tested a solution using the daemon function.

#10 Updated by Brian Bockelman about 4 years ago

The way you would do this in RHEL7 is to send a signal to the frontend, have it 'exec' to the appropriate reconfig command, then have the reconfig 'exec' the frontend again.

That said: there is no support for custom verbs (reconfig, upgrade, etc) in the systemd model. In general, you want to do it in a standalone command as you outline above.

#11 Updated by HyunWoo Kim about 4 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from HyunWoo Kim to Marco Mambelli

We are going to split this ticket into 2 separate tickets:
my work on the sl6 init scripts will remain in this ticket and my (incomplete) work on the sl7 init scripts will go to the new ticket.

I am summarizing my work on sl6 here and assigning it to Marco Mambelli for feedback.

These are the changes to the frontend and factory init scripts:

In summary, I have been using daemon successfully for both factory and frontend, and for both start and stop.

In Factory:

. /etc/rc.d/init.d/functions

Start:
#    invoke_as_factory "nice -2 \"${FACTORY_START}\" \"$factory_dir\" 2>/dev/null 1>&2 </dev/null &" 
    daemon --user $FACTORY_USER --pidfile="/tmp/dummy.pid" -2 "${FACTORY_START} $factory_dir 2>/dev/null 1>&2 </dev/null &" 

Stop:
#    "$FACTORY_STOP" -f "$factory_dir" 2>/dev/null 1>&2 </dev/null
    daemon --user $FACTORY_USER --force "${FACTORY_STOP} -f $factory_dir   2>/dev/null 1>&2 </dev/null" 

In Frontend:

. /etc/rc.d/init.d/functions

Start:
#    invoke_as_frontend    "nice -2   \"${FRONTEND_START}\" \"$frontend_dir\" 2>$LOG_FILE_STARTUP 1>&2 </dev/null &" 
daemon --user $FRONTEND_USER --pidfile="/tmp/dummy.pid" -2 "${FRONTEND_START}     $frontend_dir   2>$LOG_FILE_STARTUP 1>&2 </dev/null &" 

Stop:
#    invoke_as_frontend                     "\"${FRONTEND_STOP}\" -f  \"$frontend_dir\" 2>/dev/null 1>&2 </dev/null" 
      daemon --user $FRONTEND_USER --force    "${FRONTEND_STOP}   -f    $frontend_dir   2>/dev/null 1>&2 </dev/null" 

Now the question is: what about reconfig() and upgrade()?

I tested as follows:
instead of the original line in frontend reconfig()

    invoke_as_frontend "\"${CREATION_DIR}reconfig_frontend\" -force_name \"$frontend_name\" -writeback $writeback -xml \"$cfg_loc\" -update_def_cfg \"$update_def_cfg\" $fix_rrd" 

this new line
        daemon --user $FRONTEND_USER "${CREATION_DIR}reconfig_frontend -force_name $frontend_name -writeback $writeback -xml $cfg_loc -update_def_cfg $update_def_cfg $fix_rrd"     

seems to work as well.

But I am wondering: do we need to use daemon for reconfig and upgrade too?

Why don't we use daemon only for start and stop (and thus restart) for now?

#12 Updated by Marco Mambelli about 4 years ago

  • Assignee changed from Marco Mambelli to HyunWoo Kim

I would remove the --pidfile option. Pointing it at a bogus file is confusing.

Checking the code, I saw that if the pidfile is not specified, daemon looks for /var/run/$base.pid (/var/run/glideinwms-frontend[factory].pid).

If there is a valid PID in the file and force is not set, the script quits (because it thinks another copy is still running).
Otherwise it continues.
daemon does not write the PID file itself, so there should be no PID files.
If the script is complaining, there is probably a problem to fix.

For the shutdown (stop) we should not use daemon; it just shuts down and does not run continuously.
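
The pidfile check described in this comment can be modeled with a short Python sketch. It is a model of the behavior as described here, not the shell code from /etc/rc.d/init.d/functions; the names default_pidfile and should_launch are hypothetical, and is_running is a caller-supplied stand-in for however liveness is checked.

```python
def default_pidfile(base):
    """Default location described above when --pidfile is not given."""
    return "/var/run/%s.pid" % base


def should_launch(pidfile_text, force, is_running):
    """Return True if a launch should go ahead.

    pidfile_text: contents of the PID file, or None if the file is missing.
    force: True when --force was given.
    is_running: callable taking a PID and returning whether it is alive.
    """
    if pidfile_text is not None:
        for token in pidfile_text.split():
            valid = token.isascii() and token.isdigit()
            if valid and is_running(int(token)) and not force:
                # A live PID without --force: assume another copy is running.
                return False
    return True
```

Under this model, a pidfile that points at an unrelated live process (such as a bogus /tmp/dummy.pid holding a stale PID) would block a start unless --force is given, which is one reason to point --pidfile at the real lock file or omit it.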

#13 Updated by HyunWoo Kim about 4 years ago

Based on Marco's comments, I made the following changes:

1. In both Frontend and Factory, daemon is now given the following values for the --pidfile option:
- $frontend_dir/lock/frontend.lock
- $factory_dir/lock/glideinWMS.lock

2. I reverted the use of the daemon function for stop.
For the Factory stop, we now use invoke_as_factory, which ensures that runuser is used.

I tested these changes and, if there are no objections, I will merge to branch_v3_2.

#14 Updated by HyunWoo Kim about 4 years ago

  • Status changed from Feedback to Resolved

merged into branch_v3_2

#15 Updated by Parag Mhashilkar almost 4 years ago

  • Status changed from Resolved to Assigned

Using the daemon function directly to start and stop the frontend does not work for a tarball install. Typically in this case, services are not started with root privileges but directly as a local user. Please fix.

#16 Updated by HyunWoo Kim almost 4 years ago

I thought about this comment today.
I have never tried a "tarball install", so I should try one myself to get a picture
of how to proceed with this issue,
but my current guess is that I can update start() to look like

if [ "$RPM_INSTALL" = "True" ]; then
   daemon --user $FRONTEND_USER --pidfile="$frontend_dir/lock/frontend.lock" -2 "${FRONTEND_START} $frontend_dir 2>$LOG_FILE_STARTUP 1>&2 </dev/null &" 
else
    invoke_as_frontend "nice -2 \"${FRONTEND_START}\" \"$frontend_dir\" 2>$LOG_FILE_STARTUP 1>&2 </dev/null &" 
fi

This conjecture is based on /var/lib/gwms-frontend/vofrontend/frontend_startup
in which start() has
invoke_as_frontend "nice -2 \"${FRONTEND_START}\" \"$frontend_dir\" 2>$LOG_FILE_STARTUP 1>&2 </dev/null &"
and my understanding is that this file
/var/lib/gwms-frontend/vofrontend/frontend_startup
can be run by the frontend user

Am I correct?

P.S. we don't have to worry about stop() because I did not update stop() to use daemon function.

#17 Updated by Parag Mhashilkar almost 4 years ago

Yes, I had to patch my setup with what you are proposing, and it works. You also need to change the factory startup template.

#18 Updated by HyunWoo Kim almost 4 years ago

  • Status changed from Assigned to Feedback

I updated v3/4586 branch with these new additions.
Parag, you have to review these new changes, right?

#19 Updated by HyunWoo Kim almost 4 years ago

  • Status changed from Feedback to Resolved

I merged this into branch_v3_2 on Dec 15.

#20 Updated by Parag Mhashilkar almost 4 years ago

  • Status changed from Resolved to Closed
