Bug #6300

Starting/stopping return codes not conforming to Linux standards

Added by Parag Mhashilkar over 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Category: -
Start date: 05/19/2014
% Done: 0%

Description

This should be fixed in the templates and/or the underlying Python code.

On May 19, 2014, at 1:15 PM, Gerard Bernabeu wrote:

This is actually a bug in the init script. Extracted from http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html:

the init script shall return an exit status of zero if the action was successful. Otherwise, the exit status shall be non-zero, as defined below. In addition to straightforward success, the following situations are also to be considered successful:
    • restarting a service (instead of reloading it) with the force-reload argument

    • running start on a service already running

    • running stop on a service already stopped or not running

    • running restart on a service already stopped or not running

    • running try-restart on a service already stopped or not running

So, when starting an already started service, the return code should be 0 because the service is running.
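
As a rough illustration of the LSB rule quoted above (not the actual init script or glideinWMS code), the "already running" and "already stopped" cases can be folded into success as sketched here. The function and parameter names are hypothetical, and the sketch assumes Python 3:

```python
# Illustrative sketch only: these names are hypothetical, not glideinWMS code.
EXIT_OK = 0        # LSB: the action was successful
EXIT_GENERIC = 1   # LSB: generic or unspecified error


def lsb_start(is_running, launch):
    """Return an LSB-style exit status for the 'start' action.

    is_running: callable returning True if the service is already up.
    launch: callable that starts the service and returns True on success.
    """
    if is_running():
        # LSB: running start on a service already running is a success.
        return EXIT_OK
    return EXIT_OK if launch() else EXIT_GENERIC


def lsb_stop(is_running, halt):
    """Return an LSB-style exit status for the 'stop' action.

    halt: callable that stops the service and returns True on success.
    """
    if not is_running():
        # LSB: running stop on a service already stopped is a success.
        return EXIT_OK
    return EXIT_OK if halt() else EXIT_GENERIC
```

With this shape, a second "start" issued right after a reconfig that already started the service would return 0 instead of 1.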

Gerard

On Mon, May 19, 2014 at 1:11 PM, Parag A Mhashilkar <parag@fnal.gov> wrote:
It looks like you are doing a reconfig (on a running frontend) followed by a start.
The reconfig already started the frontend, so the error message is correct: you are trying to start the frontend process twice, and the second start will fail.

=====
notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Work files are in /var/lib/gwms-frontend/vofrontend
notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: Reconfiguring the frontend [OK]
notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: Starting glideinWMS frontend fermicloud357_OSG_gWMSFrontend: [FAILED]
err: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]: Failed to call refresh: service gwms-frontend reconfig returned 1 instead of one of [0] at /etc/puppetgcso/environments/neha_glideinwms_preprod/modules/glideinwms/manifests/vofrontend_standalone_install_config.pp:83
debug: Service[gwms-frontend](provider=redhat): Executing '/sbin/service gwms-frontend status'
debug: Puppet::Type::Service::ProviderRedhat: Executing '/sbin/chkconfig gwms-frontend'
debug: Service[gwms-frontend](provider=redhat): Executing '/sbin/service gwms-frontend start'
err: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Service[gwms-frontend]/ensure: change from stopped to running failed: Could not start Service[gwms-frontend]: Execution of '/sbin/service gwms-frontend start' returned 1:  at /etc/puppetgcso/environments/neha_glideinwms_preprod/modules/glideinwms/manifests/vofrontend_standalone_install_config.pp:97
====

Thanks & Regards
Parag Mhashilkar

On May 16, 2014, at 5:40 PM, Neha Sharma wrote:

> Hi Parag
>
> I am seeing another issue -
>
> 1. First puppet run does frontend upgrade/reconfig
> 2. Second puppet run does frontend reconfig
>
> Step 2 above fails to start the frontend.
>
> The error is:
>
> Traceback (most recent call last):
>  File "/usr/sbin/glideinFrontend", line 335, in <module>
>    main(sys.argv[1])
>  File "/usr/sbin/glideinFrontend", line 310, in main
>    pid_obj.register()
>  File "/usr/lib/python2.6/site-packages/glideinwms/lib/pidSupport.py", line 73, in register
>    raise AlreadyRunning, "Another process already running" 
> glideinwms.lib.pidSupport.AlreadyRunning: Another process already running
>
> Puppet output
>
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: Shutting down glideinWMS frontend fermicloud357_OSG_gWMSFrontend: [OK]
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ~/vofrontend /
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Saved the current config file into the working dir
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Saved the backup config file into the working dir
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Reconfigured frontend 'fermicloud357_OSG_gWMSFrontend'
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Active groups are:
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      FNAL_lbne
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      FNAL_minerva
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      FNAL_minos
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      FNAL_mu2e
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      FNAL_nova
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      FNAL_uboone
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      OSG_lbne
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      OSG_minerva
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      OSG_minos
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      OSG_mu2e
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      OSG_nova
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      OSG_uboone
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      fermicloud
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns:      paid_cloud
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Verifying rrd schema
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: WARNING: monitor/group_main/total/Status_Attributes.rrd missing, will be created on restart
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: ...Work files are in /var/lib/gwms-frontend/vofrontend
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: Reconfiguring the frontend [OK]
> notice: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]/returns: Starting glideinWMS frontend fermicloud357_OSG_gWMSFrontend: [FAILED]
> err: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[frontend-reconfig]: Failed to call refresh: service gwms-frontend reconfig returned 1 instead of one of [0] at /etc/puppetgcso/environments/neha_glideinwms_preprod/modules/glideinwms/manifests/vofrontend_standalone_install_config.pp:83
> debug: Service[gwms-frontend](provider=redhat): Executing '/sbin/service gwms-frontend status'
> debug: Puppet::Type::Service::ProviderRedhat: Executing '/sbin/chkconfig gwms-frontend'
> debug: Service[gwms-frontend](provider=redhat): Executing '/sbin/service gwms-frontend start'
> err: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Service[gwms-frontend]/ensure: change from stopped to running failed: Could not start Service[gwms-frontend]: Execution of '/sbin/service gwms-frontend start' returned 1:  at /etc/puppetgcso/environments/neha_glideinwms_preprod/modules/glideinwms/manifests/vofrontend_standalone_install_config.pp:97
> debug: Service[condor](provider=redhat): Executing '/sbin/service condor status'
> debug: Puppet::Type::Service::ProviderRedhat: Executing '/sbin/chkconfig condor'
> debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
> debug: Exec[genuserproxy](provider=posix): Executing check '/bin/ls /var/lib/gwms-frontend-proxies/pilot.*.proxy'
> debug: Executing '/bin/ls /var/lib/gwms-frontend-proxies/pilot.*.proxy'
> debug: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[genuserproxy]/unless: /var/lib/gwms-frontend-proxies/pilot.argoneut.proxy
> debug: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[genuserproxy]/unless: /var/lib/gwms-frontend-proxies/pilot.cdf.proxy
> debug: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[genuserproxy]/unless: /var/lib/gwms-frontend-proxies/pilot.coupp.proxy
> debug: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[genuserproxy]/unless: /var/lib/gwms-frontend-proxies/pilot.darkside.proxy
> debug: /Stage[main]/Glideinwms::Vofrontend_standalone_install_config/Exec[genuserproxy]/unless: /var/lib/gwms-frontend-proxies/pilot.dzero.proxy
> debug: /Stage[main]/Glideinwms::Vofrontend_standalone_ins
>
> - Neha

History

#1 Updated by Marco Mambelli over 5 years ago

  • Target version changed from v3_2_x to v3_3

The request is to change the exit codes returned by the init script to comply with LSB recommendations.
Below is the current status.

This changes the behavior of the service scripts (which users invoke directly), so it should go in release 3.3.

Current behavior (exit codes listed as frontend/factory):
start:
- already running: 0/0
- started successfully: 0/0
- unable to start (not running): 1/1
The reason for the failure is printed as a message.

stop:
- already stopped: 0/0
- stopped successfully: 0/0
- unable to stop (still running): 1/1
Stopping all the processes takes some time. The command waits for 30 seconds and then exits with failure. Processes may still end after the timeout, so the service can end up stopped even though the command reported failure.
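
A minimal sketch of the wait-then-fail behavior described above, assuming a simple polling loop. The 30-second default mirrors the timeout mentioned here; the function name, the polling interval, and the injectable clock/sleep parameters are hypothetical and not taken from the init script:

```python
import time


def wait_until_stopped(is_running, timeout=30.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until the service stops or the timeout expires.

    is_running: callable returning True while any service process is alive.
    Returns 0 if the service stopped within the timeout, 1 otherwise.
    """
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            # Timed out: report failure even though the processes may
            # still exit on their own shortly afterwards.
            return 1
        sleep(interval)
    return 0
```

A caller that issues "start" right after a timed-out "stop" can still find leftover processes, which is consistent with the "Another process already running" error in the description.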

restart (it's a stop+start):
- stopped and started: 0/0
- started a stopped service: 0/0
- failed to stop (if initially running): 1/1
- failed to start: 1/1

Unimplemented command returns error, exit code 3
Insufficient privileges returns error, exit code 4

The behaviors above correspond to the LSB specification for success/failure but the exit code is always 1 in case of failure (mostly not distinguishing the different causes of error).
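
For reference, the exit statuses that LSB 3.1 defines for init-script actions other than status can be written down as a small enum. This is only a lookup sketch: the class and function names are hypothetical, and only the numeric values and their meanings come from the specification.

```python
from enum import IntEnum


class LsbExit(IntEnum):
    """Exit statuses for init-script actions other than 'status' (LSB 3.1)."""
    SUCCESS = 0
    GENERIC_ERROR = 1           # generic or unspecified error
    INVALID_ARGUMENT = 2        # invalid or excess argument(s)
    UNIMPLEMENTED = 3           # unimplemented feature
    INSUFFICIENT_PRIVILEGE = 4  # user had insufficient privilege
    NOT_INSTALLED = 5           # program is not installed
    NOT_CONFIGURED = 6          # program is not configured
    NOT_RUNNING = 7             # program is not running


def describe(code):
    """Map a numeric exit status to its LSB name, or 'OTHER' if not listed."""
    try:
        return LsbExit(code).name
    except ValueError:
        # Values above 7 are reserved ranges in the specification.
        return "OTHER"
```

Codes 3 and 4 match what the scripts already return for unimplemented commands and insufficient privileges.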

force-reload is a mandatory LSB action but is not implemented.

reconfig and upgrade are similar to reload but not identical: they change some files in the code as well as the configuration file.

reconfig and upgrade in frontend/factory currently return:
- stopped, reconfigured and started: 0/0
- reconfigured a stopped service: 0/0
- failed to stop (if initially running): 1/1
- failed to reconfigure (even if started with old configuration): 1*/1
- failed to start (after stopping and reconfiguring, even if reconfigure failed): 1/1
  • Regarding 1*: there was a bug in the frontend where a failed reconfigure was not detected (only an XML error was printed). This has been fixed in this branch.

The new behavior is the same except for the following:

If the service is not installed (the start or check executables are missing), the script fails with exit code 5 (previously 1, because the invocation failed).

reconfig and upgrade in frontend/factory now return:
- failed to reconfigure (even if started with old configuration): 6/6
- failed to start (after stopping and reconfiguring, even if reconfigure failed): 1/1
The change from 1 to 6 follows the LSB suggestions (1 was acceptable, but 6 is more specific).
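
The exit-code precedence listed above for reconfig/upgrade could be sketched as follows. This is not the merged code: the function and its callable arguments are hypothetical, and it assumes the service is restarted only if it was running before the reconfig.

```python
EXIT_OK = 0
EXIT_GENERIC = 1         # failed to stop or failed to start
EXIT_NOT_CONFIGURED = 6  # LSB: program is not configured


def reconfig(is_running, stop, reconfigure, start):
    """Stop, reconfigure and restart a service; return an exit status.

    All four arguments are hypothetical callables returning True/False.
    Assumption: the service is restarted only if it was running beforehand.
    """
    was_running = is_running()
    if was_running and not stop():
        return EXIT_GENERIC          # failed to stop
    configured = reconfigure()
    if was_running and not start():
        # A failed start wins over a failed reconfigure.
        return EXIT_GENERIC
    if not configured:
        # The service may be up again, but with the old configuration.
        return EXIT_NOT_CONFIGURED
    return EXIT_OK
```

A caller such as Puppet, which only distinguishes zero from non-zero, sees no difference between 1 and 6; the distinction helps an operator reading the exit status directly.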

Need to check (w/ Neha and Gerard) to what extent reconfig/upgrade need to be changed to satisfy the request.
Marco

#2 Updated by Marco Mambelli over 5 years ago

We should discuss this on Wednesday. This could also go in 3_2_8, since checks usually only distinguish 0 from non-zero (0/!0).
But it is still a change in behavior, so I am not sure.

I am leaving further changes until after a discussion with the operators:
Krista, Jeff, Neha, Gerard

#3 Updated by Burt Holzman over 5 years ago

  • Target version changed from v3_3 to v3_2_8

#4 Updated by Marco Mambelli over 5 years ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

I talked with Neha. It seems that most of the problems were due to the use of an old version (3.2.3). The changes already in 3.2.6 should fix the problem highlighted, and the added LSB compliance from the fixes in this ticket should be sufficient.

Moving this ticket to feedback. Code changes are in v3/6300.

Marco

PS Quoting an email from Neha {quote}
Hi Marco

In version 3.2.3-1, after a frontend config change, we have always had to run puppet twice (for the frontend process to start up after the reconfig).

This is because not all processes had exited by the time 'start' was invoked. We requested that a sleep be added between the stop and start commands.

Looking at my work notes in RITM0117700, a single puppet run was able to bring up the frontend OK after a reconfig.

So this seems to have been fixed in version 3.2.6-1.

I'll be doing the production frontend upgrade this Thursday and can then confirm whether it works OK for us.

- Neha {quote}

#5 Updated by Parag Mhashilkar over 5 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli

Sent feedback separately.

#6 Updated by Marco Mambelli about 5 years ago

  • Status changed from Feedback to Resolved

I added some comments as per Parag's feedback and merged to branch_v3_2 and master.

#7 Updated by Parag Mhashilkar about 5 years ago

  • Status changed from Resolved to Closed

#8 Updated by Marco Mambelli over 4 years ago

  • Subject changed from Starting/stopping return codes not confirming to Linux standards to Starting/stopping return codes not conforming to Linux standards

Fixed the subject.


