Bug #5071

glideinwms frontend & factory stopping is slow

Added by Parag Mhashilkar almost 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Category: RPM - Frontend/Factory
Target version:
Start date: 02/05/2014
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

Following was reported in the OSG JIRA:
https://jira.opensciencegrid.org/browse/SOFTWARE-1328

The init.d script for the glideinwms frontend occasionally fails when restart or start is run because the glideinwms frontend is still running. Shutting down the GWMS frontend takes some time, but the init.d script returns fairly quickly. So when doing a restart, or when running start soon after stop, the startup occasionally fails with a "process already running" error:
Traceback (most recent call last):
  File "/usr/sbin/glideinFrontend", line 334, in ?
    main(sys.argv[1])
  File "/usr/sbin/glideinFrontend", line 309, in main
    pid_obj.register()
  File "/usr/lib/python2.4/site-packages/glideinwms/lib/pidSupport.py", line 73, in register
    raise AlreadyRunning, "Another process already running"
glideinwms.lib.pidSupport.AlreadyRunning: Another process already running
As a fix, I'd suggest putting a "sleep 10" or something similar in the stop function so that the glideinwms processes have time to shut down.
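
The error in the traceback comes from a PID-file check at startup. If the PID recorded by the previous instance still belongs to a live process, registration is refused. The following is a minimal Python sketch of that kind of check. It is not the actual pidSupport code: `register`, `pid_is_alive` and the one-PID-per-file layout are assumptions made for illustration.

```python
import errno
import os


class AlreadyRunning(Exception):
    """Hypothetical stand-in for glideinwms.lib.pidSupport.AlreadyRunning."""


def pid_is_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except OSError as err:
        # ESRCH: no such process; EPERM: it exists but belongs to another user
        return err.errno == errno.EPERM
    return True


def register(pid_file):
    """Hypothetical PID-file registration: refuse to start if the old PID is alive."""
    if os.path.exists(pid_file):
        with open(pid_file) as fd:
            content = fd.read().strip()
        # A stale or garbled PID file is simply overwritten below
        if content.isdigit() and int(content) > 0 and pid_is_alive(int(content)):
            raise AlreadyRunning("Another process already running")
    with open(pid_file, "w") as fd:
        fd.write("%d\n" % os.getpid())
```

With a check like this, a fixed "sleep 10" only helps if the old process exits within 10 seconds. Polling until the recorded PID is gone, with an upper bound, avoids guessing the delay.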

Attachment: glideinwms.spec (41.1 KB), added by Marco Mambelli, 03/24/2014 01:29 PM

Subtasks

Bug #5350: Problems invoking startup scripts as the wrong user (Closed, Marco Mambelli)

History

#1 Updated by Igor Sfiligoi almost 6 years ago

It can occasionally take minutes for some of the sub-processes to terminate.

BTW: This is not specific to the init.d script; the frontend_startup script behaves the same way.

#2 Updated by Marco Mambelli over 5 years ago

The plan is to extend the current wait time in the stop scripts and to add a check-and-wait step to the restart procedure.

stopFrontend and stopFactory are currently inconsistent:
- stopFrontend accepts no options and always tries a hard kill at the end
- stopFactory tries a hard kill only if -force is specified

I saw that the 2 scripts behave differently:
- the factory script allows a soft kill, optionally followed by a hard kill (force option)
- the frontend script always has the "force" option on.
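
The factory-style behaviour (one soft kill, then a hard kill only when forced) can be sketched in Python as follows. This is an illustration of the technique, not the stopFactory/stopFrontend code. The function names, the 0.1 s poll interval and the default timeout are assumptions. It also assumes the daemon is not a child of the stop script; an unreaped child would linger as a zombie and still answer signal 0.

```python
import errno
import os
import signal
import time


def pid_is_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except OSError as err:
        return err.errno == errno.EPERM
    return True


def wait_for_exit(pid, timeout, poll=0.1):
    """Poll until the process is gone or the timeout expires; True if it exited."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_is_alive(pid):
            return True
        time.sleep(poll)
    return not pid_is_alive(pid)


def stop(pid, force=False, timeout=10.0):
    """Hypothetical stop: SIGTERM once, then SIGKILL only if force is set."""
    try:
        os.kill(pid, signal.SIGTERM)  # soft kill, sent once rather than in a loop
    except ProcessLookupError:
        return True  # already gone
    if wait_for_exit(pid, timeout):
        return True
    if not force:
        return False  # still running; the caller decides what to do
    try:
        os.kill(pid, signal.SIGKILL)  # hard kill
    except ProcessLookupError:
        return True
    return wait_for_exit(pid, timeout)
```

Returning False instead of silently escalating keeps the "soft kill only" default. With force=True the frontend's current always-hard-kill behaviour is recovered.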

I created a new branch, master_5071.

I updated the 2 stop scripts, stopFrontend and stopFactory, to be more consistent and to avoid sending multiple signals in a loop.
I also updated the init scripts factory_initd_startup_template and frontend_initd_startup_template to wait up to 30 sec if shutdown is not complete, and then print a warning message.
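
The bounded wait added to the init scripts can be illustrated with the Python sketch below. The real templates are shell scripts. `wait_for_shutdown`, `restart` and the `is_running` predicate are hypothetical names. The choice to skip the start step when the old processes are still running is an assumption, not something stated in this ticket.

```python
import sys
import time


def wait_for_shutdown(is_running, max_wait=30.0, poll=1.0):
    """Wait until is_running() returns False, for at most max_wait seconds.

    Returns True if shutdown completed; otherwise prints a warning to stderr
    and returns False.
    """
    deadline = time.monotonic() + max_wait
    while is_running():
        if time.monotonic() >= deadline:
            sys.stderr.write(
                "WARNING: shutdown not complete after %s seconds\n" % max_wait)
            return False
        time.sleep(poll)
    return True


def restart(stop, start, is_running, max_wait=30.0):
    """Hypothetical restart: stop, wait for the old processes, then start."""
    stop()
    if not wait_for_shutdown(is_running, max_wait):
        return False  # assumption: do not start over a still-running instance
    start()
    return True
```

Because the old processes can occasionally take minutes to exit, the warning tells the operator to retry later rather than hitting the AlreadyRunning error.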

I still want to compare the RPM init scripts so that a single template can be used for both.

#3 Updated by Parag Mhashilkar over 5 years ago

  • Subject changed from glideinwms frontend init.d script occasionally fails to glideinwms frontend & factory stopping is slow

#4 Updated by Marco Mambelli over 5 years ago

The changes solving all the issues (except a single template for the startup scripts) are committed.
I tested them by installing on one node from the git branch master_5071.
The preliminary test works correctly.

On Monday I will review the RPM startup scripts together with Parag, trying to produce them from the templates used for the tarball installation.

Then testing and review.

PS: The category is wrong. The changes affect the Factory and the Frontend, both tarball and RPM.

#5 Updated by Marco Mambelli over 5 years ago

The changes address:
- 5071
- 5106
- 5148
- 5350

I committed the changes, tested them in a tarball installation, and tested the generation of the RPM version of the init.d files.
Ready for review. Assigning it to Parag

I was not able to commit the spec file which is attached to the ticket.
I sent an email to Tim C. to verify my access to the VDT SVN server.

#6 Updated by Marco Mambelli over 5 years ago

I committed the new spec file (the file attached) to the OSG SVN repository

#7 Updated by Marco Mambelli over 5 years ago

Committed revision 19046 to the OSG SVN repository

- changes to build from the template in master_5071 (or master_5071_5351)
- fixed the factory condor config (now using 00_gwms_factory_general.config that contains "QUEUE_SUPER_USERS = $(QUEUE_SUPER_USERS), wmsfactory")

#8 Updated by Parag Mhashilkar over 5 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli

Created a new branch, v3/5071_v2, rebased against branch_v3_2. Reviewed it and sent feedback to Marco.

#9 Updated by Marco Mambelli over 5 years ago

  • Status changed from Feedback to Resolved

See 5071 for details. Merged in branch_v3_2, ready for the v3_2_4rc1 release.
Leaving the ticket Resolved, not Closed, until the release.

#10 Updated by Parag Mhashilkar over 5 years ago

Closing the issues that were taken care of in v3.2.4.

#11 Updated by Parag Mhashilkar over 5 years ago

  • Status changed from Resolved to Closed

