glideinwms frontend & factory stopping is slow
The following was reported in the OSG JIRA:
The init.d script for the glideinwms frontend occasionally fails when restart or start is run because the glideinwms frontend is still running. The shutdown of the gwms frontend takes some time, but the init.d script returns fairly quickly, so when doing a restart, or running the service with start soon after running stop, the startup occasionally fails with a "process already running" error:

  Traceback (most recent call last):
    File "/usr/sbin/glideinFrontend", line 334, in ?
      main(sys.argv)
    File "/usr/sbin/glideinFrontend", line 309, in main
      pid_obj.register()
    File "/usr/lib/python2.4/site-packages/glideinwms/lib/pidSupport.py", line 73, in register
      raise AlreadyRunning, "Another process already running"
  glideinwms.lib.pidSupport.AlreadyRunning: Another process already running

As a fix, I'd suggest putting a sleep 10 or something similar in the stop function so that the glideinwms processes have time to shut down.
#2 Updated by Marco Mambelli over 5 years ago
The plan is to extend the current wait time in the stop scripts and to add a check-and-wait step to the restart procedure.
stopFrontend and stopFactory currently behave inconsistently:
- stopFactory sends a soft kill and follows it with a hard kill only if -force is specified
- stopFrontend checks no option and always tries a hard kill at the end, as if the "force" option were always on
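The soft-kill-then-optional-hard-kill behavior can be sketched as follows. This is only an illustration of the idea, not the actual stopFrontend/stopFactory code: the function names, the timeout values, and the use of SIGTERM and SIGKILL as the "soft" and "hard" signals are assumptions.

```python
import os
import signal
import time


def is_running(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_gone(pid, timeout, poll):
    """Poll until the process disappears or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(poll)
    return not is_running(pid)


def stop_process(pid, force=False, timeout=10.0, poll=0.2):
    """Hypothetical helper: one soft kill, then a hard kill only if `force`.

    Returns True if the process is gone when the function returns.
    """
    if not is_running(pid):
        return True
    os.kill(pid, signal.SIGTERM)       # soft kill, sent once (no signal loop)
    if wait_gone(pid, timeout, poll):
        return True
    if not force:
        return False                   # still running; caller decides what to do
    os.kill(pid, signal.SIGKILL)       # hard kill, only when explicitly requested
    return wait_gone(pid, timeout, poll)
```

With a single helper like this, both stop scripts could share the same default (no hard kill) and expose the same force option.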
I created a new branch, master_5071.
I updated the 2 stop scripts, stopFrontend and stopFactory, to be more consistent and to avoid sending multiple signals in a loop,
and updated the init scripts, factory_initd_startup_template and frontend_initd_startup_template, to wait up to 30 sec if shutdown is not complete and then print a warning message.
I still want to compare the RPM init scripts so that a single template can be used for both.
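The "wait up to 30 sec, then warn" logic might look roughly like the sketch below. The real init scripts are shell templates, so this Python version only illustrates the control flow. All names are hypothetical, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import sys
import time


def wait_for_shutdown(is_running, max_wait=30.0, poll=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `is_running()` until it is False or `max_wait` seconds pass.

    Returns True if shutdown completed in time, False otherwise.
    """
    deadline = clock() + max_wait
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll)
    return True


def stop_and_wait(stop, is_running, max_wait=30.0, **wait_kwargs):
    """Call `stop()`, then wait; warn on stderr if the process is still up."""
    stop()
    if wait_for_shutdown(is_running, max_wait, **wait_kwargs):
        return True
    print("WARNING: shutdown not complete after %d seconds" % max_wait,
          file=sys.stderr)
    return False
```

Unlike a fixed sleep 10, polling returns as soon as the process exits and waits longer only when needed. A restart built on this could call start only when `stop_and_wait` returns True; whether to start anyway after the warning is a policy choice the ticket does not specify.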
#4 Updated by Marco Mambelli over 5 years ago
Changes solving all the issues (except single template for startup scripts) committed.
I did a test installing on one node using the git branch master_5071
Preliminary test works correctly.
On Monday I will review the RPM startup scripts together with Parag, trying to produce them from the templates used for the tarball installation.
Then testing and review.
PS Category is wrong. Changes affect Factory and Frontend, both tarball and RPM
#5 Updated by Marco Mambelli over 5 years ago
- File glideinwms.spec glideinwms.spec added
- Status changed from New to Feedback
- Assignee changed from Marco Mambelli to Parag Mhashilkar
The changes address:
I committed the changes, tested them in a tarball installation, and tested the generation of the RPM version of the init.d files.
Ready for review. Assigning it to Parag
I was not able to commit the spec file, which is attached to the ticket.
I sent an email to Tim C. to verify my access to the VDT SVN server.
#7 Updated by Marco Mambelli over 5 years ago
Committed revision 19046 to the OSG SVN repository
- changes to build from the template in master_5071 (or master_5071_5351)
- fixed the factory condor config (now using 00_gwms_factory_general.config that contains "QUEUE_SUPER_USERS = $(QUEUE_SUPER_USERS), wmsfactory")