
Bug #18748

Fix fork.py behavior (was: reproduce crashes on glidein2.chtc.wisc.edu, provide fix)

Added by Dennis Box almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: High
Start date: 01/17/2018
% Done: 0%

Description

glideinwms/lib/fork.py was changed in v3.2.20 to use epoll() instead of select() for #17067.

It was tested heavily on the factory but not well enough on the frontend side of things.

When glidein2.chtc.wisc.edu was upgraded to 3.2.20, it started throwing uncaught exceptions, eventually crashing the frontend. A temporary fix was made to roll back fork.py code to the previous release.

This urgently needs to be reproduced and understood.

As a side note, changes to the RRD files during the upgrade make rolling back to the previous release difficult. If there is a way to read and write the RRD files that tolerates new fields appended to the end of the metadata, it should be adopted.

fork.py (13 KB), Marco Mambelli, 01/30/2018 05:28 PM

History

#1 Updated by Marco Mambelli almost 2 years ago

Findings: In 3.2.20 the code was changed to use epoll instead of select to improve scalability, still falling back on select if epoll is not available.
It was also changed to catch specific exceptions instead of using the generic "except:".
There was a bug in the new code: a function returned only the first file descriptor instead of the expected list of file descriptors. On loaded systems this caused a backlog, which eventually triggered an OSError further down the road. The old generic "except:" would have caught that OSError and allowed the Frontend to continue operating, but with the narrower exception handling it was no longer caught.
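A minimal sketch of the corrected pattern, assuming Python 3's standard `select` module. `wait_readable` is a hypothetical name, not the actual fork.py function. It prefers `select.epoll`, falls back to `select.select` when epoll is unavailable, and returns every ready descriptor rather than only the first one. For simplicity it creates a fresh epoll object on each call; real code would likely keep one around.

```python
import os
import select


def wait_readable(fds, timeout=None):
    """Return the list of fds in `fds` that are ready for reading.

    Hypothetical helper. Uses epoll where available (Linux) and falls back
    to select otherwise. `timeout` is in seconds; None blocks until ready.
    """
    if hasattr(select, "epoll"):
        ep = select.epoll()
        try:
            for fd in fds:
                ep.register(fd, select.EPOLLIN)
            # epoll.poll() uses -1 to mean "block until an event arrives"
            events = ep.poll(-1 if timeout is None else timeout)
        finally:
            ep.close()
        # Buggy pattern described in the ticket: return events[0][0]
        # (only the first ready fd). Return every ready fd instead.
        return [fd for fd, _mask in events]
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable


if __name__ == "__main__":
    # Two pipes with data waiting: both read ends must be reported.
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    os.write(w1, b"a")
    os.write(w2, b"b")
    print(sorted(wait_readable([r1, r2], timeout=1.0)) == sorted([r1, r2]))  # prints True
```

With the first-fd-only version, each call would service a single child even when many have output pending, which is how output could back up on a busy Frontend.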

In the new code I'm taking care of both: fixing the epoll behavior and catching the OSError.
I'm also optimizing the epoll/poll calls by adding a timeout of 100 milliseconds.
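A sketch of the other two changes, again assuming Python 3's `select` module: the poller wakes up at least every 100 ms, and an OSError raised while reading one child's pipe is handled locally instead of propagating and crashing the Frontend. `drain_pipes` and `POLL_TIMEOUT_S` are hypothetical names. Treating a non-transient read error as end of stream is an illustrative choice, not necessarily what the patched fork.py does.

```python
import errno
import os
import select

POLL_TIMEOUT_S = 0.1  # 100 ms, as described in the ticket


def drain_pipes(fds):
    """Read each fd until EOF and return {fd: bytes} (hypothetical helper)."""
    use_epoll = hasattr(select, "epoll")
    poller = select.epoll() if use_epoll else select.poll()
    in_mask = select.EPOLLIN if use_epoll else select.POLLIN
    # epoll.poll() takes seconds; poll.poll() takes milliseconds
    timeout = POLL_TIMEOUT_S if use_epoll else int(POLL_TIMEOUT_S * 1000)
    out = {fd: b"" for fd in fds}
    open_fds = set(fds)
    for fd in fds:
        poller.register(fd, in_mask)
    try:
        while open_fds:
            # Returns after at most ~100 ms even if no fd is ready
            for fd, _mask in poller.poll(timeout):
                try:
                    chunk = os.read(fd, 65536)
                except OSError as err:
                    if err.errno in (errno.EAGAIN, errno.EINTR):
                        continue  # transient: retry on the next poll
                    chunk = b""  # other errors: treat as end of stream
                if chunk:
                    out[fd] += chunk
                else:
                    open_fds.discard(fd)
                    try:
                        poller.unregister(fd)
                    except (OSError, KeyError):
                        pass  # fd already invalid or not registered
    finally:
        if use_epoll:
            poller.close()
    return out
```

The bounded timeout keeps the loop from blocking indefinitely on a single poll call, so the caller regains control regularly even when no child is producing output.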

Changes are in v3/18748 and attached to this ticket (new fork.py)

#2 Updated by Marco Mambelli almost 2 years ago

  • Subject changed from reproduce crashes on glidein2.chtc.wisc.edu, provide fix to Fix fork.py behavior (was: reproduce crashes on glidein2.chtc.wisc.edu, provide fix)

#3 Updated by Marco Mambelli almost 2 years ago

To patch, replace fork.py with the one attached to this ticket:
  • lib/fork.py in the source tree
  • glideinwms/lib/fork.py in the Python site-packages of an installed RPM (e.g. /usr/lib/python2.6/site-packages/glideinwms/lib/fork.py)
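To check which copy of fork.py an installation actually imports, and therefore which file to replace, a small Python 3 sketch can ask the import system for the module's path. `find_module_file` is a hypothetical helper. `importlib.util.find_spec` does not exist on the Python 2.6 installs mentioned in the path example, and for a dotted name it imports the parent packages (here `glideinwms` and `glideinwms.lib`).

```python
import importlib.util


def find_module_file(dotted_name):
    """Return the file Python would import for `dotted_name`, or None."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ModuleNotFoundError:  # a parent package is not installed
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print(find_module_file("glideinwms.lib.fork"))
```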

#4 Updated by Dennis Box almost 2 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Dennis Box to Marco Mambelli

#5 Updated by Parag Mhashilkar almost 2 years ago

  • Status changed from Resolved to Closed

