Bug #3557

Improve memory management of factory processes in the v2_7_rc1

Added by Parag Mhashilkar over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: High
Assignee: Parag Mhashilkar
Category: -
Target version:
Start date: 03/01/2013
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

  • Add a means to limit the number of forks that do check_and_perform_work
  • Read from each child's pipe in a way that minimizes Python zombie processes
  • Manage pipes so that subsequent children do not inherit the pipes their parent created for their earlier siblings
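The pipe-inheritance item can be illustrated with a minimal sketch. It is not the code merged for this ticket; `fork_with_clean_pipes` and its `open_read_fds` argument are hypothetical names used only for illustration. The idea is that each new child closes the read ends the parent still holds for earlier siblings before doing any work.

```python
import os


def fork_with_clean_pipes(work, open_read_fds):
    """Fork a child that runs work() and writes its result to a new pipe.

    open_read_fds lists the read ends the parent still holds for earlier
    siblings (hypothetical bookkeeping). The child closes them so that it
    does not keep its siblings' pipes open.
    Returns (read_fd, child_pid) in the parent.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: drop our own read end and every inherited sibling pipe.
        os.close(r)
        for fd in open_read_fds:
            os.close(fd)
        try:
            os.write(w, work())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: keep only the read end, so EOF arrives when the child exits.
    os.close(w)
    return r, pid
```

The parent would pass the list of read ends it has not yet drained each time it forks, and remove an entry once that pipe has been read and closed.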

History

#1 Updated by Burt Holzman over 7 years ago

Some more information for the record:

When testing gWMS 2.7 on glidein-itb, Jeff and company noticed the machine was getting heavily loaded.
On further investigation, memory gets exhausted, the machine swaps heavily and thrashes, and all is doomed.

The glideFactoryEntryGroup forks one child per entry. These children start dirtying a lot of pages -- it looks like up to 70 MB per child on the ITB factory, based on the Pss from /proc/$PID/smaps (some or all of that may be from entry.gflFactoryConfig.rrd_stats.getData, but I can't prove that yet). The ITB factory has 300 entries, so 300 * 70 MB = 21 GB. Ouch.
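For anyone repeating the measurement, here is a rough sketch of how the per-child Pss and the aggregate estimate could be computed. The function names are made up for this example; the only assumption about the input is the standard `Pss:  <n> kB` line format of /proc/$PID/smaps.

```python
import re

# Matches only lines that start with "Pss:", so "SwapPss:" is not counted.
_PSS_LINE = re.compile(r"^Pss:\s+(\d+)\s+kB", re.MULTILINE)


def total_pss_kb(smaps_text):
    """Sum every 'Pss:' entry in the text of a /proc/<pid>/smaps dump (kB)."""
    return sum(int(m.group(1)) for m in _PSS_LINE.finditer(smaps_text))


def pss_of_pid(pid):
    """Read /proc/<pid>/smaps (Linux only) and return the total Pss in kB."""
    with open("/proc/%d/smaps" % pid) as f:
        return total_pss_kb(f.read())


def estimated_total_gb(n_children, mb_per_child):
    """Back-of-the-envelope total, using 1 GB = 1000 MB as in the ticket."""
    return n_children * mb_per_child / 1000.0


print(estimated_total_gb(300, 70))  # 300 entries at 70 MB each
```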

The children do exit after the parent reads from their pipe. However, the parent doesn't check which children have exited; it just goes through the pipes one by one doing non-blocking reads, so at worst, N-1 children are waiting and holding onto memory until one finishes.
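One possible way to service whichever child is ready, instead of walking the pipes in a fixed order, is to wait on all read ends with `select` and reap each child as soon as its pipe reaches EOF. This is only a sketch of that idea, not necessarily what the factory code does; `collect_from_children` and the `pipe_to_pid` mapping are hypothetical names.

```python
import os
import select


def collect_from_children(pipe_to_pid):
    """Drain child pipes in whatever order the children finish.

    pipe_to_pid maps each read fd to the pid of the child writing to it.
    Returns {pid: bytes written by that child}.
    """
    buffers = {fd: [] for fd in pipe_to_pid}
    results = {}
    while buffers:
        # Block until at least one pipe has data or has hit EOF.
        ready, _, _ = select.select(list(buffers), [], [])
        for fd in ready:
            chunk = os.read(fd, 65536)
            if chunk:
                buffers[fd].append(chunk)
                continue
            # EOF: every write end is closed, so the child is done.
            os.close(fd)
            pid = pipe_to_pid[fd]
            os.waitpid(pid, 0)  # reap it so it does not linger as a zombie
            results[pid] = b"".join(buffers.pop(fd))
    return results
```

Note that EOF is only seen once every copy of the write end is closed, so the parent has to close its own copy right after forking.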

There are a few avenues to pursue; one is to reduce the number of simultaneous forks.
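A minimal sketch of capping the number of simultaneous forks, assuming each task is an independent callable and leaving out the pipe used to return results. `run_bounded` and `max_forks` are illustrative names, not existing factory options.

```python
import os


def run_bounded(tasks, max_forks):
    """Fork one child per task, with at most max_forks children alive.

    Returns the peak number of children that were alive at the same time.
    """
    live = set()
    peak = 0
    for task in tasks:
        if len(live) >= max_forks:
            # At the cap: block until any child exits before forking again.
            pid, _status = os.wait()
            live.discard(pid)
        pid = os.fork()
        if pid == 0:
            try:
                task()
            finally:
                os._exit(0)  # the child must never continue the loop
        live.add(pid)
        peak = max(peak, len(live))
    # Reap the remaining children so none are left as zombies.
    while live:
        pid, _status = os.wait()
        live.discard(pid)
    return peak
```

With 300 entries at roughly 70 MB each, a cap of 10 would bound the children's footprint at about 700 MB instead of 21 GB, at the cost of a longer loop. If the children also report results over pipes, the parent has to keep draining those pipes while it waits, or a child blocked on a full pipe would never exit.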

#2 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Parag Mhashilkar to Burt Holzman

I implemented all the changes listed here. Burt, can you please review them and deploy/test them on the ITB?

#3 Updated by Parag Mhashilkar over 7 years ago

All the changes have been tested and merged back to branch_v2plus.
Changes to master will be taken care of in #2905.

#4 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from Feedback to Resolved

#5 Updated by Parag Mhashilkar over 7 years ago

  • Assignee changed from Burt Holzman to Parag Mhashilkar

#6 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from Resolved to Closed
