Improve memory management of factory processes in v2_7_rc1
- Add a means to limit the number of forks that do check_and_perform_work
- Handle reading from the children's write pipes in a better way to minimize the number of Python zombie processes
- Manage pipes better so that subsequent children do not inherit the pipes that their parent created for earlier siblings
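To illustrate the last point, here is a minimal sketch of the pipe inheritance problem and one way to avoid it. It is not the factory code: `spawn_children` and its `close_inherited` flag are hypothetical names, and it assumes the parent closes each write end right after forking, so only read ends can leak into later children.

```python
import os

def spawn_children(n, close_inherited=True):
    """Fork n children, one pipe each; each child reports how many
    read ends of earlier siblings' pipes it still holds open."""
    sibling_read_fds = []  # read ends the parent keeps for earlier children
    children = []          # (pid, read_fd) pairs
    for _ in range(n):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: never return into the parent's loop.
            try:
                os.close(r)  # the child only writes to its own pipe
                if close_inherited:
                    for fd in sibling_read_fds:
                        os.close(fd)  # drop pipes made for earlier siblings
                leaked = 0
                for fd in sibling_read_fds:
                    try:
                        os.fstat(fd)  # succeeds only if the fd is still open
                        leaked += 1
                    except OSError:
                        pass
                os.write(w, str(leaked).encode())
            finally:
                os._exit(0)
        # Parent: keep only the read end.
        os.close(w)
        sibling_read_fds.append(r)
        children.append((pid, r))

    counts = []
    for pid, r in children:
        counts.append(int(os.read(r, 64).decode()))
        os.close(r)
        os.waitpid(pid, 0)  # reap, so no zombie is left behind
    return counts
```

Without the cleanup, the i-th child carries i descriptors it has no use for; with it, every child reports zero.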
#1 Updated by Burt Holzman over 7 years ago
Some more information for the record:
When testing gWMS 2.7 on glidein-itb, Jeff and company noticed the machine was getting heavily loaded.
On further investigation, memory is getting exhausted: the machine swaps heavily, thrashes, and all is doomed.
The glideFactoryEntryGroup forks one child per entry. These children start dirtying a lot of pages -- it looks like up to 70 MB per child on the ITB factory, based on the Pss from /proc/$PID/smaps (some or all of that may be from entry.gflFactoryConfig.rrd_stats.getData, but I can't prove that yet). The ITB factory has 300 entries, so 300 * 70 MB = 21 GB. Ouch.
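For anyone who wants to repeat the measurement, a rough sketch of summing the Pss fields from a smaps dump follows. The helper names `total_pss_kb` and `pss_of_pid` are made up for this example, and it assumes the usual Linux format where each mapping has a line like `Pss:   123 kB`.

```python
def total_pss_kb(smaps_text):
    """Sum the 'Pss:' fields (in kB) found in a /proc/<pid>/smaps dump."""
    total = 0
    for line in smaps_text.splitlines():
        # Match 'Pss:' exactly, so fields such as 'Pss_Anon:' are not counted.
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total

def pss_of_pid(pid):
    """Read /proc/<pid>/smaps (Linux only) and return the total Pss in kB."""
    with open("/proc/%d/smaps" % pid) as f:
        return total_pss_kb(f.read())

# The estimate from the comment above: 300 entries at about 70 MB each.
estimated_total_mb = 300 * 70  # 21000 MB, i.e. roughly 21 GB
```

Calling `pss_of_pid(os.getpid())` from inside a child, or with a child's PID from the parent, gives one number per child to compare against the 70 MB figure.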
The children do exit after the parent reads from their pipe, but the parent doesn't check which children have exited; it just goes through the pipes one by one doing non-blocking reads. So at worst, N-1 children are waiting and holding onto memory until one finishes.
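One possible alternative to walking the pipes in a fixed order is to let `select` report which pipes have data or have hit EOF, and to reap each child as soon as its pipe closes. The sketch below is only an illustration under that assumption: `collect_in_completion_order` and `work_fn` are hypothetical names, and results are passed back as pickles purely for the sake of the example.

```python
import os
import pickle
import select

def collect_in_completion_order(keys, work_fn):
    """Fork one child per key; read each result as soon as it is ready."""
    pending = {}  # read fd -> (key, pid, list of chunks read so far)
    for key in keys:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                os.close(r)
                for fd in pending:
                    os.close(fd)  # do not keep earlier siblings' pipes
                with os.fdopen(w, "wb") as out:
                    pickle.dump(work_fn(key), out)
                status = 0
            finally:
                os._exit(status)  # the child must never fall through
        os.close(w)
        pending[r] = (key, pid, [])

    results = {}
    while pending:
        # Block until at least one pipe has data or has reached EOF.
        ready, _, _ = select.select(list(pending), [], [])
        for fd in ready:
            key, pid, chunks = pending[fd]
            chunk = os.read(fd, 65536)
            if chunk:
                chunks.append(chunk)
                continue
            # EOF: the child closed its end, so it is done (or has died).
            del pending[fd]
            os.close(fd)
            os.waitpid(pid, 0)  # reap right away so no zombie lingers
            if chunks:
                results[key] = pickle.loads(b"".join(chunks))
    return results
```

With this shape a finished child is drained and reaped whenever it is ready, instead of waiting for the parent to reach its pipe in turn. A child that fails before writing anything simply has no entry in the result.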
There are a few avenues to pursue; one is to reduce the number of simultaneous forks.
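A minimal sketch of what capping the simultaneous forks could look like, assuming plain `os.fork` and `os.wait`. The names `run_with_fork_limit`, `work_fn` and `max_forks` are hypothetical and not taken from the factory code; the returned peak is there only to make the limit observable.

```python
import os

def run_with_fork_limit(items, work_fn, max_forks):
    """Run work_fn(item) in forked children, at most max_forks at a time.

    Returns ({item: succeeded}, peak number of children alive at once).
    """
    if max_forks < 1:
        raise ValueError("max_forks must be at least 1")
    running = {}    # pid -> item
    succeeded = {}  # item -> True if the child exited with status 0
    peak = 0

    def reap_one():
        pid, status = os.wait()  # blocks until some child exits
        item = running.pop(pid, None)
        if item is not None:
            succeeded[item] = (os.WIFEXITED(status)
                               and os.WEXITSTATUS(status) == 0)

    for item in items:
        # At the limit: wait for a slot to free up before forking again.
        while len(running) >= max_forks:
            reap_one()
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                work_fn(item)
                status = 0
            finally:
                os._exit(status)  # the child must never fall through
        running[pid] = item
        peak = max(peak, len(running))

    while running:
        reap_one()
    return succeeded, peak
```

If each child really does dirty about 70 MB, a limit of, say, 10 would bound the extra memory at roughly 700 MB instead of 21 GB, at the cost of a longer wall-clock time per loop.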