File descriptor limit issues with large number of entries
From: Jeff Dost <firstname.lastname@example.org>
Subject: Factory glideFactoryEntryGroup.py not respecting FD limits
Date: February 5, 2015 at 7:12:16 PM CST
To: "email@example.com" <firstname.lastname@example.org>
Hello glideinWMS support,
I am trying out a test factory that has about 600 entries in it. After a reconfig, it is no longer able to stay up. Below is the error I see in the factory log. It looks like the EntryGroup process is held to a hard-coded maximum of 1024 FDs. Initially I tried manually increasing the hard and soft ulimit levels for the gfactory user, but that only took effect for the parent glideFactory process:
grep 'open files' /proc/25342/limits
Max open files 4096 4096 files
grep 'open files' /proc/25349/limits
Max open files 1024 1024 files
Poking around the code, I found this (line 329 of glideFactory.py, version v3_2_7_2):
childs[group] = subprocess.Popen(command_list, shell=False,
and sure enough, the _set_rlimit callback hard-codes the limit to 1024:
resource.setrlimit(resource.RLIMIT_NOFILE, [1024, 1024])
In my opinion there is no good reason for this; the EntryGroup should just inherit whatever limits the glideFactory parent is set to. That way we can tweak the limits on the gfactory user as needed to avoid hitting them.
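As a rough illustration of the inheritance behavior requested here, the sketch below spawns a child without any callback that calls resource.setrlimit, then checks that the child reports the same RLIMIT_NOFILE as the parent. It is written for Python 3 rather than the Python 2.4 shown in the traceback, and the helper names spawn_inheriting_limits and child_nofile_limits are hypothetical, not part of glideinWMS.

```python
import resource
import subprocess
import sys


def spawn_inheriting_limits(command_list):
    """Start a child process without overriding its resource limits.

    No preexec_fn that calls resource.setrlimit is passed, so the child
    keeps whatever RLIMIT_NOFILE the parent process currently has.
    (Hypothetical helper, not the glideFactory.py code.)
    """
    return subprocess.Popen(command_list, shell=False,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            close_fds=True)


def child_nofile_limits():
    """Ask a freshly spawned child for its (soft, hard) open-file limits."""
    code = ("import resource; "
            "print(*resource.getrlimit(resource.RLIMIT_NOFILE))")
    proc = spawn_inheriting_limits([sys.executable, "-c", code])
    out, _ = proc.communicate()
    soft, hard = (int(value) for value in out.split())
    return soft, hard


if __name__ == "__main__":
    print("parent:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("child: ", child_nofile_limits())
```

With this approach, raising the ulimit for the user that runs the parent would be enough to raise it for every child as well.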
Can you please take a look?
[2015-02-05 16:06:16,777] WARNING: glideFactory:447: EntryGroup 0 STDERR: Traceback (most recent call last):
File "/usr/sbin/glideFactoryEntryGroup.py", line 729, in ?
File "/usr/sbin/glideFactoryEntryGroup.py", line 672, in main
File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryEntry.py", line 91, in init
File "/usr/lib/python2.4/site-packages/glideinwms/lib/logSupport.py", line 232, in add_processlog_handler
File "/usr/lib/python2.4/site-packages/glideinwms/lib/logSupport.py", line 76, in init
File "/usr/lib64/python2.4/logging/handlers.py", line 59, in init
File "/usr/lib64/python2.4/logging/__init__.py", line 757, in init
IOError: [Errno 24] Too many open files: u'/var/log/gwms-factory/server/entry_UKI-SOUTHGRID-OX-HEP_t2ce06_longfive/UKI-SOUTHGRID-OX-HEP_t2ce06_longfive.err.log'
[2015-02-05 16:06:16,777] WARNING: glideFactory:454: EntryGroup 0 exited. Checking if it should be restarted.
[2015-02-05 16:06:16,777] WARNING: glideFactory:464: Restarting EntryGroup 0.
[2015-02-05 16:06:17,017] ERROR: glideFactory:701: Exception occurred spawning the factory:
Traceback (most recent call last):
File "/usr/sbin/glideFactory.py", line 697, in main
frontendDescript, entries, restart_attempts, restart_interval)
File "/usr/sbin/glideFactory.py", line 484, in spawn
AttributeError: 'Popen' object has no attribute 'tochild'
#2 Updated by Burt Holzman almost 6 years ago
- Occurs In v3_0, v2_7, v2_7_1, v3_1, v2_7_2, v3_2, v3_2_1, v3_2_2, v3_2_3, v3_2_4, v3_2_5, v3_2_5_1, v3_2_6, v3_2_7, v3_2_8, v3_2_9, v3_3, v3_2_x, v3_x added
Based on further investigation:
The gFEG instantiates glideFactoryEntry for every entry point.
Each gFE sets up logging for itself, opening two (I think) FDs per entry.
This isn't really necessary, since the entry log is only used in the children that are forked from the gFEG.
One solution might be to change gFE.log to a function and init logging on first use.
I think this has been around since v2_7_0 (the EntryGroup refactor)!
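The "init logging on first use" idea might look roughly like the sketch below. The Entry class, its attributes, and the log file naming are hypothetical stand-ins for illustration only; they are not the real glideFactoryEntry code, and the sketch targets Python 3.

```python
import logging
import os
import tempfile


class Entry(object):
    """Hypothetical stand-in for a factory entry; not the real gFE class."""

    def __init__(self, name, log_dir):
        self.name = name
        self.log_dir = log_dir
        self._log = None  # no handler created yet, so no FD is open

    def log(self):
        """Return the entry logger, creating its file handler on first use."""
        if self._log is None:
            logger = logging.getLogger("entry_%s" % self.name)
            logger.setLevel(logging.INFO)
            logger.propagate = False
            path = os.path.join(self.log_dir, "%s.info.log" % self.name)
            # FileHandler opens the file here, i.e. only on the first call.
            logger.addHandler(logging.FileHandler(path))
            self._log = logger
        return self._log


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    entries = [Entry("entry_%d" % i, log_dir) for i in range(600)]
    # Creating 600 entries opened no log files; only this call opens one.
    entries[0].log().info("first use opens the log file")
```

Under this scheme the process that merely instantiates the entries holds no log FDs, and each forked child opens only the logs it actually writes to. In Python versions that support it, passing delay=True to logging.FileHandler is another way to defer opening the file until the first record is emitted.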