Bug #21832
PicklingError: Can't pickle <type 'Boost.Python.enum'>: import of module Boost.Python failed
Description
Krista reported a Frontend crashing with:
PicklingError: Can't pickle <type 'Boost.Python.enum'>: import of module Boost.Python failed
I just reinstalled my production frontend and it’s crashing with the error below. I am using gwms 3.2.21 on an sl7 machine. Am I missing a package or having a version issue? I can see this:

Installed Packages
boost-python.x86_64    1.53.0-27.el7    @slf-primary
Available Packages
boost-python.i686      1.53.0-27.el7    slf-primary

But any other advice is appreciated.
Thanks!
Krista

[2019-02-04 11:05:57,344] WARNING: fork:55: Forked process '<bound method glideinFrontendElement.get_condor_q of <__main__.glideinFrontendElement instance at 0x7f0f4d1e6440>>' failed
[2019-02-04 11:05:57,344] ERROR: fork:56: Forked process '<bound method glideinFrontendElement.get_condor_q of <__main__.glideinFrontendElement instance at 0x7f0f4d1e6440>>' failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 53, in fork_in_bg
    os.write(w, cPickle.dumps(out))
PicklingError: Can't pickle <type 'Boost.Python.enum'>: import of module Boost.Python failed
[2019-02-04 11:06:54,338] ERROR: glideinFrontendElement:251: Unhandled exception, dying: ['Traceback (most recent call last):\n', ' File "/usr/sbin/glideinFrontendElement.py", line 245, in main\n rc = self.iterate()\n', ' File "/usr/sbin/glideinFrontendElement.py", line 275, in iterate\n done_something = self.iterate_one()\n', ' File "/usr/sbin/glideinFrontendElement.py", line 374, in iterate_one\n pipe_out=forkm_obj.fork_and_collect()\n', ' File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 281, in fork_and_collect\n results = fetch_fork_result_list(pipe_ids)\n', ' File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 120, in fetch_fork_result_list\n pipe_ids[key][\'pid\'])\n', ' File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 97, in fetch_fork_result\n out = cPickle.loads(rin)\n', 'EOFError\n']
Traceback (most recent call last):
  File "/usr/sbin/glideinFrontendElement.py", line 245, in main
    rc = self.iterate()
  File "/usr/sbin/glideinFrontendElement.py", line 275, in iterate
    done_something = self.iterate_one()
  File "/usr/sbin/glideinFrontendElement.py", line 374, in iterate_one
    pipe_out=forkm_obj.fork_and_collect()
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 281, in fork_and_collect
    results = fetch_fork_result_list(pipe_ids)
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 120, in fetch_fork_result_list
    pipe_ids[key]['pid'])
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 97, in fetch_fork_result
    out = cPickle.loads(rin)
EOFError
This is GWMS 3.2.21
We saw a similar problem in the past, but it was fixed in 3.2.15; see [#12972].
History
#1 Updated by Marco Mambelli about 2 years ago
There are two errors here:
1. The pickling error, apparently caused by a misconfigured Frontend.
2. The fork.py error, which crashed the whole Frontend instead of just the forked process that failed.
Error 2 was solved in [#21569]; an earlier ticket with partial improvements is [#18748].
Error 1 looks similar to [#12972], where it should already have been fixed.
It seems that the root cause was a misconfigured schedd. Removing that schedd solved the problem at CERN, which had a similar issue.
We should check whether either 1 or 2 still occurs with the latest version (3.4.3).
Fix any errors found. For 1, check the logs: if 2 is fixed, subprocess errors appear only in the log and no longer crash the Frontend.
If all is OK, confirm that and close this ticket.
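A minimal sketch of one way to keep an error like 2 local to a single forked process, assuming Python 3 `pickle` (the code in this ticket uses Python 2.7 `cPickle`). The child always sends a payload, wrapping any serialization failure in an error marker, and the parent decodes each child's payload separately, logging and skipping failures instead of letting `EOFError` propagate. All names (`encode_child_result`, `decode_child_result`, `collect`, `ForkResultError`, `Unpicklable`) are hypothetical; this is not necessarily how [#21569] fixed it.

```python
import logging
import pickle

log = logging.getLogger("fork_sketch")


class ForkResultError(Exception):
    """Hypothetical error: one child's result could not be decoded."""


def encode_child_result(out):
    """Child side: always return a payload, even if `out` cannot be pickled."""
    try:
        return pickle.dumps(("ok", out))
    except (pickle.PicklingError, TypeError, AttributeError) as e:
        # Python 3 may raise any of these for unpicklable objects.
        return pickle.dumps(("error", "%s: %s" % (type(e).__name__, e)))


def decode_child_result(data, key):
    """Parent side: return the child's result or raise ForkResultError."""
    if not data:
        # An empty pipe is what made cPickle.loads() raise EOFError in the ticket.
        raise ForkResultError("child %r returned no data" % (key,))
    status, value = pickle.loads(data)
    if status != "ok":
        raise ForkResultError("child %r failed: %s" % (key, value))
    return value


def collect(payloads):
    """Decode every child's payload, dropping (and logging) only failed ones."""
    results = {}
    for key, data in payloads.items():
        try:
            results[key] = decode_child_result(data, key)
        except (ForkResultError, EOFError, pickle.UnpicklingError) as e:
            log.error("Skipping result of %r: %s", key, e)
    return results


class Unpicklable:
    """Hypothetical stand-in for a value that cannot be pickled."""

    def __reduce_ex__(self, protocol):
        raise pickle.PicklingError("Can't pickle Unpicklable")


if __name__ == "__main__":
    payloads = {
        "good_schedd": encode_child_result({"jobs": 3}),
        "bad_schedd": encode_child_result({"RequestCPUs": Unpicklable()}),
        "silent_child": b"",
    }
    print(collect(payloads))  # {'good_schedd': {'jobs': 3}}
```

The error marker means a failed child is reported with its own message rather than as a bare `EOFError`, and the other children's results are still used.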
#2 Updated by Marco Mambelli about 2 years ago
A unit test reproducing the faulty ClassAds coming from the misconfigured schedd should also be added.
Check with Krista and CERN to reproduce the problem and get the ClassAd.
#3 Updated by Marco Mascheroni about 2 years ago
CMS observed a similar issue, caused by this job:
[mmascher@vocms080 ~]$ condor_q -name login.uscms.org 5634653 -af RequestCPUs
error
[mmascher@vocms080 ~]$ condor_q -name login.uscms.org 5634653 -format '%s\n' RequestCPUs
out.txt
[mmascher@vocms080 ~]$ condor_q -name login.uscms.org 5634653

-- Schedd: login.uscms.org : <192.170.227.118:9618?...
 ID          OWNER         SUBMITTED    RUN_TIME    ST PRI SIZE CMD
5634653.0    emyrclement   2/4 17:08    0+00:00:00  I  0   0.0  connect_wrapper.sh
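With `-af`, RequestCPUs for this job prints `error`, which suggests the attribute evaluates to a ClassAd error value rather than an integer. Assuming the Python bindings return such values as Boost.Python enum objects (consistent with the PicklingError in the description, but not confirmed in this ticket), one defensive option is to convert any value that is not a plain built-in type into its string form before the query result is pickled. This is a sketch with hypothetical names (`sanitize_value`, `sanitize_jobs`, `FakeErrorValue`), using a fake `int` subclass in place of the real binding type; it is not GlideinWMS code.

```python
import pickle

# Exact types that pickle cleanly; subclasses are deliberately NOT accepted.
PLAIN_TYPES = (type(None), bool, int, float, str, bytes)


def sanitize_value(value):
    """Return a picklable copy of value, using str() for anything unusual."""
    if type(value) in PLAIN_TYPES:
        return value
    if isinstance(value, list):
        return [sanitize_value(v) for v in value]
    if isinstance(value, tuple):
        return tuple(sanitize_value(v) for v in value)
    if isinstance(value, dict):
        return {k: sanitize_value(v) for k, v in value.items()}
    # e.g. an error/undefined value from the bindings: keep only its text
    return str(value)


def sanitize_jobs(jobs):
    """Sanitize every job ad in a hypothetical {job_id: {attr: value}} mapping."""
    return {job_id: sanitize_value(ad) for job_id, ad in jobs.items()}


class FakeErrorValue(int):
    """Hypothetical stand-in for an unpicklable, int-derived enum value."""

    def __reduce_ex__(self, protocol):
        raise pickle.PicklingError("Can't pickle FakeErrorValue")

    def __str__(self):
        return "Error"


if __name__ == "__main__":
    jobs = {(5634653, 0): {"RequestCPUs": FakeErrorValue(0), "JobStatus": 1}}
    clean = sanitize_jobs(jobs)
    print(pickle.loads(pickle.dumps(clean)))
    # {(5634653, 0): {'RequestCPUs': 'Error', 'JobStatus': 1}}
```

The exact `type(value) in PLAIN_TYPES` check is intentional: an enum-like value that subclasses `int` would pass an `isinstance(value, int)` check and still fail to pickle. Dropping the bad attribute, or the whole job, would be an equally valid policy.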
#4 Updated by Marco Mascheroni over 1 year ago
- First Occurred set to v3_6_1
Test that setting an "unpicklable" RequestCPUs does not crash the Frontend, then close the ticket.
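A minimal sketch of such a test, assuming Linux `os.fork` and Python 3 (the Frontend in this ticket runs Python 2.7 with `cPickle`). It mimics the behaviour suggested by the traceback: the child pickles its result into a pipe, and a pickling failure leaves the pipe empty. Unlike the crashing version, the parent catches the resulting `EOFError` and reports only that child as failed. `FakeBoostEnum` and `run_in_child` are hypothetical names; the fake object raises `PicklingError` to stand in for an unpicklable `RequestCPUs` value.

```python
import os
import pickle


class FakeBoostEnum(int):
    """Hypothetical stand-in for an unpicklable ClassAd value."""

    def __reduce_ex__(self, protocol):
        raise pickle.PicklingError("Can't pickle <type 'Boost.Python.enum'>")


def run_in_child(func):
    """Fork, run func() in the child and return (ok, result) to the parent."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(r)
            try:
                data = pickle.dumps(func())
            except Exception:
                data = b""  # nothing reaches the parent, as in the ticket
            with os.fdopen(w, "wb") as pipe_out:
                pipe_out.write(data)
        finally:
            os._exit(0)  # never return into the parent's code path
    os.close(w)
    with os.fdopen(r, "rb") as pipe_in:
        data = pipe_in.read()
    os.waitpid(pid, 0)
    try:
        return True, pickle.loads(data)
    except EOFError:
        # The parent survives: only this child's result is lost.
        return False, None


if __name__ == "__main__":
    print(run_in_child(lambda: {"RequestCPUs": 1}))                 # (True, {'RequestCPUs': 1})
    print(run_in_child(lambda: {"RequestCPUs": FakeBoostEnum(0)}))  # (False, None)
```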
#5 Updated by Marco Mascheroni over 1 year ago
- First Occurred changed from v3_6_1 to Summary
- Target version set to v3_6_1
#6 Updated by Marco Mascheroni over 1 year ago
- Target version changed from v3_6_1 to v3_6_2
#7 Updated by Marco Mascheroni about 1 year ago
- Target version changed from v3_6_2 to v3_6_3
#8 Updated by Marco Mascheroni 10 months ago
- Target version changed from v3_6_3 to v3_6_4
- Priority changed from Urgent to Normal
#9 Updated by Marco Mambelli 6 months ago
- Target version changed from v3_6_4 to v3_6_5
#10 Updated by Marco Mambelli 5 months ago
- Target version changed from v3_6_5 to v3_6_6
#11 Updated by Marco Mambelli 3 months ago
- Target version changed from v3_6_6 to v3_6_7
#12 Updated by Marco Mambelli 20 days ago
- Target version changed from v3_6_7 to v3_7_4