Bug #21832

PicklingError: Can't pickle <type 'Boost.Python.enum'>: import of module Boost.Python failed

Added by Marco Mambelli 9 months ago. Updated 13 days ago.

Status: New
Priority: Urgent
Category: -
Target version:
Start date: 02/04/2019
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

Krista reported a Frontend crashing with:
PicklingError: Can't pickle <type 'Boost.Python.enum'>: import of module Boost.Python failed

I just reinstalled my production frontend and it’s crashing with the error below.  I am using gwms 3.2.21 on an sl7 machine.  Am I missing a package or having a version issue?  I can see this:
Installed Packages
boost-python.x86_64                                                                          1.53.0-27.el7                                      @slf-primary
Available Packages
boost-python.i686                                                                            1.53.0-27.el7                                      slf-primary 

But any other advice is appreciated.  Thanks!
Krista

[2019-02-04 11:05:57,344] WARNING: fork:55: Forked process '<bound method glideinFrontendElement.get_condor_q of <__main__.glideinFrontendElement instance at 0x7f0f4d1e6440>>' failed
[2019-02-04 11:05:57,344] ERROR: fork:56: Forked process '<bound method glideinFrontendElement.get_condor_q of <__main__.glideinFrontendElement instance at 0x7f0f4d1e6440>>' failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 53, in fork_in_bg
    os.write(w, cPickle.dumps(out))
PicklingError: Can't pickle <type 'Boost.Python.enum'>: import of module Boost.Python failed
[2019-02-04 11:06:54,338] ERROR: glideinFrontendElement:251: Unhandled exception, dying: ['Traceback (most recent call last):\n', '  File "/usr/sbin/glideinFrontendElement.py", line 245, in main\n    rc = self.iterate()\n', '  File "/usr/sbin/glideinFrontendElement.py", line 275, in iterate\n    done_something = self.iterate_one()\n', '  File "/usr/sbin/glideinFrontendElement.py", line 374, in iterate_one\n    pipe_out=forkm_obj.fork_and_collect()\n', '  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 281, in fork_and_collect\n    results = fetch_fork_result_list(pipe_ids)\n', '  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 120, in fetch_fork_result_list\n    pipe_ids[key][\'pid\'])\n', '  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 97, in fetch_fork_result\n    out = cPickle.loads(rin)\n', 'EOFError\n']
Traceback (most recent call last):
  File "/usr/sbin/glideinFrontendElement.py", line 245, in main
    rc = self.iterate()
  File "/usr/sbin/glideinFrontendElement.py", line 275, in iterate
    done_something = self.iterate_one()
  File "/usr/sbin/glideinFrontendElement.py", line 374, in iterate_one
    pipe_out=forkm_obj.fork_and_collect()
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 281, in fork_and_collect
    results = fetch_fork_result_list(pipe_ids)
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 120, in fetch_fork_result_list
    pipe_ids[key]['pid'])
  File "/usr/lib/python2.7/site-packages/glideinwms/lib/fork.py", line 97, in fetch_fork_result
    out = cPickle.loads(rin)
EOFError

This is GWMS 3.2.21
We saw a similar problem in the past, but it was fixed in 3.2.15; see [#12972].
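The two tracebacks above are most likely one failure seen from both ends of the pipe: the child raises PicklingError before writing anything, so the parent reads an empty string and the unpickler raises EOFError. A minimal sketch of that chain, without forking, is below. It uses Python 3 `pickle` rather than the Python 2.7 `cPickle` from the traceback, and `FakeEnum`, `no_such_module_for_demo`, `child_side`, `parent_side`, and `demo` are hypothetical names for illustration, not GlideinWMS code.

```python
import os
import pickle


class FakeEnum(int):
    """Hypothetical stand-in for a value whose type cannot be imported by pickle."""


# Assumption for the demo: pretend the class lives in a module that does not exist,
# which is roughly what the 'Boost.Python.enum' message describes.
FakeEnum.__module__ = "no_such_module_for_demo"


def child_side(write_fd, result):
    """What the child does: pickle the result, write it, close the pipe."""
    try:
        os.write(write_fd, pickle.dumps(result))  # dumps() raises before any write
    finally:
        os.close(write_fd)


def parent_side(read_fd):
    """What the parent does: read until EOF, then unpickle."""
    data = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    return pickle.loads(data)  # raises EOFError when data is empty


def demo():
    """Return the names of the errors seen on each side, in order."""
    r, w = os.pipe()
    seen = []
    try:
        child_side(w, {"RequestCPUs": FakeEnum(1)})
    except pickle.PicklingError:
        seen.append("PicklingError")
    try:
        parent_side(r)
    except EOFError:
        seen.append("EOFError")
    return seen
```

Calling `demo()` shows the child-side error followed by the parent-side one, matching the order of the log entries above.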

History

#1 Updated by Marco Mambelli 9 months ago

There are two errors here:
1. the pickling error, apparently caused by a misconfigured frontend
2. the fork.py error, which caused the whole Frontend to crash instead of just the forked process running the query (get_condor_q)

Error 2 was solved in [#21569]; previous tickets with partial improvements: [#18748].
Error 1 looks similar to [#12972], which should have been solved there.

It seems that the root cause was a misconfigured schedd. Removing it solved the problem at CERN, which had seen a similar problem.

We should check whether either 1 or 2 still occurs with the latest version (3.4.3).

Fix any errors that are found (check the logs for 1: if 2 is fixed, subprocess errors will appear only in the log and will not crash the Frontend).
If all is OK, confirm that and close this ticket.
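For reference when checking 2, this is one way a fork helper can keep a child failure from taking down the parent. It is only a sketch under assumptions, not the actual change made in [#21569]: `fork_in_bg_safe` and `fetch_fork_result_safe` are hypothetical names, and the code uses Python 3 `pickle` on a POSIX system.

```python
import os
import pickle
import traceback


def fork_in_bg_safe(function, *args):
    """Run function(*args) in a child; return the read pipe and pid (hypothetical helper)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: always send something picklable back, even on failure.
        os.close(r)
        try:
            try:
                payload = pickle.dumps(("ok", function(*args)))
            except Exception:
                # Covers errors inside function() and PicklingError on its result.
                payload = pickle.dumps(("error", traceback.format_exc()))
            with os.fdopen(w, "wb") as out:
                out.write(payload)
        finally:
            os._exit(0)
    # Parent: keep only the read end of the pipe.
    os.close(w)
    return {"r": r, "pid": pid}


def fetch_fork_result_safe(pipe_info):
    """Collect one child's result; report a failure as a value instead of raising."""
    with os.fdopen(pipe_info["r"], "rb") as pipe_in:
        data = pipe_in.read()
    os.waitpid(pipe_info["pid"], 0)
    try:
        return pickle.loads(data)
    except (EOFError, pickle.UnpicklingError) as err:
        # An empty or truncated pipe means the child died before writing a full result.
        return ("error", "no usable result from child: %r" % (err,))
```

With this shape the caller gets an `("error", ...)` tuple for the failing child, can log it, and can continue with the results of the other children.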

#2 Updated by Marco Mambelli 9 months ago

A unit test should also be added that reproduces the faulty classads coming from the misconfigured schedd.
Check with Krista and CERN to reproduce the problem and get the classad.
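Until the real classad is available, a unit test could be shaped roughly like the sketch below. Everything in it is an assumption made for illustration: `FaultyValue` only imitates an unpicklable attribute value, `to_picklable` is a hypothetical sanitizing helper that does not exist in GlideinWMS, and the code uses Python 3 `pickle` and `unittest`.

```python
import pickle
import unittest


class FaultyValue(int):
    """Hypothetical stand-in for the unpicklable value in the faulty classad."""


# Assumption: make pickle fail on import of the value's module, as in the reported error.
FaultyValue.__module__ = "no_such_module_for_demo"


def to_picklable(classad):
    """Hypothetical helper: replace values that cannot be pickled with their repr()."""
    clean = {}
    for key, value in classad.items():
        try:
            pickle.dumps(value)
            clean[key] = value
        except Exception:
            clean[key] = repr(value)
    return clean


class TestFaultyClassad(unittest.TestCase):
    def setUp(self):
        # Placeholder data; the real test should use the classad obtained from the schedd.
        self.classad = {"ClusterId": 5634653, "RequestCPUs": FaultyValue(1)}

    def test_raw_classad_reproduces_error(self):
        with self.assertRaises(pickle.PicklingError):
            pickle.dumps(self.classad)

    def test_sanitized_classad_round_trips(self):
        clean = to_picklable(self.classad)
        restored = pickle.loads(pickle.dumps(clean))
        self.assertEqual(restored["ClusterId"], 5634653)
        self.assertIsInstance(restored["RequestCPUs"], str)
```

The first test pins down the failure itself; the second checks that whatever mitigation is chosen yields data that survives a pickle round trip. It can be run with `python -m unittest` once placed in a test module.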

#3 Updated by Marco Mascheroni 9 months ago

CMS observed a similar issue. It was because of this job:

[mmascher@vocms080 ~]$ condor_q -name login.uscms.org 5634653 -af RequestCPUs
error
[mmascher@vocms080 ~]$ condor_q -name login.uscms.org 5634653 -format '%s\n' RequestCPUs
out.txt
[mmascher@vocms080 ~]$ condor_q -name login.uscms.org 5634653

-- Schedd: login.uscms.org : <192.170.227.118:9618?...
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
5634653.0   emyrclement     2/4  17:08   0+00:00:00 I  0   0.0 connect_wrapper.sh

#4 Updated by Marco Mascheroni about 2 months ago

  • First Occurred set to v3_6_1

Test that setting an "unpicklable" RequestCPUs does not crash the frontend, and then close this ticket.

#5 Updated by Marco Mascheroni about 2 months ago

  • First Occurred changed from v3_6_1 to Summary
  • Target version set to v3_6_1

#6 Updated by Marco Mascheroni 13 days ago

  • Target version changed from v3_6_1 to v3_6_2

