Bug #20642

RoutingMaster crash on shutdown (terminate)

Added by Ron Rechenmacher about 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Category: -
Start date: 08/20/2018
% Done: 0%
Experiment: -

Description

With the latest pdune software, I'm seeing backtraces like the following:

Core was generated by `RoutingMasterMain -c id: 5275 commanderPluginType: xmlrpc application_name: Rou'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd291b3b4e2 in _int_malloc () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fd2974d9f80 (LWP 18791))]
(gdb) bt
#0  0x00007fd291b3b4e2 in _int_malloc () from /lib64/libc.so.6
#1  0x00007fd291b3e10c in malloc () from /lib64/libc.so.6
#2  0x00007fd29731f111 in _dl_signal_error () from /lib64/ld-linux-x86-64.so.2
#3  0x00007fd29731f2ae in _dl_signal_cerror () from /lib64/ld-linux-x86-64.so.2
#4  0x00007fd29731a4bd in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#5  0x00007fd291bf1ced in call_dl_lookup () from /lib64/libc.so.6
#6  0x00007fd29731f314 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#7  0x00007fd291bf1ff0 in do_sym () from /lib64/libc.so.6
#8  0x00007fd2908ad0d4 in dlsym_doit () from /lib64/libdl.so.2
#9  0x00007fd29731f314 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007fd2908ad5bd in _dlerror_run () from /lib64/libdl.so.2
#11 0x00007fd2908ad128 in dlsym () from /lib64/libdl.so.2
#12 0x00007fd2947a36e5 in TROOT::InitInterpreter() () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#13 0x00007fd2947a3a3d in ROOT::Internal::GetROOT2() () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#14 0x00007fd29480766f in TEnv::Getvalue(char const*) const () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#15 0x00007fd294807a03 in TEnv::GetValue(char const*, char const*) const () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#16 0x00007fd294808900 in DefaultErrorHandler(int, bool, char const*, char const*) () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#17 0x00007fd29480847c in ErrorHandler () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#18 0x00007fd294808654 in Break(char const*, char const*, ...) () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#19 0x00007fd2948aa065 in TUnixSystem::DispatchSignals(ESignals) () from /mu2e/ups/root/v6_12_04e/Linux64bit+3.10-2.17-e15-prof/lib/libCore.so
#20 <signal handler called>
#21 0x00007fd291af69cd in __run_exit_handlers () from /lib64/libc.so.6
#22 0x00007fd291af6ab5 in exit () from /lib64/libc.so.6
#23 0x00007fd00dd40539 in xmlrpc_c::(anonymous namespace)::sigterm (signalClass=0xf) at server_abyss.cpp:41
#24 <signal handler called>
#25 0x00007fd291b7d1ad in nanosleep () from /lib64/libc.so.6
#26 0x00007fd291badec4 in usleep () from /lib64/libc.so.6
#27 0x00007fd00ccc09c0 in xmlrpc_millisecond_sleep (milliseconds=<optimized out>) at sleep.c:21
#28 0x00007fd00d2ff808 in waitForConnectionFreed (outstandingConnListP=0x2388b30) at server.c:1034
#29 waitForNoConnections (outstandingConnListP=<optimized out>) at server.c:1046
#30 serverRun2 (errorP=0x7ffd33c82338, serverP=0x2388c10) at server.c:1253
#31 ServerRun (serverP=serverP@entry=0x2388c10) at server.c:1280
#32 0x00007fd00dd403f8 in xmlrpc_c::setupSignalsAndRunAbyss (abyssServerP=0x2388c10) at server_abyss.cpp:760
#33 0x00007fd00dd41219 in xmlrpc_c::serverAbyss_impl::run (this=<optimized out>) at server_abyss.cpp:771
#34 0x00007fd00dd416bd in xmlrpc_c::serverAbyss::run (this=<optimized out>) at server_abyss.cpp:873
#35 0x00007fd00df6c20b in artdaq::xmlrpc_commander::run_server (this=0x23848f0) at /home/ron/work/artdaqPrj/demo25-kurt2/srcs/artdaq/artdaq/ExternalComms/xmlrpc_commander.cc:1133
#36 0x000000000041518d in main (argc=<optimized out>, argv=<optimized out>) at /home/ron/work/artdaqPrj/demo25-kurt2/srcs/artdaq_mpich_plugin/artdaq-mpich-plugin/Application/RoutingMasterMain.cc:65
(gdb) info threads 
  Id   Target Id         Frame 
* 1    Thread 0x7fd2974d9f80 (LWP 18791) 0x00007fd291b3b4e2 in _int_malloc () from /lib64/libc.so.6
  2    Thread 0x7fcdfe496700 (LWP 22466) 0x00007fd291bc3eec in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fcdfe496700 (LWP 22466))]
#0  0x00007fd291bc3eec in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fd291bc3eec in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fd291b40a6f in _L_lock_5333 () from /lib64/libc.so.6
#2  0x00007fd291b3a408 in _int_free () from /lib64/libc.so.6
#3  0x00007fd2929b7319 in __gnu_cxx::new_allocator<std::__detail::_Hash_node<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet>, true> >::deallocate (this=0x7fd292c4f8d0 <fhicl::ParameterSetRegistry::instance_()::s_registry+16>, 
    __p=<optimized out>) at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/ext/new_allocator.h:110
#4  std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet>, true> > >::deallocate (__n=0x1, __p=<optimized out>, __a=...)
    at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/alloc_traits.h:462
#5  std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet>, true> > >::_M_deallocate_node (
    this=0x7fd292c4f8d0 <fhicl::ParameterSetRegistry::instance_()::s_registry+16>, __n=0x2331c00) at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/hashtable_policy.h:1973
#6  std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet>, true> > >::_M_deallocate_nodes (
    this=0x7fd292c4f8d0 <fhicl::ParameterSetRegistry::instance_()::s_registry+16>, __n=<optimized out>) at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/hashtable_policy.h:1984
#7  std::_Hashtable<fhicl::ParameterSetID, std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet>, std::allocator<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet> >, std::__detail::_Select1st, std::equal_to<fhicl::ParameterSetID>, fhicl::detail::HashParameterSetID, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear (
    this=0x7fd292c4f8d0 <fhicl::ParameterSetRegistry::instance_()::s_registry+16>) at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/hashtable.h:1901
#8  std::_Hashtable<fhicl::ParameterSetID, std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet>, std::allocator<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet> >, std::__detail::_Select1st, std::equal_to<fhicl::ParameterSetID>, fhicl::detail::HashParameterSetID, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable (
    this=0x7fd292c4f8d0 <fhicl::ParameterSetRegistry::instance_()::s_registry+16>, __in_chrg=<optimized out>) at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/hashtable.h:1227
#9  std::unordered_map<fhicl::ParameterSetID, fhicl::ParameterSet, fhicl::detail::HashParameterSetID, std::equal_to<fhicl::ParameterSetID>, std::allocator<std::pair<fhicl::ParameterSetID const, fhicl::ParameterSet> > >::~unordered_map (
    this=0x7fd292c4f8d0 <fhicl::ParameterSetRegistry::instance_()::s_registry+16>, __in_chrg=<optimized out>) at /scratch/workspace/build-gallery/SLF7/prof/build/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unordered_map.h:98
#10 fhicl::ParameterSetRegistry::~ParameterSetRegistry (this=0x7fd292c4f8c0 <fhicl::ParameterSetRegistry::instance_()::s_registry>, __in_chrg=<optimized out>)
    at /scratch/workspace/build-gallery/SLF7/prof/build/fhiclcpp/v4_06_05/src/fhiclcpp/ParameterSetRegistry.cc:50
#11 0x00007fd291af6a69 in __run_exit_handlers () from /lib64/libc.so.6
#12 0x00007fd291af6ab5 in exit () from /lib64/libc.so.6
#13 0x00007fd00dd40539 in xmlrpc_c::(anonymous namespace)::sigterm (signalClass=0xf) at server_abyss.cpp:41
#14 <signal handler called>
#15 0x00007fd291b7d1ad in nanosleep () from /lib64/libc.so.6
#16 0x00007fd291badec4 in usleep () from /lib64/libc.so.6
#17 0x00007fd294f6938f in artdaq::StatisticsCollection::run (this=0x7fd295223600 <artdaq::StatisticsCollection::getInstance()::singletonInstance>)
    at /home/ron/work/artdaqPrj/demo25-kurt2/srcs/artdaq_core/artdaq-core/Core/StatisticsCollection.cc:75
#18 0x00007fd29641391d in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:171
#19 0x00007fd293ca7e25 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fd291bb634d in clone () from /lib64/libc.so.6
(gdb) 
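Read together, the two stacks follow the same path. In each thread, xmlrpc_c's SIGTERM handler (`sigterm`, server_abyss.cpp:41) calls exit() from inside the signal handler. Thread 2 is therefore destroying the static fhicl::ParameterSetRegistry in __run_exit_handlers. Thread 1 is also inside __run_exit_handlers when it takes a SIGSEGV, and ROOT's handler (TUnixSystem::DispatchSignals) then faults in malloc(). POSIX does not list exit() as async-signal-safe.

A common alternative is for the handler to set only a flag, so that ordinary code can stop background threads before main() returns. The sketch below assumes the application can install its own SIGTERM handler. Nothing in this report says artdaq or xmlrpc-c does this. `g_termination_requested`, `install_sigterm_flag_handler()` and `run_until_terminated()` are hypothetical names.

```cpp
#include <chrono>
#include <csignal>
#include <thread>

// Written only by the signal handler and read by normal code. Assigning
// to a volatile std::sig_atomic_t is one of the few operations a signal
// handler may safely perform.
volatile std::sig_atomic_t g_termination_requested = 0;

// Hypothetical handler: record the request and return. Unlike calling
// exit() here, this does not run static destructors in signal context.
extern "C" void on_sigterm(int) { g_termination_requested = 1; }

// Hypothetical helper that installs the flag-setting handler.
bool install_sigterm_flag_handler() {
  return std::signal(SIGTERM, on_sigterm) != SIG_ERR;
}

// Hypothetical main-loop sketch: wait for the flag, then shut down
// background threads in order (e.g. join a statistics thread) so that
// main() can return normally afterwards.
template <typename StopFn>
void run_until_terminated(StopFn stop_background_threads,
                          std::chrono::milliseconds poll) {
  while (!g_termination_requested) {
    std::this_thread::sleep_for(poll);
  }
  stop_background_threads();
}
```

Because the handler does nothing except set the flag, the teardown that exit() would otherwise start runs only after main() returns. By that point the background threads have already been joined.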

History

#1 Updated by Eric Flumerfelt about 2 years ago

This may have been caused by the StatisticsCollection thread not being properly shut down at terminate. See artdaq/feature/routing_shutdown_stats.
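Here is a minimal sketch of the idea in this comment: a periodic worker thread that terminate stops and joins, instead of leaving it sleeping while exit() runs static destructors. `PeriodicCollector`, `start()`, `stop()`, `running()` and `ticks()` are hypothetical names, not artdaq's actual StatisticsCollection interface. Thread 2 above is sleeping in usleep() inside StatisticsCollection::run (frames #16/#17). The sketch replaces that kind of sleep with an interruptible condition_variable wait, so stop() returns promptly.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a periodic statistics thread; NOT artdaq's API.
class PeriodicCollector {
 public:
  explicit PeriodicCollector(std::chrono::milliseconds period)
      : period_(period) {}

  ~PeriodicCollector() { stop(); }  // never leave a joinable thread behind

  void start() {
    assert(!worker_.joinable() && "start() called twice");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_requested_ = false;
    }
    worker_ = std::thread([this] { run(); });
  }

  // Ask the loop to exit, wake it if it is waiting, and join it.
  // Safe to call more than once.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_requested_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

  bool running() const { return worker_.joinable(); }
  long ticks() const { return ticks_.load(); }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stop_requested_) {
      ++ticks_;  // placeholder for "collect and report statistics"
      // Unlike usleep(), this wait ends as soon as stop() sets the flag.
      cv_.wait_for(lock, period_, [this] { return stop_requested_; });
    }
  }

  std::chrono::milliseconds period_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_requested_ = false;
  std::atomic<long> ticks_{0};
  std::thread worker_;
};
```

If terminate calls stop(), the worker thread has exited before main() returns. It is then no longer running when static objects such as fhicl::ParameterSetRegistry are destroyed. The destructor's second call to stop() does nothing.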

#2 Updated by Eric Flumerfelt over 1 year ago

  • Assignee set to Ron Rechenmacher
  • Status changed from New to Resolved
  • Co-Assignees Eric Flumerfelt added

#3 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Resolved to Reviewed

#4 Updated by Eric Flumerfelt over 1 year ago

  • Target version set to artdaq v3_04_00
  • Status changed from Reviewed to Closed


Also available in: Atom PDF