Bug #21135

ROOT::Minuit2::VariableMetricBuilder::Minimum Failure

Added by Hunter Sullivan about 1 year ago. Updated 11 months ago.

Status: Closed
Priority: Normal
Category: Reconstruction
Target version: -
Start date: 10/13/2018
% Done: 100%
Experiment: LArIAT

Description

lariatsoft/larsoft version: v07_07_01
quals: e17:debug and c2:debug

The LArIAT debug CI tests are failing.

Here is the failure and the gdb backtrace:

lar: /scratch/workspace/canvas-products/vdevelop/e17/SLF6/debug/build/root/v6_12_06a/source/root-6.12.06/math/minuit2/src/VariableMetricBuilder.cxx:267: ROOT::Minuit2::FunctionMinimum ROOT::Minuit2::VariableMetricBuilder::Minimum(const ROOT::Minuit2::MnFcn&, const ROOT::Minuit2::GradientCalculator&, const ROOT::Minuit2::MinimumSeed&, std::vector<ROOT::Minuit2::MinimumState>&, unsigned int, double) const: Assertion `s0.IsValid()' failed.

Program received signal SIGABRT, Aborted.
0x000000367aa324f5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-13.el6.x86_64 expat-2.0.1-13.el6_8.x86_64 freetype-2.3.11-15.el6_6.1.x86_64 glibc-2.12-1.212.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-42z1.el6_7.x86_64 libICE-1.0.6-1.el6.x86_64 libSM-1.1.0-7.1.el6.x86_64 libX11-1.6.4-3.el6.x86_64 libXau-1.0.6-4.el6.x86_64 libXdamage-1.1.3-4.el6.x86_64 libXext-1.3.2-2.1.el6.x86_64 libXfixes-5.0.3-1.el6.x86_64 libXmu-1.1.1-2.el6.x86_64 libXt-1.1.4-6.1.el6.x86_64 libXxf86vm-1.1.3-2.1.el6.x86_64 libcom_err-1.41.12-12.el6.x86_64 libcurl-7.19.7-53.el6_9.x86_64 libdrm-2.4.65-2.el6.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libssh2-1.4.2-2.el6_7.1.x86_64 libuuid-2.17.2-12.9.el6.x86_64 libxcb-1.12-4.el6.x86_64 mesa-dri-drivers-11.0.7-4.el6.x86_64 mesa-libGL-11.0.7-4.el6.x86_64 mesa-libGLU-11.0.7-4.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 nspr-4.19.0-1.el6.x86_64 nss-3.36.0-8.el6.x86_64 nss-softokn-freebl-3.14.3-23.el6_7.x86_64 nss-util-3.36.0-1.el6.x86_64 openldap-2.4.40-6.el6_7.x86_64 openssl-1.0.1e-48.sl6_8.4.x86_64 pcre-7.8-4.el6.x86_64 xz-libs-4.999.9-0.3.beta.20091007git.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) backtrace 15
#0  0x000000367aa324f5 in raise () from /lib64/libc.so.6
#1  0x000000367aa33cd5 in abort () from /lib64/libc.so.6
#2  0x000000367aa2b66e in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000367aa2b730 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fffd2dc84b3 in ROOT::Minuit2::VariableMetricBuilder::Minimum(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MinimumSeed const&, std::vector<ROOT::Minuit2::MinimumState, std::allocator<ROOT::Minuit2::MinimumState> >&, unsigned int, double) const () at /scratch/workspace/canvas-products/vdevelop/e17/SLF6/debug/build/root/v6_12_06a/source/root-6.12.06/math/minuit2/src/VariableMetricBuilder.cxx:267
#5  0x00007fffd2dc6557 in ROOT::Minuit2::VariableMetricBuilder::Minimum(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MinimumSeed const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const () at /scratch/workspace/canvas-products/vdevelop/e17/SLF6/debug/build/root/v6_12_06a/source/root-6.12.06/math/minuit2/src/VariableMetricBuilder.cxx:124
#6  0x00007fffd2dbeea1 in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MinimumSeed const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const () at /scratch/workspace/canvas-products/vdevelop/e17/SLF6/debug/build/root/v6_12_06a/source/root-6.12.06/math/minuit2/src/ModularFunctionMinimizer.cxx:166
#7  0x00007fffd2dbea9e in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::FCNBase const&, ROOT::Minuit2::MnUserParameterState const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const () at /scratch/workspace/canvas-products/vdevelop/e17/SLF6/debug/build/root/v6_12_06a/source/root-6.12.06/math/minuit2/src/ModularFunctionMinimizer.cxx:120
#8  0x00007fffd2d8ee39 in ROOT::Minuit2::Minuit2Minimizer::Minimize() () at /scratch/workspace/canvas-products/vdevelop/e17/SLF6/debug/build/root/v6_12_06a/source/root-6.12.06/math/minuit2/src/Minuit2Minimizer.cxx:510
#9  0x00007fffe023e414 in trkf::TrackMomentumCalculator::GetMomentumMultiScatterChi2(art::Ptr<recob::Track> const&) () at /scratch/workspace/build-larsoft/v07_07_01/SLF6/debug/build/larreco/v07_04_04/src/larreco/RecoAlg/TrackMomentumCalculator.cxx:827
#10 0x00007fffe15d59fb in lariat::AnaTreeT1034::analyze(art::Event const&) () at /scratch/workspace/lariat-release-build/BUILDTYPE/debug/label1/swarm/label2/SLF6/temp/srcs/lariatsoft/LArIATAnaModule/AnaTreeT1034_module.cc:1422
#11 0x00007ffff645f5a2 in art::EDAnalyzer::doEvent () at /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_03/src/art/Framework/Core/EDAnalyzer.cc:29
#12 0x00007fffe1647f3b in art::WorkerT<art::EDAnalyzer>::implDoProcess () at /cvmfs/larsoft.opensciencegrid.org/products/art/v2_11_03/include/art/Framework/Core/WorkerT.h:88
#13 0x00007ffff72bb575 in bool art::Worker::ImplDoWork<(art::BranchActionType)2>::invoke<art::EventPrincipal>(art::Worker*, art::EventPrincipal&, art::CurrentProcessingContext const*) () at /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_03/src/art/Framework/Principal/Worker.h:201
#14 0x00007ffff72b0d63 in bool art::Worker::doWork<art::ProcessPackage<(art::Level)4> >(art::ProcessPackage<(art::Level)4>::MyPrincipal&, art::CurrentProcessingContext const*) () at /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_03/src/art/Framework/Principal/Worker.h:259

The failure occurs on event 5 when running our reconstruction:

lar -c Reco.fcl -s /lariat/data/users/hsulliva/lariat_r6100_sr0004.root

Attachment: valgrind.dump (2.71 MB), added by Hunter Sullivan, 10/15/2018 12:10 PM

History

#1 Updated by Kyle Knoepfel about 1 year ago

  • Status changed from New to Feedback

Hunter, can you please run valgrind on this process? This will help determine whether the bug is in ROOT, LArIAT, LArSoft, or art.

setup valgrind v3_13_0
valgrind --leak-check=no --track-origins=yes --log-file=valgrind.dump --suppressions=$ROOTSYS/etc/valgrind-root.supp lar -c <...>

Please post the valgrind.dump log file for us to look at.

#2 Updated by Hunter Sullivan about 1 year ago

I have attached the valgrind.dump log file after running:

valgrind --leak-check=no --track-origins=yes --log-file=valgrind.dump --suppressions=$ROOTSYS/etc/valgrind-root.supp lar -c Reco.fcl -s /lariat/data/users/hsulliva/lariat_r6100_sr0004.root --nskip 4

#3 Updated by Kyle Knoepfel 12 months ago

  • Status changed from Feedback to Assigned
  • Assignee set to Kyle Knoepfel

Thank you for the valgrind dump. The TrackMomentumCalculator class is fraught with design problems. I will do what I can to clean them up on a short timescale.

#4 Updated by Kyle Knoepfel 12 months ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100

This issue has been resolved with commits on the feature/knoepfel_TrackMomentumCalculator_cleanup branch in LArReco:

I have verified that the above commits result in a successful run of the job reported in this issue. A corresponding branch has been created in lariatsoft:feature/knoepfel_TrackMomentumCalculator_cleanup that can be merged into the lariatsoft:develop branch once LArSoft has agreed to adopt the changes, some of which break the current interface.

In brief, the TrackMomentumCalculator class has been drastically adjusted to support better data-access patterns and encapsulation of implementation details. In particular, the following changes have been implemented:

  • Removal of the 'using namespace std;' directive that appeared in the header file (something that is never encouraged)
  • Removal of global variables that were declared (and defined!) in the header file (also something that is never encouraged)
  • Changing the public access of some data members and many member functions to private access
  • Removal of an unnecessary 2 MB of data members that were used only to translate to ROOT's interface
  • Removal of getchar() calls which cannot be used sensibly in a framework context
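As a rough illustration of the encapsulation pattern listed above (not the actual larreco code; the namespace, class, and member names are hypothetical, and the computation is a placeholder), a header-friendly class keeps its state private, avoids "using namespace std;", and reports unusable input through its return value instead of pausing on getchar(). For context on the second bullet: a non-inline, non-const variable defined at namespace scope in a header violates the one-definition rule as soon as two translation units include that header, which typically shows up as a multiple-definition link error.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch; not the real larreco TrackMomentumCalculator.
// No "using namespace std;" and no namespace-scope variables live here.
namespace sketch {

  class MomentumCalculator {
  public:
    explicit MomentumCalculator(std::size_t min_points = 3)
      : min_points_{min_points}
    {}

    // Single public entry point. Unusable input yields a sentinel (-1)
    // instead of halting the job with getchar().
    double estimate(std::vector<double> const& segment_lengths) const
    {
      if (segment_lengths.size() < min_points_) {
        return -1.;
      }
      double total = 0.;
      for (double len : segment_lengths) {
        total += len; // placeholder, not a momentum estimate
      }
      return total;
    }

  private:
    // State that was once global or public is now private.
    std::size_t min_points_;
  };

} // namespace sketch
```

Callers see only the constructor and estimate(); everything the algorithm needs internally can change without breaking them, which is the point of moving members from public to private access.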

Although the TrackMomentumCalculator class can still be substantially polished, it is not clear whether that is warranted in light of other code that needs to be addressed throughout LArSoft.

#5 Updated by Kyle Knoepfel 12 months ago

  • Status changed from Resolved to Assigned
  • % Done changed from 100 to 50

I was able to demonstrate a successful run with the prof build, but I have not been able to demonstrate a successful run with debug builds.
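The thread does not say why only the debug build still failed. One plausible explanation, offered here as an assumption: the abort comes from a standard C assert (frames #2 and #3 of the backtrace are glibc's __assert_fail_base and __assert_fail), and assert expands to nothing when NDEBUG is defined, which optimized builds commonly do. If the prof ROOT build defines NDEBUG, an invalid seed would pass through there silently instead of aborting. The sketch uses hypothetical names to show that switch, plus a finiteness check standing in for the seed-validity test; Minuit2's actual validity condition is not reproduced here.

```cpp
#include <cmath>

namespace sketch {

  // True when assert() is active in this translation unit.
  // <cassert> turns assert(expr) into a no-op when NDEBUG is defined.
  constexpr bool asserts_enabled()
  {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
  }

  // Hypothetical stand-in for a minimizer seed-validity check:
  // a NaN or infinite function value at the seed is treated as invalid.
  inline bool seed_looks_valid(double fcn_value)
  {
    return std::isfinite(fcn_value);
  }

} // namespace sketch
```

Compiling the same file with and without -DNDEBUG flips asserts_enabled(), which is the kind of switch that would decide whether an assertion like the one in VariableMetricBuilder.cxx fires or is skipped.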

#6 Updated by Kyle Knoepfel 12 months ago

  • Status changed from Assigned to Resolved
  • % Done changed from 50 to 100

An additional commit was necessary to fix the problem at hand: invalid numbers (NaNs, etc.) were being presented to the minimization routine. The algorithm now skips any NaN values, which avoids the problem. If different behavior is desired, then an additional discussion is necessary.
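The commit itself is not reproduced in this thread. As a minimal sketch of the idea only (the function names and the chi-square form are hypothetical, not the larreco implementation), non-finite values can be dropped before they reach the objective function. Because any arithmetic involving a NaN yields NaN, a single bad entry otherwise turns the whole sum into NaN:

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: keep only finite entries before they reach a fit.
std::vector<double> finite_only(std::vector<double> const& values)
{
  std::vector<double> out;
  out.reserve(values.size());
  for (double v : values) {
    if (std::isfinite(v)) {
      out.push_back(v); // drops NaN and +/-infinity
    }
  }
  return out;
}

// Hypothetical chi-square over residuals scaled by a common sigma.
// One NaN residual makes the returned sum NaN, hence the filter above.
double chi2(std::vector<double> const& residuals, double sigma)
{
  double sum = 0.;
  for (double r : residuals) {
    double const pull = r / sigma;
    sum += pull * pull;
  }
  return sum;
}
```

Skipping is only one policy; rejecting the whole track or substituting a default are alternatives, which is the kind of "different behavior" the comment says would need further discussion.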

Implemented with commit larreco:21345fa on the same branches as before, although those branches have been rebased on top of the current larreco:develop and lariatsoft:develop branches.

#7 Updated by Kyle Knoepfel 12 months ago

  • Status changed from Resolved to Work in progress

#8 Updated by Kyle Knoepfel 11 months ago

  • Status changed from Work in progress to Closed

