Bug #19067

Trajcluster segfault

Added by Will Foreman about 3 years ago. Updated about 3 years ago.

I'm noticing that many more jobs than usual are encountering segfaults, so I tested the reco over a file that was known to cause it.

In lariatsoft v06_68_00 (which sources larreco v06_54_02), I'm attempting to reconstruct this data file:

I'm running this fcl for reconstruction:

This runs caldata, gaushit, and trajcluster. When I exclude trajcluster, the reconstruction finishes, but when I add trajcluster back into the reco chain, I get a segfault at event 9245. Unfortunately, it seems trajcluster was stripped of all the helpful debug statements, so I have no idea when or why this segfault occurs within the algorithm.


#1 Updated by Gianluca Petrillo about 3 years ago

  • Category set to Reconstruction
  • Status changed from New to Assigned
  • Assignee set to Bruce Baller
  • Occurs In v06_68_00 added
  • Experiment LArSoft added

Thank you for the detailed report.
For your information:

  1. you can set up lariatsoft v06_68_00 with the e15:debug qualifiers (setup lariatsoft v06_68_00 -q e15:debug); that build should not have its symbols stripped
  2. you can run lar with the --trace option, and art will print what it's doing; in this way you should be able to see that the last module to operate is TrajCluster

Anyway, I can reproduce the issue. Backtrace:

#0  0x00007ffff69d5023 in std::vector<unsigned int, std::allocator<unsigned int> >::~vector (this=0x35000000cc, __in_chrg=<optimized out>)
    at /scratch/workspace/art-rel-bld/SLF6/debug/build/gcc/v6_4_0/Linux64bit+2.6-2.12/include/c++/6.4.0/bits/stl_vector.h:426
#1  0x00007fffe2efd10a in tca::TrajPoint::~TrajPoint (this=0x350000006c, __in_chrg=<optimized out>) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TCAlg/DataStructs.h:143
#2  0x00007fffe2f140fc in std::_Destroy<tca::TrajPoint> (__pointer=0x350000006c) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/gcc/v6_4_0/Linux64bit+2.6-2.12/include/c++/6.4.0/bits/stl_construct.h:93
#3  0x00007fffe2f0f762 in std::_Destroy_aux<false>::__destroy<tca::TrajPoint*> (__first=0x350000006c, __last=0x35b3d8e178 <main_arena+88>)
    at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/gcc/v6_4_0/Linux64bit+2.6-2.12/include/c++/6.4.0/bits/stl_construct.h:103
#4  0x00007fffe2f0a771 in std::_Destroy<tca::TrajPoint*> (__first=0x350000006c, __last=0x35b3d8e178 <main_arena+88>)
    at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/gcc/v6_4_0/Linux64bit+2.6-2.12/include/c++/6.4.0/bits/stl_construct.h:126
#5  0x00007fffe2f040a5 in std::_Destroy<tca::TrajPoint*, tca::TrajPoint> (__first=0x350000006c, __last=0x35b3d8e178 <main_arena+88>)
    at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/gcc/v6_4_0/Linux64bit+2.6-2.12/include/c++/6.4.0/bits/stl_construct.h:151
#6  0x00007fffe2f000f3 in std::vector<tca::TrajPoint, std::allocator<tca::TrajPoint> >::operator= (this=0xba4eab0, __x=...)
    at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/gcc/v6_4_0/Linux64bit+2.6-2.12/include/c++/6.4.0/bits/vector.tcc:196
#7  0x00007fffe2efd297 in tca::Trajectory::operator= (this=0xba4eab0) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TCAlg/DataStructs.h:166
#8  0x00007fffdb9c00d6 in tca::SplitTraj (tjs=..., itj=5, pos=39, ivx=3, prt=false) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TCAlg/Utils.cxx:2025
#9  0x00007fffdb917bd5 in tca::SplitAtKink (tjs=..., pfp=..., sep=1, prt=false) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TCAlg/PFPUtils.cxx:1068
#10 0x00007fffdb99f173 in tca::Match3DVtxTjs (tjs=..., tpcid=..., prt=false) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TCAlg/TCVertex.cxx:1310
#11 0x00007fffdb919dac in tca::FindPFParticles (fcnLabel=..., tjs=..., tpcid=..., prt=false) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TCAlg/PFPUtils.cxx:1310
#12 0x00007fffe2ec4075 in tca::TrajClusterAlg::RunTrajClusterAlg (this=0x8a31f20, evt=...) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/RecoAlg/TrajClusterAlg.cxx:430
#13 0x00007fffcfa6e658 in cluster::TrajCluster::produce (this=0x6079970, evt=...) at /scratch/workspace/build-larsoft/v06_68_00/SLF6/debug/build/larreco/v06_54_02/src/larreco/ClusterFinder/
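One possible reading of the backtrace, offered only as a hypothesis (this report does not state the root cause or describe the fix): frames #6 to #8 show `tca::SplitTraj` assigning to a `tca::Trajectory`, and the destructors in frames #0 to #5 are handed addresses such as `0x350000006c` and `<main_arena+88>`, which look like freed heap memory rather than live `TrajPoint` objects. A common way to reach that state is to hold a reference to an element of a `std::vector`, call `push_back` on the same vector (which may reallocate), and then assign through the now-dangling reference. The sketch below uses simplified, hypothetical stand-ins (`TrajPoint`, `Trajectory`, `SplitTrajSafe`), not the real larreco types, and shows the unsafe pattern only in a comment, because executing it is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for tca::TrajPoint and
// tca::Trajectory; the real structs in DataStructs.h have many more members.
struct TrajPoint {
  std::vector<unsigned int> Hits;  // a vector member, like the one destroyed in frame #0
};

struct Trajectory {
  std::vector<TrajPoint> Pts;
};

// Hypothetical split routine: keeps points [0, pos) in allTraj[itj] and
// appends a new trajectory holding points [pos, end).
//
// UNSAFE pattern, shown only as a comment because running it is undefined behavior:
//   Trajectory& tj = allTraj[itj];
//   allTraj.push_back(Trajectory{});  // may reallocate; tj now refers to freed memory
//   tj = otherTraj;                   // Trajectory::operator= on a dangling object
//
// The version below never keeps a reference or iterator across the push_back.
bool SplitTrajSafe(std::vector<Trajectory>& allTraj, std::size_t itj, std::size_t pos) {
  if (itj >= allTraj.size()) return false;
  const std::size_t npts = allTraj[itj].Pts.size();
  if (pos == 0 || pos >= npts) return false;

  // Copy the second half out before the container can reallocate.
  const auto first = allTraj[itj].Pts.begin() + static_cast<std::ptrdiff_t>(pos);
  Trajectory second;
  second.Pts.assign(first, allTraj[itj].Pts.end());

  allTraj.push_back(std::move(second));  // may reallocate allTraj

  // Look the element up again by index instead of reusing an old reference.
  Trajectory& tj = allTraj[itj];
  tj.Pts.erase(tj.Pts.begin() + static_cast<std::ptrdiff_t>(pos), tj.Pts.end());
  assert(tj.Pts.size() == pos);
  return true;
}
```

Re-indexing after the `push_back` (or calling `reserve` beforehand) avoids the dangling reference; whether this resembles the change Bruce pushed to larreco develop cannot be told from this report.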

I will not have the time to check this out until at least tomorrow.
Assigning to Bruce Baller, who is the author of that code.
Bruce, if you need the FHiCL configuration file and input file moved somewhere else, let us know.

#2 Updated by Lynn Garren about 3 years ago

  • Status changed from Assigned to Resolved
  • Target version changed from v06_68_00 to v06_69_00

Bruce has pushed a change to larreco develop which resolves this issue.

This includes another change: TrajCluster no longer produces SpacePoints, because their quality was inadequate. Expect the CI tests to show changes in data products.

#3 Updated by Lynn Garren about 3 years ago

  • Status changed from Resolved to Assigned

The uboonecode unit tests and prodsingle_uboone_max2 fail.

#4 Updated by Lynn Garren about 3 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100

Problem fixed and tested.
