Bug #22459

Help resolve TrajCluster/Calorimetry issue

Added by Tingjun Yang 4 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 04/28/2019
Due date:
% Done: 100%
Estimated time:
Spent time:
Occurs In:
Experiment: DUNE
Co-Assignees:
Duration:

Description

Dear experts,

We are having trouble running trajcluster after moving to larsoft/dunetpc v08_17_00. Here is the command:

lar -c standard_reco_dune10kt_1x2x6.fcl xroot://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/persistent/users/vito/ci_tests_inputfiles/DUNEFD/detsim/prodgenie_nue_dune10kt_1x2x6_detsim_Reference.root

Here is the gdb backtrace:
#1  0x00007fffddb80891 in calo::CalorimetryAlg::LifetimeCorrection (this=0xaca1270, time=786.5823974609375, T0=T0@entry=0)
    at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/Calorimetry/CalorimetryAlg.cxx:136
#2  0x00007fffddb80cb2 in calo::CalorimetryAlg::dEdx_from_dQdx_e (this=0xaca1270, dQdx_e=inf, time=<optimized out>, T0=T0@entry=0)
    at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/Calorimetry/CalorimetryAlg.cxx:120
#3  0x00007fffddb80dc5 in calo::CalorimetryAlg::dEdx_AREA (this=<optimized out>, dQdx=<optimized out>, time=<optimized out>, plane=<optimized out>, 
    T0=T0@entry=0) at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/Calorimetry/CalorimetryAlg.cxx:113
#4  0x00007fffdeadeac4 in tca::FilldEdx (slc=..., pfp=...) at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/RecoAlg/TCAlg/PFPUtils.cxx:1757
#5  0x00007fffdeaed573 in tca::StorePFP (slc=..., pfp=...) at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/RecoAlg/TCAlg/PFPUtils.cxx:2835
#6  0x00007fffdeaf48b1 in tca::FindPFParticles (slc=...) at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/RecoAlg/TCAlg/PFPUtils.cxx:2250
#7  0x00007fffdfebe683 in tca::TrajClusterAlg::RunTrajClusterAlg (this=0xac9c2c0, hitsInSlice=..., sliceID=<optimized out>)
    at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/RecoAlg/TrajClusterAlg.cxx:361
#8  0x00007fffce34b652 in cluster::TrajCluster::produce (this=0xaca0990, evt=...)
    at /data/tjyang/dune/larsoft_dev/srcs/larreco/larreco/ClusterFinder/TrajCluster_module.cc:492

Neither trajcluster nor the calorimetry reconstruction has changed, as far as I know.

Thanks,
Tingjun

History

#1 Updated by Kyle Knoepfel 4 months ago

  • Assignee set to Paul Russo
  • Status changed from New to Assigned

#2 Updated by Paul Russo 4 months ago

  • % Done changed from 0 to 100
  • Status changed from Assigned to Resolved

This is caused by a bug in art that is fixed in art release 3.02.05; see issue #22407.

There is a workaround to avoid the problem: adding --prune-config to the lar command line prevents the error:

lar --prune-config -c standard_reco_dune10kt_1x2x6.fcl xroot://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/persistent/users/vito/ci_tests_inputfiles/DUNEFD/detsim/prodgenie_nue_dune10kt_1x2x6_detsim_Reference.root

The short explanation for why this art bug provokes a problem is that TrajClusterAlg makes extensive use of global variables to do its work, which means there can be only one instance in the running system.

Unfortunately the file:

dunetpc/fcl/dunefd/reco/standard_reco_dune10kt.fcl

has two instances of TrajClusterAlg configured, once as module label "trajcluster" and once as module label "trajclusterdc":

$ egrep -n 'trajcluster.*:' dunetpc/fcl/dunefd/reco/standard_reco_dune10kt.fcl
62:  trajclusterdc:          @local::dunefdmc_trajcluster
84:  trajcluster:          @local::dunefdmc_trajcluster

So art (with the bug) constructs two TrajClusterAlg instances, which causes memory overwrites in the global variables and you get crashes. The fixed art (3.02.05) will only construct modules that are actually in the trigger/end paths, which prevents the problem.

It would be wise to have the TrajClusterAlg constructor check the tcc.caloAlg global variable to make sure it is nullptr before proceeding. If it is not, the constructor should throw an exception saying that it has detected an attempt to instantiate more than one instance of TrajClusterAlg. (This is not the only possible way to do this; it's just an easy one.)
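A minimal sketch of that constructor check, for illustration only. The types here (CaloAlgStub, TCConfigStub, TrajClusterAlgSketch) are hypothetical stand-ins and not the real larreco classes; only the tcc.caloAlg name comes from the actual code. The sketch throws std::logic_error, whereas the real code would presumably use whatever exception type larreco normally uses. Resetting the global in the destructor is an extra assumption, not part of the suggestion above.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the calorimetry algorithm type.
struct CaloAlgStub {};

// Hypothetical stand-in for the global configuration struct; only the
// member name caloAlg is taken from the ticket.
struct TCConfigStub {
  CaloAlgStub* caloAlg = nullptr;  // global pointer shared by all algorithm code
};

TCConfigStub tcc;  // one global object, so only one algorithm instance can own it

class TrajClusterAlgSketch {
public:
  TrajClusterAlgSketch() {
    // A non-null pointer means another instance has already claimed the global.
    if (tcc.caloAlg != nullptr) {
      throw std::logic_error(
        "TrajClusterAlgSketch: attempt to instantiate more than one instance");
    }
    tcc.caloAlg = &fCaloAlg;
  }

  ~TrajClusterAlgSketch() {
    // Assumption: release the global so a later instance can be constructed.
    if (tcc.caloAlg == &fCaloAlg) tcc.caloAlg = nullptr;
  }

  // Copying would silently create a second owner of the global state.
  TrajClusterAlgSketch(const TrajClusterAlgSketch&) = delete;
  TrajClusterAlgSketch& operator=(const TrajClusterAlgSketch&) = delete;

private:
  CaloAlgStub fCaloAlg;
};

// Returns true if constructing a second instance while the first is alive throws.
bool secondInstanceThrows() {
  TrajClusterAlgSketch first;
  try {
    TrajClusterAlgSketch second;
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

With a check like this, a configuration that instantiates both trajcluster and trajclusterdc would fail at construction with a clear message instead of crashing later inside the calorimetry code.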

The problem will be fixed once larsoft upgrades to art 3.02.05.

#3 Updated by Tingjun Yang 4 months ago

Thank you so much Paul. I suspected this was caused by the recent changes to art. Since we do not run trajclusterdc I removed its configuration from the fcl file, which seems to resolve the issue. It's great to know it will be fixed in art soon. Thanks!

#4 Updated by Paul Russo 4 months ago

  • Status changed from Resolved to Closed

