Bug #16776

Reco1 memory leak

Added by Lorena Escudero sanchez over 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Category: Reconstruction
Target version: -
Start date: 06/06/2017
Due date:
% Done: 100%
Estimated time:
Spent time:
Occurs In:
Experiment: MicroBooNE
Co-Assignees:
Duration:

Description

While doing some testing with the latest version of LArSoft, a memory leak seems to be appearing in the reco1 stage.

Tested with v06_38_00 (and also with v06_37_00 yesterday). The testing area can be found here: /uboone/app/users/lorena/v06_38_00_clean
This is a clean LArSoft v06_38_00 installation, which has been used to create the input file as follows:

lar -c prodgenie_bnb_nu_uboone.fcl -n 2 (output file is: prodgenie_bnb_nu_uboone_20170606T133725_gen.root)
lar -c standard_g4_uboone.fcl -n 2 prodgenie_bnb_nu_uboone_20170606T133725_gen.root
lar -c standard_detsim_uboone.fcl -n 2 prodgenie_bnb_nu_uboone_20170606T133725_gen_20170606T133816_g4.root

The next stage,

lar -c reco_uboone_mcc7_driver_stage1.fcl -n 2 prodgenie_bnb_nu_uboone_20170606T133725_gen_20170606T133816_g4_20170606T134253_detsim.root

shows a problem: jobs are killed after running over ~10-20 events because of the memory they use (more than 15 GB).
To get further information I ran it with valgrind for those 2 events. The output can be found at /uboone/app/users/lorena/v06_38_00_clean/valgrind_reco_uboone_mcc7_driver_stage1_fnal.txt and also at http://www.hep.phy.cam.ac.uk/~escudero/valgrind_reco_uboone_mcc7_driver_stage1_fnal.txt

The Valgrind leak summary at the end shows a substantial memory leak:

==28497== LEAK SUMMARY:
==28497==    definitely lost: 821,585 bytes in 12,612 blocks
==28497==    indirectly lost: 787,094,708 bytes in 597,637 blocks
==28497==      possibly lost: 286,529,744 bytes in 30,981 blocks
==28497==    still reachable: 196,354,311 bytes in 240,301 blocks
==28497==         suppressed: 0 bytes in 0 blocks

Together with lots of:

Conditional jump or move depends on uninitialised value(s)

I understand that some of these are related to ROOT internals, but this seems a large leak after only 2 events. It causes problems when running on local machines, and it is making the jobs crash after ~10 events.
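The figures in the leak summary above can be pulled out of the log and converted to more readable units with a few lines of code. The following is only a minimal sketch, not part of any LArSoft or Valgrind tooling: the helper names `leakBytes` and `toMiB` are hypothetical, and the parser assumes only that a summary line contains the category label followed by a colon and a comma-separated byte count.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the byte count reported for `label`
// (e.g. "indirectly lost") on a Memcheck LEAK SUMMARY line, or -1 if the
// line does not carry that label or has no number after it.
std::int64_t leakBytes(const std::string& line, const std::string& label) {
  assert(!label.empty());
  const std::string key = label + ":";
  const std::size_t pos = line.find(key);
  if (pos == std::string::npos) return -1;

  std::int64_t bytes = 0;
  bool seenDigit = false;
  for (std::size_t i = pos + key.size(); i < line.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(line[i]);
    if (std::isdigit(c)) {
      bytes = bytes * 10 + (c - '0');
      seenDigit = true;
    } else if (seenDigit && c != ',') {
      break;  // first non-digit, non-comma character ends the number
    }
    // anything else (leading spaces, thousands separators) is skipped
  }
  return seenDigit ? bytes : -1;
}

// Hypothetical helper: converts bytes to mebibytes (1 MiB = 1,048,576 bytes).
double toMiB(std::int64_t bytes) {
  return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Applied to the "indirectly lost" line of the summary, `leakBytes` yields 787,094,708 bytes, which `toMiB` converts to roughly 751 MiB for the two processed events.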

Associated revisions

Revision 3598a828 (diff)
Added by Gianluca Petrillo over 3 years ago

Fix to memory leak reported in issue #16776.

When using the "K" option to create a TVirtualFFT object, the user is
responsible for destroying it at the end.

Revision 3f0435f6 (diff)
Added by Gianluca Petrillo over 3 years ago

Fix to memory leak reported in issue #16776.

When using the "K" option to create a TVirtualFFT object, the user is
responsible for destroying it at the end.

History

#1 Updated by Lorena Escudero sanchez over 3 years ago

I also noticed a lot of messages like:

Invalid read of size 8

#2 Updated by Marc Paterno over 3 years ago

You may find the instructions at https://cdcvs.fnal.gov/redmine/projects/art/wiki/Getting_started_with_valgrind to be helpful in using Valgrind. In particular, a huge number of spurious complaints coming from ROOT code can be suppressed using the supplied "suppressions" file, by using the following flag to valgrind:

--suppressions=${ROOTSYS}/etc/valgrind-root.supp

#3 Updated by Gianluca Petrillo over 3 years ago

  • Category set to Reconstruction
  • Assignee set to Gianluca Petrillo

The 286 MB leak is interestingly close to the size of the photon library. If that is what it is, the leak is not a relevant one.
The 787 MB leak, on the other hand, is worrying.

#4 Updated by Lorena Escudero sanchez over 3 years ago

Hello,

I did try the ROOT suppression option, but valgrind does not seem to accept the suppression file it points to:

valgrind --suppressions=/grid/fermiapp/products/larsoft/root/v6_08_06d/Linux64bit+2.6-2.12-e14-nu-prof/etc/valgrind-root.supp --leak\
-check=full --leak-resolution=high --num-callers=40 lar -c reco_uboone_mcc7_driver_stage1.fcl -n 2 prodgenie_bnb_nu_uboone_20170606T\
123056_gen_20170606T123207_g4_20170606T123900_detsim.root > valgrind_supp.txt 2>&1

==20315== Memcheck, a memory error detector
==20315== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==20315== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==20315== Command: lar -c reco_uboone_mcc7_driver_stage1.fcl -n 2 prodgenie_bnb_nu_uboone_20170606T123056_gen_20170606T123207_g4_201\
70606T123900_detsim.root
==20315==
location should be "...", or should start with "fun:" or "obj:"
==20315== FATAL: in suppressions file "/grid/fermiapp/products/larsoft/root/v6_08_06d/Linux64bit+2.6-2.12-e14-nu-prof/etc/valgrind-r\
oot.supp" near line 25:
==20315== location should be "...", or should start with "fun:" or "obj:"
==20315== exiting now.

#5 Updated by Gianluca Petrillo over 3 years ago

  • Tracker changed from Necessary Maintenance to Bug
  • Status changed from New to Assigned
  • Occurs In v06_37_00, v06_38_00 added

#6 Updated by Gianluca Petrillo over 3 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100

Thank you for the precise report.
Although I took extra steps to confirm it (running valgrind's massif tool), the memcheck log you posted already contained the culprit.

It turns out that some calls to TVirtualFFT::FFT() used the option "K" (I am not sure why). Without that option, ROOT creates a new FFT object if needed, stores it as a global, and manages it. With that option, the new FFT object is instead handed to the user, who is responsible for managing it, and the global FFT is not touched.
So, in those cases, the FFT object needed to be deleted after use. This is in fact documented in ROOT.
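The ownership rule described above can be illustrated without ROOT. The following is only a sketch under stated assumptions: `MockFFT`, `makeFFT`, `processOwned` and `processManaged` are hypothetical stand-ins written for this example, not the ROOT API and not the code of the actual fix. The `keep` flag plays the role of the "K" option: when it is set, the caller receives a fresh object and must release it, which is done here with `std::unique_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an FFT engine; NOT the ROOT TVirtualFFT class.
// It counts live instances so that a missing delete becomes visible.
class MockFFT {
public:
  explicit MockFFT(std::size_t n) : fBuffer(n) { ++sLive; }
  ~MockFFT() { --sLive; }
  MockFFT(const MockFFT&) = delete;
  MockFFT& operator=(const MockFFT&) = delete;
  std::size_t size() const { return fBuffer.size(); }
  static int live() { return sLive; }
private:
  std::vector<double> fBuffer;
  static int sLive;
};
int MockFFT::sLive = 0;

// Hypothetical factory mimicking the two ownership modes:
//   keep == false: a shared instance owned and reused by the "library";
//   keep == true : a new object handed to the caller, who must delete it.
MockFFT* makeFFT(std::size_t n, bool keep) {
  static std::unique_ptr<MockFFT> global;
  if (keep) return new MockFFT(n);
  if (!global || global->size() != n) global.reset(new MockFFT(n));
  return global.get();
}

// Caller-owned mode: one object per waveform, released at the end of each
// iteration. Dropping the unique_ptr here would leak one object per waveform.
// Returns the change in the number of live objects.
int processOwned(int nWaveforms, std::size_t n) {
  const int before = MockFFT::live();
  for (int i = 0; i < nWaveforms; ++i) {
    std::unique_ptr<MockFFT> fft(makeFFT(n, true));
    assert(fft->size() == n);
    // ... the transform would be executed here ...
  }
  return MockFFT::live() - before;
}

// Library-managed mode: the caller must NOT delete the returned pointer.
// Returns the change in the number of live objects.
int processManaged(int nWaveforms, std::size_t n) {
  const int before = MockFFT::live();
  for (int i = 0; i < nWaveforms; ++i) {
    MockFFT* fft = makeFFT(n, false);
    assert(fft->size() == n);
  }
  return MockFFT::live() - before;
}
```

In this sketch `processOwned` leaves no live objects behind however many waveforms are processed, while `processManaged` creates at most the one shared instance, which the factory itself releases at program exit.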

This was in uboonecode:source:uboone/CalData/NoiseFilterAlgs/RawDigitFFTAlg.cxx, called by MicroBooNE module uboonecode:source:uboone/CalData/RawDigitFilterUBooNE_module.cc; the fix is now in commit uboonecode:3598a8288fa0be300869c722a32136a1e9a2b26b.

#7 Updated by Gianluca Petrillo over 3 years ago

Note that the same issue appears in uboonecode:source:uboone/DetSim/SimWireMicroBooNE_module.cc: the FFT object is never deleted there (bad), but it is not recreated for every raw digit, so the leak does not grow into a serious one.
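One way to keep the "create once, reuse for every raw digit" behaviour while still releasing the object is to hold it in an owning member. This is a hedged sketch only: `MockFFT`, `MockSimWire` and `runModule` are hypothetical names invented for the example and do not reflect the actual SimWireMicroBooNE module or the ROOT API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a caller-owned FFT object; not a ROOT class.
struct MockFFT {
  explicit MockFFT(std::size_t n) : buffer(n) { ++created; ++live; }
  ~MockFFT() { --live; }
  MockFFT(const MockFFT&) = delete;
  MockFFT& operator=(const MockFFT&) = delete;
  std::vector<double> buffer;
  static int created;  // total number of objects ever constructed
  static int live;     // number of objects currently alive
};
int MockFFT::created = 0;
int MockFFT::live = 0;

// Hypothetical module that builds its FFT object on first use and then
// reuses it for every raw digit.
class MockSimWire {
public:
  explicit MockSimWire(std::size_t n) : fSize(n) {}
  void processDigit() {
    if (!fFFT) fFFT.reset(new MockFFT(fSize));  // created once only
    assert(fFFT->buffer.size() == fSize);
    // ... the transform would be executed here ...
  }
private:
  std::size_t fSize;
  std::unique_ptr<MockFFT> fFFT;  // released when the module is destroyed
};

// Runs nDigits raw digits through one module and returns how many FFT
// objects were constructed along the way.
int runModule(int nDigits, std::size_t n) {
  const int before = MockFFT::created;
  {
    MockSimWire module(n);
    for (int i = 0; i < nDigits; ++i) module.processDigit();
  }  // module goes out of scope here, taking its FFT object with it
  return MockFFT::created - before;
}
```

With this layout a single object is constructed regardless of the number of digits, and nothing remains alive once the module is destroyed.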

#8 Updated by Gianluca Petrillo about 3 years ago

  • Status changed from Resolved to Closed
