Debugging and Checking for Memory Leaks with Valgrind

Setup

Set up Valgrind as follows after you log onto the VM:

setup valgrind

How to use:

usage: valgrind [options] prog-and-args
For example:
valgrind --leak-check=no --track-origins=yes --num-callers=50 --log-file=valgrind.out \
             nova -c job/patrecjob.fcl mcCosmicFD.root

To see all the options, type:

valgrind -h

A simple memory leak check

Memory management is very important for DDT processes. Because these run online for long periods of time, even a small leak can eventually make a process grow out of control and fail. Valgrind can monitor the memory usage of your art job and tell you where you are losing memory. An example of finding and solving a problem is given below. The version of Hough Tracker in commit 536:

https://cdcvs.fnal.gov/redmine/projects/novaddt/repository/revisions/536/entry/trunk/PatRec/HoughTracker_module.cc

is leaking memory. Running a test of this with a command of the form:

valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp --leak-check=full ddt-filter -c job/FD-FileCapture-hough-2.fcl DDTData-FD-hitproducer-10-130806.root

reveals the leak:

==13242== 3,666,528 (3,614,016 direct, 52,512 indirect) bytes in 112,938 blocks are definitely lost in loss record 57,939 of 57,939
==13242==    at 0x4A07F61: malloc (vg_replace_malloc.c:236)
==13242==    by 0x54023AB: operator new(unsigned long) (in /nusoft/app/externals/boost/v1_50_0/Linux64bit+2.6-2.5-e2-debug/lib/libboost_program_options.so.1.50.0)
==13242==    by 0x3CB0854B: novaddt::HoughTracker::filter(art::Event&) (HoughTracker_module.cc:190)

Note that the --suppressions option suppresses output from ROOT libraries with known features that Valgrind would otherwise flag. The trace shows that the leak comes from the object created with new on line 190 of HoughTracker_module.cc. This is fixed in commit 570:

https://cdcvs.fnal.gov/redmine/projects/novaddt/repository/diff/trunk/PatRec/HoughTracker_module.cc?rev=570&rev_to=536

The fix removes the cases where Hough points created with new inside the many nested loops were never cleaned up. Running Valgrind on the new revision shows no bytes lost by this module. A good rule of thumb: run a full memory check, then search the resulting log file for the name of your module. If it never shows up, you're doing a good job.

Zukai's memory leak check

I ran the following command:

valgrind --leak-check=full --show-reachable=yes --track-origins=yes --num-callers=50 --log-file=hitporducer.log \
             nova -c job/mltest.fcl -n 1 /nova/ana/trigger/mc/cosmic_5ms/ddt/cosmic-1.root

In mltest.fcl, TDCSorter is the only module in the chain we ran through; even the pass writer has been commented out.

After talking to Chris G, I followed his suggestion and ran Valgrind this way:

valgrind --leak-check=no --track-origins=yes --num-callers=50 --log-file=hitporducer.log \
             nova -c job/mltest.fcl -n 1 /nova/ana/trigger/mc/cosmic_5ms/ddt/cosmic-1.root

With this command, no improper memory use by TDCSorter is found (note that --leak-check=no skips the leak analysis, as the log itself points out):
==7540== Memcheck, a memory error detector
==7540== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==7540== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==7540== Command: nova -c job/mltest.fcl -n 1 /nova/ana/trigger/mc/cosmic_5ms/ddt/cosmic-1.root
==7540== Parent PID: 6031
==7540== 
==7540== 
==7540== HEAP SUMMARY:
==7540==     in use at exit: 10,052,400 bytes in 93,755 blocks
==7540==   total heap usage: 569,130 allocs, 475,375 frees, 49,414,367 bytes allocated
==7540== 
==7540== For a detailed leak analysis, rerun with: --leak-check=full
==7540== 
==7540== For counts of detected and suppressed errors, rerun with: -v
==7540== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)