Performance (using google-perftools)
This page briefly explains how to use google-perftools in the NOvA offline environment. The tool can help determine where a job spends its resources (generally CPU cycles), so that the user can investigate improvements.
The tool relies on some additional libraries and binaries:
setup_nova   # set up the NOvA environment
# add additional locations to PATH and LD_LIBRARY_PATH
export PERFTOOLS=/grid/fermiapp/nova/perftools
source $PERFTOOLS/perftools.sh
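Before profiling, it can be worth confirming that the setup actually took effect. The following is a minimal sketch, not part of the NOvA setup scripts: `check_perftools` is a hypothetical helper name, and it assumes that `perftools.sh` puts `pprof` on `PATH` and that the profiler library lives at `$PERFTOOLS/lib/libprofiler.so`, as used in the run command on this page.

```shell
# Hypothetical helper: verify that the perftools pieces are reachable.
# $1: perftools install directory (defaults to $PERFTOOLS)
check_perftools() {
    perftools_dir="${1:-${PERFTOOLS:-}}"
    if [ -z "$perftools_dir" ]; then
        echo "PERFTOOLS is not set" >&2
        return 1
    fi
    # The profiler library is what gets preloaded into the job.
    if [ ! -f "$perftools_dir/lib/libprofiler.so" ]; then
        echo "missing $perftools_dir/lib/libprofiler.so" >&2
        return 1
    fi
    # pprof is needed later to interpret the sample file.
    if ! command -v pprof >/dev/null 2>&1; then
        echo "pprof not found in PATH" >&2
        return 1
    fi
    return 0
}
```

Calling `check_perftools` with no argument after sourcing `perftools.sh` returns a non-zero status and prints the first missing piece if something is wrong.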
Running the executable
To collect profile samples while the job runs, you must force the profiler library to be loaded and choose a tool (in this case CPU sampling). The CPUPROFILE setting tells it to sample CPU usage (as opposed to heap checking or heap profiling) and to record the samples in the given file.
export DATAPATH=/nova/data/art/
export CPUPROFILE=./mysample.prof
env LD_PRELOAD=$PERFTOOLS/lib/libprofiler.so \
    nova -n 10 -c mypkgjob.fcl -s $DATAPATH/genie_gen.root
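If the profiler library was not actually preloaded, nothing reads CPUPROFILE and no sample file is produced, so a quick check after the job finishes can save a confusing `pprof` session. This is a small sketch under that assumption; `check_profile` is a hypothetical helper name, not something provided by perftools or novasoft.

```shell
# Hypothetical helper: confirm that a CPU profile was written.
# $1: profile file (defaults to $CPUPROFILE)
check_profile() {
    prof_file="${1:-${CPUPROFILE:-}}"
    if [ -z "$prof_file" ]; then
        echo "no profile file given and CPUPROFILE is not set" >&2
        return 1
    fi
    # -s is true only if the file exists and is non-empty.
    if [ ! -s "$prof_file" ]; then
        echo "profile $prof_file is missing or empty" >&2
        return 1
    fi
    echo "profile written: $prof_file"
}
```

After the `nova` job exits, `check_profile` with no argument checks the file named by CPUPROFILE.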
Interpreting the results
The sample file must then be interpreted using:
pprof --text `which nova` mysample.prof
to get a text output listing, or
pprof --pdf `which nova` mysample.prof > foo.pdf
if you prefer a PDF. (Other outputs are possible as well; check pprof --help for the whole list.)
(Note: those are back-ticks, to the left of the "1" on most keyboards, around which nova.)
Adding the "--lines" flag to the pprof command will break the results down by line rather than by function.
Increasing the number of nodes in the resulting graph can also help in tracking down problems in deep function hierarchies (as are common with ART modules); to do so, add the --nodecount argument to the pprof command. (The default is 80 nodes; consider using 200, though requesting more nodes makes the output graph take longer to generate.)
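The two options can be combined in a single invocation. The wrapper below is only a sketch: `pprof_by_line` is a hypothetical name, and the `PPROF` variable is an assumed override for the path to `pprof` that is not part of perftools. It produces a per-line PDF with a configurable node count.

```shell
# Hypothetical wrapper: per-line PDF graph with a larger node count.
# $1: profiled executable   $2: sample file
# $3: output PDF            $4: node count (defaults to 200)
pprof_by_line() {
    # PPROF is an assumed override; plain "pprof" from PATH is used otherwise.
    "${PPROF:-pprof}" --pdf --lines --nodecount="${4:-200}" "$1" "$2" > "$3"
}

# Example (not executed here):
#   pprof_by_line "$(which nova)" mysample.prof foo.pdf 200
```

Passing a smaller fourth argument keeps the graph quicker to generate when the full hierarchy is not needed.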
It is sometimes helpful to profile with optimization turned off (using the debug version of novasoft), since some function calls vanish when the compiler optimizes the code. The times may be skewed, but the count of function calls will be accurate.
For CAFAna this whole dance is taken care of for you. Just pass --prof on the cafe command line.