Performance (using google-perftools)

This page briefly explains how to use google-perftools in the NOvA offline environment. The tool helps determine where a job spends its resources (generally CPU cycles), so that the user can look for improvements.

Setup

The tool relies on some additional libraries and binaries:

   setup_nova # set up the NOvA environment
   # add additional locations to PATH and LD_LIBRARY_PATH
   export PERFTOOLS=/grid/fermiapp/nova/perftools
   source $PERFTOOLS/perftools.sh
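Before profiling, it can be worth confirming that the setup worked. The following is a minimal sketch, not part of the NOvA or perftools scripts: the function name `check_perftools` is hypothetical, and it assumes (based on the commands on this page) that the profiler library lives at `$PERFTOOLS/lib/libprofiler.so` and that sourcing `perftools.sh` puts `pprof` on your PATH.

```bash
# Hypothetical helper (not provided by setup_nova or perftools.sh):
# check that the profiler library and pprof are reachable after setup.
check_perftools() {
    # PERFTOOLS must point at the perftools installation
    if [ -z "${PERFTOOLS:-}" ]; then
        echo "PERFTOOLS is not set" >&2
        return 1
    fi
    # Assumed location of the library that gets preloaded into the job
    if [ ! -r "$PERFTOOLS/lib/libprofiler.so" ]; then
        echo "missing $PERFTOOLS/lib/libprofiler.so" >&2
        return 1
    fi
    # pprof is needed later to interpret the sample file
    if ! command -v pprof >/dev/null 2>&1; then
        echo "pprof is not on PATH" >&2
        return 1
    fi
    return 0
}

# Example:
#   check_perftools && echo "perftools ready"
```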

Running the executable

To collect profile samples while the job runs, you must force the profiler library to be loaded (here via LD_PRELOAD) and choose a tool (in this case CPU sampling). Setting CPUPROFILE tells the profiler to sample CPU usage (as opposed to heap checking or heap profiling) and to record the samples in the given file.

   export DATAPATH=/nova/data/art
   # write the CPU samples to this file
   export CPUPROFILE=./mysample.prof
   # preload the profiler library for this one nova command only
   env LD_PRELOAD=$PERFTOOLS/lib/libprofiler.so \
      nova -n 10 -c mypkgjob.fcl -s $DATAPATH/genie_gen.root
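If you profile often, the two settings can be bundled into a small wrapper. This is a sketch under assumptions, not an existing NOvA tool: `profile_cmd` is a hypothetical name, and the library path is the one used in the command above. Like that command, it applies LD_PRELOAD only to the profiled program rather than exporting it, so the calling shell and later commands are not profiled themselves.

```bash
# Hypothetical wrapper (not an existing NOvA or perftools command): run any
# command with the CPU profiler preloaded, writing samples to OUTPUT.prof.
# Usage: profile_cmd OUTPUT.prof COMMAND [ARGS...]
profile_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: profile_cmd OUTPUT.prof COMMAND [ARGS...]" >&2
        return 2
    fi
    local outfile=$1
    shift
    # Assumes the library location used in the example on this page
    local lib="${PERFTOOLS:-}/lib/libprofiler.so"
    if [ ! -r "$lib" ]; then
        echo "profiler library not found: $lib" >&2
        return 1
    fi
    # Both variables are set for this one command only
    env CPUPROFILE="$outfile" LD_PRELOAD="$lib" "$@"
}

# Example, equivalent to the nova command above:
#   profile_cmd ./mysample.prof nova -n 10 -c mypkgjob.fcl -s "$DATAPATH/genie_gen.root"
```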

Interpreting the results

The sample file must then be interpreted using:

   pprof --text `which nova` mysample.prof

to get a text output listing, or

   pprof --pdf `which nova` mysample.prof > foo.pdf

if you prefer a PDF. (Other outputs are possible as well; check pprof --help for the whole list.)

(Note: those are back-ticks, to the left of the "1" on most keyboards, around `which nova`.)

Adding the "--lines" flag to the pprof command breaks the report down by source line rather than by function.

Increasing the number of nodes in the resulting graph can also help track down problems in deep function hierarchies (such as those common with ART modules). To do so, add the --nodecount argument to the pprof command. The default is 80 nodes; consider using 200, though more nodes make the output graph take longer to generate.
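The output options and the --lines and --nodecount flags can be combined in one pprof call. As a hedged sketch, the hypothetical `pprof_report` function below wraps the pprof invocations shown earlier; `PPROF` is an override variable invented for this sketch (not a pprof setting) that allows a dry run. It uses `$(command -v nova)` in place of the back-tick `which nova` form, which avoids having to type back-ticks.

```bash
# Hypothetical helper (not part of pprof or novasoft): run pprof on a sample
# file against the nova binary found on PATH.
# Usage: pprof_report FORMAT PROFILE [PPROF_OPTIONS...]
#   FORMAT is a pprof output option without the leading dashes (text, pdf, ...)
#   PPROF_OPTIONS are passed through unchanged, e.g. --lines or --nodecount=200
# Set PPROF to use a different pprof command (PPROF=echo prints the call).
pprof_report() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pprof_report FORMAT PROFILE [PPROF_OPTIONS...]" >&2
        return 2
    fi
    local format=$1 profile=$2
    shift 2
    local binary
    # Declared separately so the exit status of command -v is not masked
    if ! binary=$(command -v nova); then
        echo "nova is not on PATH" >&2
        return 1
    fi
    "${PPROF:-pprof}" "--$format" "$@" "$binary" "$profile"
}

# Examples:
#   pprof_report text mysample.prof --lines
#   pprof_report pdf mysample.prof --nodecount=200 > foo.pdf
```

Passing options through unchanged keeps the helper from having to know pprof's full option list; check `pprof --help` for what your version accepts.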

It is sometimes helpful to profile with optimization turned off (using the debug version of novasoft), since the compiler inlines some function calls away when it optimizes the code. The absolute times may be skewed, but functions that the optimizer would otherwise inline remain visible as separate entries in the profile.

CAFAna

For CAFAna this whole dance is taken care of for you. Just pass --prof on the cafe command line.