Selectively instrumenting Geant4 with TAU

The following steps show how to configure TAU and use it to collect performance counters for Geant4 using selective instrumentation. The steps for full instrumentation are mostly the same; the main difference is that you do not need a select.tau file, in which case TAU instruments everything. The toolchain is confirmed to work with gcc 4.7.3 and may not yet work with gcc 4.8.2.

Configuring and building TAU and its prerequisites

  • Download PDT: pdt_lite.tgz. Configure and build it, for example:
    tar xfz pdt_lite.tgz
    cd pdtoolkit-3.20
    ./configure --prefix=$HOME/soft/pdtoolkit-3.20
    make install
    
  • If you wish to collect hardware counters, make sure you have installed a recent version of PAPI.
  • Download TAU: tau.tgz. Configure and build it, for example:
    tar xfz tau.tgz
    cd tau-2.23
    ./configure -c++=g++ -cc=gcc -pdt=$HOME/soft/pdtoolkit-3.20 -PROFILECALLPATH \
        -bfd=download -MULTIPLECOUNTERS -papi=/disks/soft/papi-5.3.0 -prefix=$HOME/soft/tau-2.23
    make install
    

To see a complete list of options, use "./configure -help". You can create many configurations for TAU and install them at the same prefix.

Another configuration that only collects wallclock time measurements can be created with:

./configure -c++=g++ -cc=gcc -pdt=/disks/soft/pdtoolkit-3.20 -PROFILECALLPATH \
    -bfd=download -LINUXTIMERS -prefix=$HOME/soft/tau-2.23
make install

You can see the last configuration in the .last_config file, and all configurations in the .all-configs file. Complete TAU documentation is available in the TAU User Guide.
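
Each configuration installs its own makefile under the architecture-specific lib directory, so listing those files is a quick way to see which configurations are available at a given prefix. The following is a minimal sketch, assuming the x86_64/lib layout used elsewhere on this page; list_tau_configs is an illustrative helper name, not part of TAU.

```shell
# List the TAU makefiles installed under a TAU prefix.
# Assumes the <prefix>/x86_64/lib layout used on this page; adjust the
# architecture directory for other platforms.
list_tau_configs() {
    tau_prefix=$1
    found=0
    for stub in "$tau_prefix"/x86_64/lib/Makefile.tau-*; do
        # If the glob matched nothing, the pattern itself is returned.
        [ -e "$stub" ] || continue
        basename "$stub"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no TAU makefiles under $tau_prefix/x86_64/lib" >&2
        return 1
    fi
}

# Example: list_tau_configs "$HOME/soft/tau-2.23"
```

Any of the names printed can then be used as the file part of the TAU_MAKEFILE setting.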

Using TAU

Environment settings

Before building Geant4, make sure the following environment variables are set; use the correct paths for your installation.

export TAU_DIR=$HOME/soft/tau-2.23
export PATH=$TAU_DIR/x86_64/bin:$PATH
export TAU_MAKEFILE=$TAU_DIR/x86_64/lib/Makefile.tau-callpath-papi-pdt
export TAU_CALLPATH=1
export TAU_CALLPATH_DEPTH=100
export TAU_OPTIONS='-optVerbose -optRevert -optTau=-inline -optTauSelectFile="$HOME/Geant4/select.tau"'

The TAU_DIR and PATH variables are self-explanatory. Here is a short description of each of the others.

  • TAU_MAKEFILE: Points to a specific TAU makefile in the architecture-specific library subdirectory of the TAU installation. This variable is used to switch between configurations. For example, the setting above uses TAU configured with PAPI hardware counter support. To use the built-in Linux timers and measure only wallclock time, specify Makefile.tau-callpath-pdt instead (this corresponds to the second TAU configuration example in the installation instructions above).
  • TAU_CALLPATH: If you configured TAU with the -PROFILECALLPATH option, this variable enables or disables the collection of callpath data. If disabled, you get flat profiles (i.e., a single value per function). If enabled, you get callpaths showing the calling context, at the cost of greater runtime overhead.
  • TAU_CALLPATH_DEPTH: Limits the depth of the callpath information. A very large setting such as 100 results in complete callpaths; shorter callpaths may be sufficient, and a typical setting is between 2 and 5.
  • TAU_OPTIONS: Contains command-line options that are passed to TAU's instrumentation phase. The -optTauSelectFile option specifies the TAU select file, which controls the instrumentation; change the path in the option above to point to the actual location of your select.tau file. For example, to instrument only the top 20 routines identified by Fast in profiling SimplifiedCalo in geant4.9.6.r10, the contents of select.tau should be:
BEGIN_INCLUDE_LIST

"#G4KleinNishinaCompton::SampleSecondaries#" 
"#G4Physics2DVector::Value#" 
"#G4PhysicsVector::Value#" 
"#G4SauterGavrilaAngularDistribution::SampleDirection#" 
"#G4SeltzerBergerModel::SampleSecondaries#" 
"#G4UniversalFluctuation::SampleFluctuations#" 
"#G4UrbanMscModel::ComputeGeomPathLength#" 
"#G4UrbanMscModel::ComputeTruePathLengthLimit#" 
"#G4UrbanMscModel::SampleCosineTheta#" 
"#G4UrbanMscModel::SampleDisplacement#" 
"#G4UrbanMscModel::SampleScattering#" 
"#G4VDiscreteProcess::PostStepGetPhysicalInteractionLength#" 
"#G4VEmProcess::GetCurrentLambda#" 
"#G4VEmProcess::PostStepDoIt#" 
"#G4VEmProcess::PostStepGetPhysicalInteractionLength#" 
"#G4VEnergyLossProcess::AlongStepDoIt#" 
"#G4VEnergyLossProcess::PostStepGetPhysicalInteractionLength#" 
"#G4VMultipleScattering::AlongStepDoIt#" 
"#G4VelocityTable::Value#" 

END_INCLUDE_LIST

The syntax is explained in the TAU user manual; in short, # is a wildcard character, so only functions that match one of the patterns above will be instrumented.
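
If the list of routines comes from another tool as plain names, the include list can be generated rather than typed by hand. The sketch below assumes one fully qualified routine name per line on standard input and wraps each one in the "#...#" form used in the list above; make_select_file is a made-up helper name, not a TAU command.

```shell
# Write a TAU select file that instruments only the routines named on stdin
# (one fully qualified routine name per line; blank lines are skipped).
# Each name is wrapped as "#name#", matching the entries shown above.
make_select_file() {
    echo "BEGIN_INCLUDE_LIST"
    while IFS= read -r routine; do
        [ -n "$routine" ] || continue
        printf '"#%s#"\n' "$routine"
    done
    echo "END_INCLUDE_LIST"
}

# Example: two routines taken from the list above.
printf '%s\n' 'G4PhysicsVector::Value' 'G4VEmProcess::PostStepDoIt' | make_select_file
```

Redirect the output to the file named in -optTauSelectFile, for example `make_select_file < routines.txt > $HOME/Geant4/select.tau` (routines.txt is a hypothetical input file).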

Configuring and building Geant4

  • Go to your Geant4 build directory and configure Geant4 as usual, with the following additional options to CMake:
    cmake -DCMAKE_CXX_COMPILER=tau_cxx.sh -DCMAKE_C_COMPILER=tau_cc.sh ..
    

The beginning of the output of cmake should look something like this (in this sample run the C compiler was still detected as plain gcc; the C++ compiler lines are the ones to check):

-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /disks/large/soft/gcc-4.7.3/bin/gcc
-- Check for working C compiler: /disks/large/soft/gcc-4.7.3/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /disks/large/soft/tau-2.23/x86_64/bin/tau_cxx.sh
-- Check for working CXX compiler: /disks/large/soft/tau-2.23/x86_64/bin/tau_cxx.sh -- works
...
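
If cmake does not pick up tau_cxx.sh as in the sample above, an incomplete environment is one likely cause. The following is a small sanity check under the assumptions of this page (TAU_MAKEFILE set, TAU wrappers on PATH); check_tau_env is an illustrative helper, not part of TAU.

```shell
# Report problems with the TAU environment described on this page.
# Returns non-zero if anything looks wrong.
check_tau_env() {
    rc=0
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        rc=1
    elif [ ! -f "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE does not exist: $TAU_MAKEFILE" >&2
        rc=1
    fi
    for wrapper in tau_cxx.sh tau_cc.sh; do
        if ! command -v "$wrapper" >/dev/null 2>&1; then
            echo "$wrapper not found on PATH" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example: check_tau_env && cmake -DCMAKE_CXX_COMPILER=tau_cxx.sh ..
```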

  • Build Geant4 as usual, but make the build verbose in order to check for instrumentation problems. The output from
    make VERBOSE=1
    should look similar to this:
    /usr/bin/cmake -E cmake_progress_report /homes/norris/research/apps/Geant4/geant4.10.0-emreview-taus-build/CMakeFiles 
    [ 24%] Building CXX object source/particles/CMakeFiles/G4particles.dir/management/src/G4IsomerTable.cc.o
    cd /homes/norris/research/apps/Geant4/geant4.10.0-emreview-taus-build/source/particles && /disks/large/soft/tau-2.23/x86_64/bin/tau_cxx.sh   -DG4particles_EXPORTS -DG4_STORE_TRAJECTORY -DG4VERBOSE -DG4PARTICLES_ALLOC_EXPORT -DGEANT4_DEVELOPER_RELWITHDEBINFO -W -Wall -pedantic -Wno-non-virtual-dtor -Wno-long-long -Wwrite-strings -Wpointer-arith -Woverloaded-virtual -Wno-variadic-macros -Wshadow -pipe -std=c++98 -O2 -g -fPIC -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/externals/clhep/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/HEPGeometry/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/HEPRandom/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/materials/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/adjoint/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/bosons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/barions/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/ions/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/mesons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/leptons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/geometry/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/intercoms/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/shortlived/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/utils/include   -o CMakeFiles/G4particles.dir/management/src/G4IsomerTable.cc.o -c 
/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/src/G4IsomerTable.cc
    
    Debug: Parsing with PDT Parser
    Executing> /disks/soft/pdtoolkit-3.18.1/x86_64/bin/cxxparse /homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/src/G4IsomerTable.cc -I/disks/soft/tau-2.23/include -DPROFILING_ON -DTAU_PAPI -I/disks/soft/papi-4.4.0/src -I/disks/soft/papi-4.4.0/include -DTAU_GNU -DTAU_DOT_H_LESS_HEADERS -DTAU_LINUX_TIMERS -DTAU_CALLPATH -DTAU_LARGEFILE -D_LARGEFILE64_SOURCE -DTAU_BFD -DHAVE_GNU_DEMANGLE -DHAVE_TR1_HASH_MAP -DTAU_SS_ALLOC_SUPPORT -DEBS_CLOCK_RES=1 -DTAU_STRSIGNAL_OK -DTAU_TRACK_LD_LOADER -DG4particles_EXPORTS -DG4_STORE_TRAJECTORY -DG4VERBOSE -DG4PARTICLES_ALLOC_EXPORT -DGEANT4_DEVELOPER_RELWITHDEBINFO -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/externals/clhep/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/HEPGeometry/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/HEPRandom/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/materials/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/adjoint/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/bosons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/barions/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/ions/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/mesons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/leptons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/geometry/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/intercoms/include 
-I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/shortlived/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/utils/include -I/disks/soft/tau-2.23/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/src
    
    Debug: Instrumenting with TAU
    Executing> /disks/soft/tau-2.23/x86_64/bin/tau_instrumentor G4IsomerTable.pdb /homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/src/G4IsomerTable.cc -o G4IsomerTable.inst.cc -f /homes/norris/research/apps/Geant4/select.tau
    
    Debug: Compiling with Instrumented Code
    Executing> /disks/soft/gcc-4.7.3/bin/g++ -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/src -W -Wall -pedantic -Wno-non-virtual-dtor -Wno-long-long -Wwrite-strings -Wpointer-arith -Woverloaded-virtual -Wno-variadic-macros -Wshadow -pipe -std=c++98 -O2 -g -fPIC -c G4IsomerTable.inst.cc -DG4particles_EXPORTS -DG4_STORE_TRAJECTORY -DG4VERBOSE -DG4PARTICLES_ALLOC_EXPORT -DGEANT4_DEVELOPER_RELWITHDEBINFO -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/externals/clhep/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/HEPGeometry/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/HEPRandom/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/global/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/materials/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/adjoint/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/bosons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/barions/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/ions/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/hadrons/mesons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/leptons/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/geometry/management/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/intercoms/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/shortlived/include -I/homes/norris/research/apps/Geant4/geant4.10.0-emreview/source/particles/utils/include -DPROFILING_ON -DTAU_PAPI -I/disks/soft/papi-4.4.0/src -I/disks/soft/papi-4.4.0/include 
-DTAU_GNU -DTAU_DOT_H_LESS_HEADERS -DTAU_LINUX_TIMERS -DTAU_CALLPATH -DTAU_LARGEFILE -D_LARGEFILE64_SOURCE -DTAU_BFD -DHAVE_GNU_DEMANGLE -DHAVE_TR1_HASH_MAP -DTAU_SS_ALLOC_SUPPORT -DEBS_CLOCK_RES=1 -DTAU_STRSIGNAL_OK -DTAU_TRACK_LD_LOADER -I/disks/soft/tau-2.23/include -o CMakeFiles/G4particles.dir/management/src/G4IsomerTable.cc.o
    Looking for file: G4IsomerTable.o
    
    Debug: cleaning inst file
    Executing> /bin/rm -f G4IsomerTable.inst.cc
    
    Debug: cleaning PDB file
    Executing> /bin/rm -f G4IsomerTable.pdb
    
    

You may see warnings; these are safe to ignore. Errors are a different matter: if TAU fails to instrument a source file and switches to regular compilation, the accuracy of the collected performance data can be affected, so such errors must not be ignored. In some cases, small modifications to the source code can fix such issues. If it's not possible to determine why TAU fails, email .
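
With a large build, files that fell back to regular compilation are easy to miss in the verbose output. One rough heuristic is to save the output to a file (for example with `make VERBOSE=1 2>&1 | tee build.log`; the file name is arbitrary) and compare the number of C++ compile steps with the number of "Compiling with Instrumented Code" steps. The sketch below assumes the log messages look like the sample above, which may differ between TAU versions, and it cannot detect a failure that occurs after that message is printed; check_build_log is a hypothetical helper.

```shell
# Compare the number of C++ compile steps with the number of instrumented
# compile steps in a saved verbose build log.
# Both patterns are taken from the sample output above.
check_build_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -c prints 0 and exits non-zero when nothing matches.
    compiled=$(grep -c 'Building CXX object' "$log" || true)
    instrumented=$(grep -c 'Debug: Compiling with Instrumented Code' "$log" || true)
    echo "compiled: $compiled, instrumented: $instrumented"
    [ "$compiled" -eq "$instrumented" ]
}

# Example: check_build_log build.log || echo "some files may not be instrumented"
```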

Collecting performance measurements

  • In SimplifiedCalo, run cmake as usual, with the additional options specifying the TAU compiler wrappers:
    cmake -DCMAKE_CXX_COMPILER=tau_cxx.sh -DCMAKE_C_COMPILER=tau_cc.sh -DGeant4_DIR=/path/to/geant/built/with/tau
  • Specify performance counters

This is currently system-dependent. PAPI preset counters have the same names on all systems, but not all of them are available on every system. You can list the available counters with the papi_avail command. Native counters (i.e., those not accessible through a common PAPI name) can also be used; you can list them with papi_native_avail. On a Xeon 5400 series, for example, the following configuration can collect several counters at once:

export TAU_METRICS=PAPI_L1_DCM:PAPI_L1_ICM:PAPI_TOT_INS:PAPI_TOT_CYC
./SimplifiedCalo run_SimplifiedCalo.g4
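
Before starting a long run, it can be worth checking that every counter named in TAU_METRICS is actually available on the machine. The sketch below is a minimal filter, assuming that `papi_avail -a` prints one available preset per line with the counter name in the first column; missing_counters is an illustrative helper, not part of PAPI or TAU.

```shell
# Print the counters from a colon-separated TAU_METRICS-style list that do
# not appear as the first field of any line on stdin.
# Intended use (assumed output format: counter name in the first column):
#   papi_avail -a | missing_counters "$TAU_METRICS"
missing_counters() {
    wanted=$1
    # Keep only the first column of lines that start with a PAPI_ name.
    available=$(awk '$1 ~ /^PAPI_/ { print $1 }')
    for counter in $(printf '%s' "$wanted" | tr ':' ' '); do
        if ! printf '%s\n' "$available" | grep -qxF "$counter"; then
            echo "$counter"
        fi
    done
}
```

An empty result means every requested preset was found in the list; it does not guarantee that all of them can be measured in the same run.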

Because not all hardware counters can be measured at once, multiple experiments such as the above may be necessary to collect all desired values. Before each experiment, reset the TAU_METRICS variable to a new list of counters and rerun the executable. The results will be stored in subdirectories whose names start with "MULTI__", for example:

MULTI__PAPI_FP_OPS/
MULTI__PAPI_L1_DCM/
MULTI__PAPI_L1_ICM/
MULTI__PAPI_L1_STM/
MULTI__PAPI_L2_DCA/
MULTI__PAPI_L2_DCR/
MULTI__PAPI_L2_DCW/
MULTI__PAPI_L2_ICM/
MULTI__PAPI_L2_TCM/
MULTI__PAPI_LD_INS/
MULTI__PAPI_TOT_CYC/
MULTI__PAPI_TOT_INS/
MULTI__P_WALL_CLOCK_TIME/
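
Repeating the run for several counter sets can be scripted. The following sketch runs a command once per set with TAU_METRICS set for that run only; run_counter_sets is a hypothetical helper, and the second grouping in the example is only an illustration, since which counters can be measured together depends on the hardware.

```shell
# Run a command once per counter set, with TAU_METRICS set for each run.
# Usage: run_counter_sets COMMAND SET [SET...]
run_counter_sets() {
    cmd=$1
    shift
    for metrics in "$@"; do
        echo "== TAU_METRICS=$metrics"
        # The assignment applies only to this invocation of the command.
        TAU_METRICS=$metrics sh -c "$cmd" || return 1
    done
}

# Example (the second grouping is hypothetical; check with papi_avail which
# counters exist on your machine and which can be measured together):
# run_counter_sets './SimplifiedCalo run_SimplifiedCalo.g4' \
#     PAPI_L1_DCM:PAPI_L1_ICM:PAPI_TOT_INS:PAPI_TOT_CYC \
#     PAPI_L2_DCA:PAPI_L2_DCR:PAPI_L2_DCW:PAPI_L2_TCM
```

The loop stops at the first failing run, so a crash in one experiment does not go unnoticed among the later ones.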

Loading the data into the database

TAU provides a utility for loading the local data above into a (possibly remote) SQL database; several database engines are supported. For example, if you want to upload the data to the Geant4 database hosted at UO and you don't yet have a TAUdb profile on your machine, you first need to set up your database access. To do that, contact Boyana Norris to obtain the database username and password, and specify the IP address (just the first 2 numbers) from which you will be accessing it. Then you can follow these steps to configure access to the database (see an example TAUdb configuration session).

Once your database profile is set up, you don't have to do it again on the same machine. To load a trial, use taudb_loadtrial, for example:

> taudb_loadtrial -n Standard_50GeV_50events -a SimplifiedCalo -e 0 -c geant4
TrialName: Standard_50GeV_50events
Inserting metrics... done. (0.124 seconds)
Inserting timers... done. (2.087 seconds)
Inserting threads... done. (0.147 seconds)
Inserting timer groups and parameters... done. (0.765 seconds)
Inserting call graph... done. (973.331 seconds)
Inserting per-thread call data... done. (4.467 seconds)
Inserting derived threads... done. (0.116 seconds)
Inserting derived threads call data... done. (28.102 seconds)
Querying new call data for IDs... done. (7.588 seconds)
Inserting timer measurements... done. (5.76 seconds)
Inserting counters... done. (0.067 seconds)
Inserting metadata... done. (0.114 seconds)
Total time to load : 1022.668 seconds
Done saving trial!

Viewing the data

Extensive documentation on working with ParaProf and PerfExplorer is available on the TAU website and in the user manual.