Notes for using Open SpeedShop

These are notes for the discussion in the Neat Topics for Programmers session on March 13, 2014. They are not intended to be complete usage notes!

The version installed on cluck, and the one we looked at, is 2.0.2. The most recent version is 2.1, which is available on cluck through UPS.

Krell is in the process of converting their “store data in local files” version to a “stream data into a database” version (CBTF version). This isn’t done yet; for now, using files seems better. CBTF is still “experimental” in this release.

How to run the tools on a machine where they are already installed.

The tools of interest are installed on cluck at /opt/OSS-2.0.2. We have looked at:
  1. ossusertime - useful for sampling-based profiling of single-threaded programs
  2. osshwctime - collects the PAPI event named by the OPENSS_HWCTIME_EVENT environment variable
  3. osshwcsamp - collects the PAPI events named by the OPENSS_HWCSAMP_EVENTS environment variable
  4. openss - views the collected samples

Some of the tools can measure multithreaded programs with sufficient accuracy; others we have used cannot.

The ossusertime tool does not handle multithreaded programs well, but it is useful for single-threaded programs. The tools based on hardware counters can deal with multithreaded programs.

Demonstration!

First, we'll use ossusertime to look at a single-threaded program: ex04.cc. We compile without optimization so that we can see what deep call stacks look like. If we optimize this code, the compiler collapses the call stacks dramatically. When profiling to measure performance, you should always use optimized code!
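
To make the point about call stacks concrete, here is a minimal sketch of a program with a deliberately deep call chain. It is a hypothetical stand-in, not ex04.cc; the names leaf, level1 through level3, and run_chain are invented for illustration. Built without optimization, a sampling profiler would be expected to show each level as its own frame. With optimization, the compiler will typically inline these small functions, so most of the frames disappear.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical call chain (not the ex04.cc from the session).
// Each function does a trivial amount of work and calls the next one,
// so an unoptimized build has a stack four levels deep below run_chain.
double leaf(double x)   { return std::sqrt(x + 1.0); }
double level3(double x) { return leaf(x) + 1.0; }
double level2(double x) { return level3(x) * 0.5; }
double level1(double x) { return level2(x) + level2(x + 1.0); }

// Calls the chain n times and returns the accumulated value, so the
// work cannot be discarded as unused by an optimizing compiler.
double run_chain(long n)
{
  assert(n >= 0);
  double sum = 0.0;
  for (long i = 0; i != n; ++i) {
    sum += level1(static_cast<double>(i));
  }
  return sum;
}
```

Calling run_chain from main with a large n, and printing the result, should give a sampling tool enough samples to work with; how large n needs to be depends on the machine.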

Second, we'll use osshwctime and then osshwcsamp to look at a multithreaded program: work_balance. The code is available in /home/paterno/oss-testing on cluck.

The work of these programs is done in 'busyloop', 'threadwork1', and 'threadwork2'. 'threadwork1' and 'threadwork2' are identical except for their names. The code is:

double busyloop(double lat1, double lat2, double long1, double long2)
{
 double const R = 3964.0;
 double temp1 = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * sin(long1-long2);
 double temp2 = cos(lat1) * cos(lat2) * sin(long1-long2);
 return(2.0 * R * atan(sqrt((1.0-temp1)/(1.0+temp2))));
}

// threadwork1 and threadwork2 are intentionally identical, and share no
// code. This is so we can see different function names in a profiler,
// which should identify totally different call paths, while being
// certain the functions do exactly the same work.

void threadwork1(int workload, tbb::tick_count& t0, tbb::tick_count& t1)
{
 t0 = tbb::tick_count::now();
 double sum = 0.0;
 long sz = workload * 50 * 1000;
 for (long i = 0; i != sz; ++i)
   {
     double a = i;
     double b = 2*i;
     double c = 2*a+b;
     double d = 7.5*i;
     sum += busyloop(a,b,c,d);
   }
 t1 = tbb::tick_count::now();
}
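
The notes do not show how work_balance launches these functions, so the following is only a sketch of one way to drive them. It assumes std::thread and std::chrono in place of TBB's tick_count so that it compiles on its own; WorkResult and run_unbalanced are hypothetical names. Each function stores its sum in the result so that an optimizing compiler cannot discard the loop.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <thread>
#include <utility>

// Same arithmetic as busyloop above, repeated so the sketch compiles alone.
double busyloop(double lat1, double lat2, double long1, double long2)
{
  double const R = 3964.0;
  double temp1 = std::sin(lat1) * std::sin(lat2)
               + std::cos(lat1) * std::cos(lat2) * std::sin(long1 - long2);
  double temp2 = std::cos(lat1) * std::cos(lat2) * std::sin(long1 - long2);
  return 2.0 * R * std::atan(std::sqrt((1.0 - temp1) / (1.0 + temp2)));
}

// Hypothetical result record; the original passes tbb::tick_count references.
struct WorkResult {
  long iterations = 0;   // loop iterations completed
  double sum = 0.0;      // kept so the optimizer cannot drop the loop
  double seconds = 0.0;  // wall-clock time spent in the function
};

void threadwork1(int workload, WorkResult& out)
{
  assert(workload >= 0);
  auto const t0 = std::chrono::steady_clock::now();
  double sum = 0.0;
  long const sz = static_cast<long>(workload) * 50 * 1000;
  for (long i = 0; i != sz; ++i) {
    double a = i;
    double b = 2 * i;
    double c = 2 * a + b;
    double d = 7.5 * i;
    sum += busyloop(a, b, c, d);
  }
  auto const t1 = std::chrono::steady_clock::now();
  out.iterations = sz;
  out.sum = sum;
  out.seconds = std::chrono::duration<double>(t1 - t0).count();
}

// Identical to threadwork1 except for its name, as in the original.
void threadwork2(int workload, WorkResult& out)
{
  assert(workload >= 0);
  auto const t0 = std::chrono::steady_clock::now();
  double sum = 0.0;
  long const sz = static_cast<long>(workload) * 50 * 1000;
  for (long i = 0; i != sz; ++i) {
    double a = i;
    double b = 2 * i;
    double c = 2 * a + b;
    double d = 7.5 * i;
    sum += busyloop(a, b, c, d);
  }
  auto const t1 = std::chrono::steady_clock::now();
  out.iterations = sz;
  out.sum = sum;
  out.seconds = std::chrono::duration<double>(t1 - t0).count();
}

// Runs the two functions concurrently, each on its own thread and with
// its own workload, and returns both results once the threads finish.
std::pair<WorkResult, WorkResult> run_unbalanced(int workload1, int workload2)
{
  WorkResult r1, r2;
  std::thread first(threadwork1, workload1, std::ref(r1));
  std::thread second(threadwork2, workload2, std::ref(r2));
  first.join();
  second.join();
  return std::make_pair(r1, r2);
}
```

Calling run_unbalanced(1, 4) from main should give threadwork2 four times as many iterations as threadwork1, which is the kind of imbalance a per-function profile is meant to expose. With GCC, building with -pthread is the usual way to link the thread support.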

Installation

Installation requires that a number of RPMs be present on the system (for SLF). The installation notes say “previous experiences have resulted in this list of candidate packages”:

yum install -y rpm-build \
gcc gcc-c++ \
openmpi \
patch \
autoconf automake \
elfutils-libelf elfutils-libelf-devel \
libxml2 libxml2-devel \
binutils binutils-devel \
python python-devel \
flex bison bison-devel bison-runtime \
libtool libtool-ltdl libtool-ltdl-devel

On my VM, I didn’t have openmpi; installing it dragged in 6 additional RPMs:

Installing:
 openmpi              x86_64  1.5.4-2.el6                slf-security  2.2 M
Installing for dependencies:
 environment-modules  x86_64  3.2.7b-6.el6               slf            95 k
 infinipath-psm       x86_64  3.0.1-115.1015_open.2.el6  slf-security  159 k
 libesmtp             x86_64  1.0.4-15.el6               slf            56 k
 libibverbs           x86_64  1.1.7-1.el6                slf-security   44 k
 librdmacm            x86_64  1.0.17-1.el6               slf-security   55 k
 numactl              x86_64  2.0.7-3.el6                slf            58 k

My VM did not have elfutils-libelf-devel; installing it dragged in no other packages.
My VM did not have libxml2-devel; installing it dragged in no other packages.
My VM did not have binutils-devel; installing it dragged in no other packages.
My VM did not have python-devel; installing it dragged in no other packages.
My VM did not have bison-devel; installing it dragged in no other packages.
My VM did not have bison-runtime; installing it dragged in no other packages.
My VM did not have libtool-ltdl-devel; installing it dragged in no other packages.