G4hpc Planning 1202012

We had our first Geant4 FWP planning meeting on 1/19/2012. These are the notes from the whiteboard. When the new redmine site/repo is available, this page should be moved to that location.

This page also contains the notes from the follow-up status meeting on 2/9/2012.

general things that need to be addressed

Development environment

  • Applications? - simplified calorimeter, CMSEXP
  • Geant4 base version? - Geant4-9.5 installed as UPS product
  • Compiler? - 4.6.x
  • Performance tools? - CodeAnalyst, CallGrind, FAST, (perhaps Openspeedshop and TAU)
  • Repository? - yes, in redmine, started with snapshot of the above Geant4 code
  • Tools for parallel work? - the standard set of OpenMP, TBB, CUDA, OpenCL, (perhaps Charm++ and a few others of its type)
  • UPS-lite use? - yes, hierarchy of products needs to be spelled out and implemented

Analysis task

  • analysis and restructuring document
  • break out of G4 components that we will be studying
  • things to be measured
    • memory use for geometry
    • types of queries made to geometry in a realistic setting
    • pattern of access or use of the geometry
  • program traces for validation (particle trajectories, function call parameters, etc.)
  • configuration of Geant4 applications (to study geometry)
  • what should the interactions with geometry look like? (Philippe says that some work on this is underway at CERN - APIs, functionality, etc.)


For Jan 31

  • development and runtime environment
  • applications configured, installed, easy to run
  • verify everything is working (run applications and check output)

Status for the Jan 31 tasks

  • g4hpc is set up as a repository and g4hpcbenchmarks is established as well; both are buildable from cluck. G4 is also available there as a UPS product (f1 qualifiers, v4_9_5). Each of the data packages is a separate UPS product, and the data products are maintained separately.
  • install of g4hpc and the benchmarks: products can be installed according to cetbuildtools (checked this on cluck)
  • g4hpcbenchmarks needs either the Geant4 install or g4hpc as a dependency; one or the other is selected when the cmake command is run (use either G4HPC_FQ_DIR or GEANT4_FQ_DIR in the cmake command - see instructions for g4hpcbenchmarks).
  • things are verified

applications configured (cluck) -

  • valgrind 3.7 (integrated with debugger)
  • CodeAnalyst (no UPS setup now, maybe use the "fakeups" product stuff from Chris for this)
  • OpenSpeedshop (no UPS setup now, might not be so great a thing)
  • No TAU installed yet
  • No kcachegrind - issue with Qt library versions (>3.3.8) (unresolved issue here)

g4hpcbenchmarks does not have the setup_for_development complete. Would be useful to add a ups directory to this package.
(Might be useful to add this g4hpcbenchmarks to the use-cases for the cetbuildtools review.)

Philippe is working with CodeAnalyst. He found that you must be careful with the scaling of results. Philippe will start logging issues like this on the wiki.
Goal is to estimate or predict how much "vectorizing" Geant4 can help. Vectorizing means running the same set of instructions (functions) across many elements at the same time. Not explicit SSE-like optimizations, but at a larger scale.
Wants to study poor cache behavior and then relate that to vectorizing code.
Wants to locate any code that is slow, and then pull it out as a set of vectorized functions.
What is the relationship of this study with the work plan overall?
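
To make the notion of "vectorizing" used above concrete, here is a minimal sketch contrasting a one-track-at-a-time function with the same operation applied across a whole batch of tracks. The names Track, TrackBatch, stepOne and stepBatch are hypothetical and chosen only for illustration; they are not Geant4 classes, and a straight-line step stands in for whatever real work would be pulled out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical track layout for illustration only; not a Geant4 type.
struct Track {
    double x, y, z;     // position
    double dx, dy, dz;  // direction
};

// One-track-at-a-time style: the function is entered once per track.
void stepOne(Track& t, double step) {
    t.x += step * t.dx;
    t.y += step * t.dy;
    t.z += step * t.dz;
}

// Batch style: each quantity is stored contiguously for all tracks
// (structure of arrays), so one call covers many elements.
struct TrackBatch {
    std::vector<double> x, y, z, dx, dy, dz;
    std::size_t size() const { return x.size(); }
};

// The same instructions as stepOne, applied across every track in the batch.
void stepBatch(TrackBatch& b, double step) {
    const std::size_t n = b.size();
    assert(b.y.size() == n && b.z.size() == n);
    assert(b.dx.size() == n && b.dy.size() == n && b.dz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        b.x[i] += step * b.dx[i];
        b.y[i] += step * b.dy[i];
        b.z[i] += step * b.dz[i];
    }
}
```

The batch form walks memory sequentially, which is the kind of access pattern the cache-behavior study would be looking for; whether real Geant4 code can be reorganized this way is exactly the open question.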

We agreed that Philippe will use the non-physics application for his study instead of the full program, and that this sub-project is to be reported on in detail in two weeks to see if this process is worth pursuing further. He will have a much better understanding of the performance tools at the end of this period and hopefully will be able to report on useful measurements. The idea of locating chunks of code to be vectorized is not directly in line with our FWP goals. Philippe also mentioned the responsibility of reporting work such as the vectorizing of code sections to the CERN group. Jim mentioned the other needs of the FWP: to advance our knowledge and use of GPUs, to be able to join with future HPC efforts and utilize the US big hardware, and to report progress on the FWP we agreed upon with Lali.

For Feb 29

  • rerun Philippe's performance benchmark runs
  • resource use of geometry and propagation components
  • generate G4 sample "input file" for test runs, consisting of primaries and secondaries from a run of CMSEXP, keeping track of path length.
  • run this "input file" through the applications with no physics processes on, and write out the trajectories. This gives us a large set of starting points and answers to begin testing alternative organizations of geometry / magnetic field calculations.
  • start design of standalone application to propagate particles
  • design for magnetic field handling
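
As a rough illustration of what the sample "input file" in the list above could hold, the following sketch writes and reads back one record per starting point. The record layout (StartPoint and its fields) and the whitespace-separated text format are assumptions made for this example only; the actual format was not decided in these notes.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical record for one starting point; not an agreed-upon format.
struct StartPoint {
    int pdgCode;        // particle type
    int isPrimary;      // 1 for a primary, 0 for a secondary
    double x, y, z;     // starting position
    double px, py, pz;  // starting momentum
    double pathLength;  // path length tracked in the generating run
};

// Write one record per line as whitespace-separated text.
void writeStartPoints(std::ostream& out, const std::vector<StartPoint>& pts) {
    out.precision(17);  // enough digits for a double to round-trip
    for (const StartPoint& p : pts) {
        out << p.pdgCode << ' ' << p.isPrimary << ' '
            << p.x << ' ' << p.y << ' ' << p.z << ' '
            << p.px << ' ' << p.py << ' ' << p.pz << ' '
            << p.pathLength << '\n';
    }
}

// Read records until the stream ends or a malformed record is found.
std::vector<StartPoint> readStartPoints(std::istream& in) {
    std::vector<StartPoint> pts;
    StartPoint p;
    while (in >> p.pdgCode >> p.isPrimary
              >> p.x >> p.y >> p.z
              >> p.px >> p.py >> p.pz
              >> p.pathLength) {
        pts.push_back(p);
    }
    return pts;
}
```

A standalone propagation application could read such a file, propagate each entry with no physics processes, and compare the resulting trajectories with the ones written out by the reference run.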

status as of 2/9 for the February tasks

Soon is able to run cmsexp without physics. Step lengths have a large effect on performance, so we will still need to keep step lengths. Running without physics requires modified code that skips the physics (dummy physics); it is not runtime configurable. We currently need two installs of g4hpc, one without physics and one with physics. Maybe go over the input and output of what Soon has completed for the generate and run steps above.

2/9/2012 - what next?

At the 2/16 meeting Soon will cover the cmsexp with/without physics work, showing everyone the inputs, code that changed, and information tracked. Philippe will go over the performance tools he's been using and show us examples of runs and what can be measured.