DiscussionWithMakoto22Aug2011 » History » Version 12

Daniel Elvira, 08/22/2011 02:18 PM


FNAL interests

Driven by the needs of the HEP scientific program for better physics and increased time/memory performance, we would like to focus on a Geant4 redesign that takes full advantage of new computing technology.

  1. Profiling tools and protocols applied to relevant Geant4 simulation applications.
  2. Framework driving parallel processing including event-level parallelism.
  3. Geant4 high level architecture re-design and implementation.

FNAL Questions

Our understanding is that Geant4 has plans for reengineering, aimed at improved performance and at efficient use of multi-core and many-core hardware.

  1. How pervasive is the effort? How much change is allowed in the internals and in the user interface?
  2. What is the organization of the re-engineering project: who is the leader, which institutions are involved, and how is the work subdivided?
  3. What are SLAC's and CERN's roles at the moment?
  4. What is the conclusion of CERN's R&D studies (Rene's efforts)?

FNAL Ideas

The high-level architecture of a parallel-capable framework

The HEP experiments that use Geant4 each have their own event processing framework, and so a modified Geant4 must be amenable to use in an experiment's framework.
FNAL supplies the art framework to several Intensity Frontier experiments. We are working on a version of this framework that can process multiple events in parallel; we call this event-level parallelism.
We believe that such parallelism is important for several reasons. Particular to simulation efforts, event-level parallelism may allow us to take advantage of heterogeneous computing resources on modern and future machines.
The diagram below shows how a framework that is capable of event-level parallelism might communicate with Geant4:

How the high-level architecture of Geant4 might be modified

  • Each track will need to know to what event it belongs.
  • Each track will need to know which module instance contains the context for digitizing; Geant4 callbacks must be associated with the right module instance.
  • The reader of the output queue of tracks will need to assemble tracks back into events and know when all tracks are complete for an event; the event then needs to be given back to the right module instance.
  • [remove this?] The events will need to be directed towards downstream processing modules.
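
The bookkeeping described in the list above can be sketched roughly as follows. This is only an illustration, not Geant4 or art code: `Track`, `EventAssembler`, `module_id`, and the `expect` call are hypothetical names. It also assumes the number of tracks per event is known in advance, which is not true in a real simulation where secondaries are created on the fly; a real design would need another way to detect completion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    event_id: int    # the event this track belongs to
    module_id: str   # hypothetical key for the owning module instance
    track_id: int


class EventAssembler:
    """Regroups finished tracks into events and reports completed events."""

    def __init__(self):
        self._expected = {}                # event_id -> number of tracks to wait for
        self._pending = defaultdict(list)  # event_id -> tracks received so far

    def expect(self, event_id, n_tracks):
        # Assumption: the track count per event is known up front.
        self._expected[event_id] = n_tracks

    def add(self, track):
        """Store one finished track.

        Returns (module_id, tracks) once the event is complete, else None.
        """
        tracks = self._pending[track.event_id]
        tracks.append(track)
        if len(tracks) < self._expected[track.event_id]:
            return None
        del self._pending[track.event_id]
        del self._expected[track.event_id]
        return track.module_id, sorted(tracks, key=lambda t: t.track_id)


assembler = EventAssembler()
assembler.expect(1, 2)
assembler.expect(2, 1)

# Tracks from different events arrive interleaved on the output queue.
output_queue = [Track(1, "modA", 0), Track(2, "modB", 0), Track(1, "modA", 1)]

completed = []
for track in output_queue:
    done = assembler.add(track)
    if done is not None:
        completed.append(done)  # hand the event back to its module instance
```

Here event 2 completes before event 1 even though it was started later, which is why each track has to carry both its event and its module instance.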

Performance measurements

Sequential performance of Geant4 code remains important. The measurement of the performance of realistic experiment codes is not easy, because substantial "hotspots" have usually been detected and removed before we become involved in the project.
We are interested in learning what capabilities the PERI tools provide, since we do not have deep expertise in their use.
We use our own tool (the FAST profiler) to sample full call-stack information for single-threaded applications, and to perform statistical analysis on the collected data. We have found that full call-stack information is important in understanding large applications that involve complicated libraries, because the performance characteristics of a library routine often depend on the context from which it is called. We have found statistical analysis to be necessary because performance analysis is mostly a data exploration exercise.
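
To illustrate why the calling context matters, here is a small sketch of the kind of aggregation that full call-stack samples make possible. It does not show how the FAST profiler works; the stack samples and function names are invented, and the error estimate assumes the samples are independent.

```python
import math
from collections import Counter

# Each sample is one captured call stack, outermost frame first.
# The function names are made up for illustration.
samples = [
    ("main", "simulate", "navigate", "sqrt"),
    ("main", "simulate", "navigate", "sqrt"),
    ("main", "simulate", "navigate", "sqrt"),
    ("main", "digitize", "sqrt"),
    ("main", "simulate", "navigate"),
]


def leaf_counts(samples):
    """Flat profile: how often each function is the innermost frame."""
    return Counter(stack[-1] for stack in samples)


def counts_by_caller(samples, function):
    """Split the samples where `function` is the leaf by its immediate caller."""
    return Counter(
        stack[-2]
        for stack in samples
        if len(stack) >= 2 and stack[-1] == function
    )


def fraction_with_error(k, n):
    """Fraction k/n with a binomial standard error (assumes independent samples)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


flat = leaf_counts(samples)
by_caller = counts_by_caller(samples, "sqrt")
share, share_error = fraction_with_error(flat["sqrt"], len(samples))
```

The flat profile only says that `sqrt` is expensive; the per-caller split shows that most of that cost comes from one calling context.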

Code transformation tools

The PERI project includes ROSE, a "source-to-source" translation tool. Here are some ways in which such tools might be of use in our involvement with Geant4:

1. ROSE would be useful for analysis of code, along the lines of what we discussed with Semantic Designs.
2. ROSE might be useful in helping to generate tests; the PERI binary instrumentation tools might be useful in this as well. Testing is a critically weak spot.
3. It might be worth investigating the use of ROSE to generate code for things like translating between array-of-struct and struct-of-array formats.
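
As an illustration of the last item, the transformation between the two layouts looks roughly like this. The sketch only shows the data reshaping itself, in Python rather than in generated C++, and it makes no use of ROSE; the `Hit` record and its fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Hit:  # hypothetical record type, not a Geant4 class
    x: float
    y: float
    energy: float


def aos_to_soa(hits):
    """Array-of-structs -> struct-of-arrays: one list per field."""
    return {f.name: [getattr(h, f.name) for h in hits] for f in fields(Hit)}


def soa_to_aos(columns):
    """Struct-of-arrays -> array-of-structs: one Hit per row."""
    names = [f.name for f in fields(Hit)]
    return [Hit(*row) for row in zip(*(columns[n] for n in names))]


hits = [Hit(0.0, 1.0, 2.5), Hit(3.0, 4.0, 0.5)]
columns = aos_to_soa(hits)
round_trip = soa_to_aos(columns)
```

The conversion code is mechanical and depends only on the field list, which is what makes it a plausible candidate for generation by a source-to-source tool.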

The most important use would be in automated analysis of the code, to augment the analysis of the performance results. For example, we might be able to use ROSE to enhance our "metadata" about the functions the profiler observes. Right now we have only function names and addresses. It would be great to be able to identify functions as belonging to specific types of classes, as taking certain kinds of arguments, as being instantiated from a certain template, etc. Parsing names to do this is very hard, but this should be trivial for ROSE.
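
A rough sketch of how such metadata could be combined with profiler output is given below. The metadata table is written by hand and stands in for whatever a source-analysis tool might produce; no ROSE interface is used, and all addresses, names, and categories are invented.

```python
from collections import Counter

# Hand-written stand-in for metadata that a source-analysis tool might emit.
# Every entry here is invented for illustration.
metadata = {
    "0x4010": {"name": "Navigator::step", "class_kind": "geometry", "template": None},
    "0x4080": {"name": "Box::distance", "class_kind": "geometry", "template": None},
    "0x5000": {"name": "Table<double>::lookup", "class_kind": "physics", "template": "Table"},
}

# Sample counts per function address, as a profiler might report them.
sample_counts = {"0x4010": 40, "0x4080": 25, "0x5000": 30, "0x9999": 5}


def samples_by(key, sample_counts, metadata):
    """Sum sample counts over functions that share the same metadata value."""
    totals = Counter()
    for address, count in sample_counts.items():
        info = metadata.get(address)
        label = info[key] if info is not None else "unknown"
        totals[label] += count
    return totals


by_kind = samples_by("class_kind", sample_counts, metadata)
by_template = samples_by("template", sample_counts, metadata)
```

With only names and addresses, the same grouping would require parsing strings such as `Table<double>::lookup`; with structured metadata it reduces to a table lookup.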