Lisa Goodenough (ANL), Jim Kowalkowski (FNAL), Tom LeCompte (ANL), Adam Lyon (FNAL), Jesse Melhuish (U Kentucky), Marc Paterno (FNAL)
The overall goal of this project is to enable a portion of the g-2 simulation with art to run on Mira at ALCF and on future HPC platforms.
Within the context of muon g-2, there will be several simulation processing scenarios that can be usefully implemented on HPC machines. Running any of them on HPC resources can help muon g-2 complete important computing tasks from now until the time of detector commissioning.
In the above figure, we depict the major steps in the muon g-2 simulation workflow. Each step involves a large collection of art jobs. The figure also shows several interesting choices for running pieces of the processing chain on HPC resources. Current practice is for files to be produced between each of the stages. Stage (1) is expected to produce about 10^9 muons by the end of 2015 and 10^12 muons over the next 1.5 years. There could be hundreds of TB of stage (2) fill events on disk over the next 1.5 years. Options A, B or C are good candidates for providing early useful results from HPC runs. Each can be used to show streaming between different processing stages, with the possible removal of file handling. Each can possibly contribute directly to simulation sample generation for muon g-2.
The initial tasks will focus on moving forward on stage (1). This will only involve exercising Geant4 within the context of art. Work on other stages requires the stage (1) work to be completed first. It turns out that muons can be generated in groups of 10K. This simplifies the workflow and may mean that stages (1) and (2) can be combined into one.
Here is a view of the phase one system, where the workflow consists of one processing task (the G4 MT module) and one I/O task for each participating worker node. It assumes that artdaq is used for the I/O aggregation.
The above figure depicts art running the Geant4 MT simulation as a simple art module, driven by artdaq. Artdaq provides art source and output modules that use the network to transfer complex objects. The diagram shows the initial plan: construct a workflow that writes to one output file from many "builder" processes. The second task will be to attempt a configuration that permits more than one aggregator. The configuration for this case may need the assistance of the DAQ group. Here the multithreading is completely managed within the art G4 MT simulation module.
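The many-builders, one-aggregator pattern described above can be illustrated with a minimal C++ sketch. This is not artdaq code: artdaq moves data between nodes over the network, whereas this sketch uses threads and an in-process queue as a stand-in. The names `Event`, `Channel`, and `run_workflow` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical event record; a real system would transfer serialized data products.
struct Event {
    std::size_t builder;   // which builder produced the event
    std::size_t sequence;  // sequence number within that builder
};

// Minimal thread-safe channel standing in for the network transport (assumption).
class Channel {
public:
    void send(Event e) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(e);
        }
        cv_.notify_one();
    }
    // Blocks until an event is available.
    Event receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Event e = q_.front();
        q_.pop();
        return e;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Event> q_;
};

// Run n_builders producer threads, each emitting events_per_builder events,
// and one aggregator (the calling thread) that collects them into a single
// output; the returned vector stands in for the one output file.
std::vector<Event> run_workflow(std::size_t n_builders, std::size_t events_per_builder) {
    Channel channel;
    std::vector<std::thread> builders;
    for (std::size_t b = 0; b < n_builders; ++b) {
        builders.emplace_back([&channel, b, events_per_builder] {
            for (std::size_t s = 0; s < events_per_builder; ++s)
                channel.send(Event{b, s});
        });
    }
    const std::size_t expected = n_builders * events_per_builder;
    std::vector<Event> output;
    output.reserve(expected);
    while (output.size() < expected)
        output.push_back(channel.receive());
    for (auto& t : builders) t.join();
    return output;
}
```

Calling `run_workflow(4, 25)` returns 100 events, 25 from each builder, in whatever order they arrived at the aggregator. Only the aggregator touches the output, which is the property that makes single-file writing simple.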
Tasks - Porting
There are two categories of tasks associated with this project, porting and code development. We discuss the details of each of these in turn.
The art software is based on the C++11 / C++14 standards. The software system includes tools for building, packaging, and deploying all internal and external components that are necessary to run art in the context of an experiment such as muon g-2. The primary platform (Linux operating system) is SLF6 (Scientific Linux Fermi 6, a Red Hat Enterprise Linux derivative). External dependencies include the large packages ROOT and Geant4. With this in mind, we are following the path below toward the end goal of running on Mira.
- Build and run on Edison at NERSC
- Add support to run on Mira
- Migrate to Cori phase 1 at NERSC
In order to accomplish these smaller goals, we will need to execute the following tasks. These tasks will be expanded over the next couple of weeks.
Running art on Edison
The older SuSE platform will need to be added to the underlying packaging tools used by art so that the whole software suite can be cleanly built on Edison. Building art includes art itself, along with a number of external products that are built as part of the installation procedure. It also requires a number of platform packages to be available (packages not controlled by the art packaging system). It is likely that Edison will not have all of these platform packages available. They will need to be added to the system or built manually in user login space.
There is a second option to explore here. CMS, along with other projects, has started exploring Docker containers for making HEP software available on Edison. We can pick up some of that work and make a Docker container capable of running art. This would be portable to Cori.
Adding Mira as a Platform
The same procedure used for adding SuSE Linux should also work for adding Mira as a new platform.
The art suite has a CMake-based build package called cetbuildtools. These tools are oriented towards building code with gcc. The BG-Clang compiler looks like our best option for moving onto Mira, partly because of its better C++ support. Support for BG-Clang will need to be added to the CMake build system.
Moving to Cori
Moving to Cori should be a straightforward task. It is basically an upgrade of the Edison environment.
The next step for this platform is to add support for the Intel compiler. This should be very similar to adding BG-Clang.
Tasks - Development
Three major tasks have been identified in this area. They are listed below in priority order. Generating muons is a meaningful step in g-2, so getting just the first stage running has value beyond merely showing incremental progress.
- Geant4 MT 10.x upgrade of g-2 software and MT loop integration
- artdaq use and configuration for g-2 offline processing (with aggregator task)
- multi-schedule art - moving this forward to a functioning version
There are two parts to the first task: moving the g-2 code to Geant4 10.1 patch-02 and integrating the new G4 MT event loop into the g-2 simulation module. The first part is straightforward porting. The second involves enabling the MT mode and understanding how to permit multiple particles to be propagated at the same time. Geant4 is run within one art module. Within g-2, each time this module is invoked, one muon is propagated through the detector and the track / hit information is added to the current event. In the new MT scheme, we want an invocation of the G4 module to produce a configurable number of muons and propagate them through the detector, storing all particle information into the current event. The ideal number of particles is 10K, so that it matches the bunches needed by further downstream processing. We believe that the advantage of the MT system will be that n threads can operate simultaneously to propagate that set of 10K particles. So if there are 64 threads available, each thread should be propagating about 10,000/64 ≈ 156 particles. The trick here is learning how G4 collects the track and hit information across the threads so that it can be stored into the event in one place.
Artdaq contains infrastructure that surrounds art for use in real-time processing settings, such as DAQ. Event data can be sent to data aggregation points, called aggregators, for one-spot writing. The movement of event data from one node to another is accomplished using MPI. This is very similar to the services that are needed on an HPC machine for aggregating data for writing. This task involves setting up the artdaq demo system.
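To make the per-thread arithmetic and the "collect in one place" step concrete, here is a minimal C++ sketch. It does not use the Geant4 API and is not how Geant4 MT merges its data; it only assumes that each thread can fill a private buffer that is concatenated afterwards. The names `Hit`, `chunk_sizes`, and `propagate_all` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the track / hit data produced for one muon.
struct Hit {
    std::size_t muon_id;
};

// Split n_particles into n_threads chunks whose sizes differ by at most one.
// Returns the number of particles assigned to each thread.
std::vector<std::size_t> chunk_sizes(std::size_t n_particles, std::size_t n_threads) {
    assert(n_threads > 0);
    std::vector<std::size_t> sizes(n_threads, n_particles / n_threads);
    for (std::size_t i = 0; i < n_particles % n_threads; ++i) ++sizes[i];
    return sizes;
}

// Propagate all particles using n_threads workers. Each worker fills its own
// buffer (so no locking is needed), and the buffers are concatenated at the
// end, mimicking storage of all particle information into one event.
std::vector<Hit> propagate_all(std::size_t n_particles, std::size_t n_threads) {
    const auto sizes = chunk_sizes(n_particles, n_threads);
    std::vector<std::vector<Hit>> per_thread(n_threads);
    std::vector<std::thread> workers;
    std::size_t first = 0;
    for (std::size_t t = 0; t < n_threads; ++t) {
        const std::size_t last = first + sizes[t];
        workers.emplace_back([&per_thread, t, first, last] {
            for (std::size_t id = first; id < last; ++id)
                per_thread[t].push_back(Hit{id});  // placeholder for real propagation
        });
        first = last;
    }
    for (auto& w : workers) w.join();

    std::vector<Hit> event;
    event.reserve(n_particles);
    for (const auto& buf : per_thread)
        event.insert(event.end(), buf.begin(), buf.end());
    return event;
}
```

With 10,000 particles and 64 threads, `chunk_sizes` assigns 157 particles to 16 of the threads and 156 to the other 48, which sums to 10,000.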
Artdaq provides a demo package (https://cdcvs.fnal.gov/redmine/projects/artdaq-demo/wiki) for getting it set up and tested.
- run artdaq on the local cluster using the g-2 software stack and the standard g-2 simulation. Evaluate this architecture for simulation work and evaluate its scaling.
- port artdaq to Edison and demonstrate the same thing. This might be aided by the porting and Docker activity under way in the porting tasks.
Afterwards we will need to determine which parts of artdaq are useful for this project overall and then consider porting it to Mira, along with art.
So far none of the work requires art to internally schedule many things at once. We are relying on G4 MT and a very simple workflow to gain the use of multiple threads. To move beyond this, we need to permit multiple events to be active at the same time across threads. A project called "multi-schedule art" was started a while back. This project needs to be revived. Much of the design work was completed and demonstrated. The actual thread scheduling is managed by TBB (Intel Threading Building Blocks). The event coordination and workflow are managed by the art Scheduler. This development is important for processing multiple events within one process image on one node.
A rough timeline for the various stages of the project is shown below. As work progresses, this timeline may be modified.
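As an illustration of having several events in flight at once, here is a minimal C++ sketch. It is not the multi-schedule art design and does not use TBB; plain `std::thread` workers and an atomic counter are swapped in to keep the example dependency-free. The function `process_events` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Process n_events using n_workers threads. Each worker repeatedly claims the
// next unprocessed event number, so up to n_workers events are active at once.
// Returns, for each event, the index of the worker that processed it.
std::vector<std::size_t> process_events(std::size_t n_events, std::size_t n_workers) {
    assert(n_workers > 0);
    std::atomic<std::size_t> next{0};
    // The value n_workers marks an event that has not been processed yet.
    std::vector<std::size_t> processed_by(n_events, n_workers);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < n_workers; ++w) {
        workers.emplace_back([&next, &processed_by, n_events, w] {
            for (;;) {
                const std::size_t ev = next.fetch_add(1);  // claim one event
                if (ev >= n_events) break;
                processed_by[ev] = w;  // placeholder for running the modules on this event
            }
        });
    }
    for (auto& t : workers) t.join();
    return processed_by;
}
```

Because each event number is claimed by exactly one worker, no two threads write to the same element, and faster workers naturally pick up more events than slower ones.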
Timeline for Porting
- Build and run on Edison at NERSC: to be completed by October 11, 2015
- Add support to run on Mira: to be completed by November 22nd, 2015
- Migrate to Cori phase 1 at NERSC: to be completed by December 20th, 2015
Timeline for Code Development
- Geant4 MT 10.x upgrade of g-2 software and MT loop integration: to be completed by
- artdaq use and configuration for g-2 offline processing (with aggregator task): to be completed by
- multi-schedule art - moving this forward to a functioning version: to be completed by