Multithreaded framework basics

As of version 3.00.00, the art framework supports concurrent processing of events. The eventual goal is to be able to concurrently process events from different Runs and SubRuns, as illustrated here (time goes to the right):

The current implementation serializes processing at SubRun boundaries so that only events within the same SubRun are processed in parallel:

In the above images, the number of processing loops is 3--this is the number of schedules, which is user-configurable. After the last event of a SubRun has been processed, the number of active processing loops drops to 1; once the first event of the next SubRun is ready for processing, the number of schedules returns to the user-specified value.

 Overall design

The overall multi-threaded design is based on CMSSW's choices and experience:

  • The thread-scheduling technology is Intel’s Threading Building Blocks (TBB)
  • The multi-threaded steps to be performed are factorized into tasks (a user's event-level module member function can be thought of as a task)
  • Users are allowed to specify the number of concurrent event loops (i.e. schedules) and the maximum number of threads that the process can use
  • Users do not explicitly create threads themselves
  • Users are allowed to call TBB-provided parallel algorithms in their own code (see Parallelism in user code for guidance)

 Basic guarantees

art guarantees the following behavior:

  • Processing of an event happens on one and only one schedule (see Schedules and transitions).
  • For a given trigger path, modules are processed in the order specified.
  • A module shared among paths will be processed only once per event.
  • Product insertion into the event is thread-safe.
  • Product retrieval from the event is thread-safe.
  • Provenance retrieval from the event is thread-safe.
  • All modules and services provided by art are thread-safe.
    • For TFileService, the user is required to specify additional serialization (see here).

 Opting in to multi-threaded processing

Multi-threaded event processing is not automatically enabled. To benefit from it, users must configure the scheduler to run with more than 1 schedule and/or more than 1 thread. In addition, the libraries and modules they use must be implemented in a way that supports multi-threaded execution. We discuss below how to configure the scheduler. Structuring your code to support multi-threading is discussed on the Module threading types page.

 Scheduler configuration

art's TBB scheduler is initialized based on parameters that the user may specify in the 'services.scheduler' table of the job's configuration1:

  • num_threads: Maximum number of threads TBB is allowed to use when executing its tasks--the default value is 1 thread. For HTCondor batch jobs, if the specified number of threads exceeds the requested number of CPUs, then num_threads will be set to the requested number of CPUs. Please see here for details.
  • num_schedules: Number of events to process concurrently--the default value is 1.
  • stack_size: The stack size (in bytes) that the TBB scheduler will use for its threads--the default is 10 MB, which closely approximates the stack size of the main thread2.
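
As a sketch, these parameters can be set together in the 'services.scheduler' table of a job's FHiCL configuration (the values shown are illustrative, not recommendations):

```
# Illustrative scheduler settings; adjust to the resources requested for the job.
services: {
  scheduler: {
    num_threads:   4          # at most 4 TBB worker threads
    num_schedules: 4          # process up to 4 events concurrently
    stack_size:    10485760   # 10 MB per TBB thread (the default)
  }
}
```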

Explicitly specified values for num_threads and num_schedules will be overwritten if any of the program options below are used.

1 For users who would like to invoke TBB algorithms within their own code or use ROOT's implicit multi-threading facilities, please see the guidance here.

2 The default stack size TBB specifies is 1 MB, which can be inadequate for various workflows.
 

 Program options

art provides program options to set the num_schedules and num_threads configurations from the command line:

  • --nschedules: sets num_schedules to the specified value.
  • --nthreads: sets num_threads to the specified value. A value of '0' is interpreted as the maximum number of hardware threads on the system. That maximum is determined by TBB and is typically the same number as returned by calling 'nproc' or 'getconf _NPROCESSORS_ONLN' at the command line. Also note that the batch-context adjustment of the number of threads takes place after this maximum number is determined.
  • -j,--parallelism: sets both num_schedules and num_threads to the value specified. The interpretation of '0' as described for the --nthreads option also applies here. It is an error to specify this option with either --nschedules or --nthreads.
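
To see what '0' would resolve to on a given machine, one can query the hardware-thread count directly (a Linux sketch; the number TBB actually determines may be lower in CPU-constrained environments such as batch slots):

```shell
# Number of processing units available to this process (respects CPU affinity):
nproc
# Number of processors currently online on the system:
getconf _NPROCESSORS_ONLN
```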

Using any of the above program options overwrites any previously specified values for num_schedules and num_threads.

The following table gives command-line examples and what the corresponding values of num_schedules and num_threads become.

Command                                          num_schedules  num_threads  Notes
art -c <config> ...                              1              1
art -c <config> --nthreads 4 ...                 1              4
art -c <config> --nschedules 2 --nthreads 4 ...  2              4
art -c <config> -j 4 ...                         4              4
art -c <config> -j 0 ...                         nproc          nproc        The value of nproc is the smaller of:
                                                                             (a) the value TBB determines as the maximum
                                                                                 number of threads, or
                                                                             (b) the number of requested CPUs if in an
                                                                                 HTCondor batch job.

 Additional materials

The following links refer to presentations given in various forums regarding the multi-threaded design and related considerations. Although the overall picture described in the presentations is consistent with the current implementation, specific details (e.g. code examples) may differ from what has been implemented. Please email for any clarifications.