Multithreaded framework basics
As of version 3.00.00, the art framework supports concurrent processing of events. The eventual goal is to be able to concurrently process events from different
SubRuns, as illustrated here (time goes to the right):
The current implementation serializes processing at SubRun boundaries, so that only events within the same SubRun are processed in parallel:
In the above images, the number of processing loops is 3. These loops are the schedules, and their number is user-configurable. After the last event of a SubRun has been processed, the number of active schedules is reduced to 1; once the first event of a new SubRun is ready for processing, the number of schedules returns to the user-specified value.
The overall multi-threaded design is based on CMSSW's choices and experience:
- The thread-scheduling technology is Intel’s Threading Building Blocks (TBB)
- The multi-threaded steps to be performed are factorized into tasks (a user's event-level module member function can be thought of as a task)
- Users are allowed to specify the number of concurrent event loops (i.e. schedules) and the maximum number of threads that the process can use
- Users do not explicitly create threads themselves
- Users are allowed to call TBB-provided parallel algorithms in their own code (see Parallelism in user code for guidance)
art guarantees the following behavior:
- Processing of an event happens on one and only one schedule (see Schedules and transitions).
- For a given trigger path, modules are processed in the order specified.
- A module shared among paths will be processed only once per event.
- Product insertion into the event is thread-safe.
- Product retrieval from the event is thread-safe.
- Provenance retrieval from the event is thread-safe.
- All modules and services provided by art are thread-safe. For TFileService, the user is required to specify additional serialization (see here).
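To make the path-related guarantees concrete, here is a minimal configuration sketch in which one module is shared between two trigger paths. All module labels, module types, and path names (makeHits, HitMaker, trackPath, and so on) are hypothetical and chosen only for illustration.

```
physics: {
  producers: {
    # Hypothetical producer modules; the module_type names are placeholders.
    makeHits:    { module_type: HitMaker }
    makeTracks:  { module_type: TrackMaker }
    makeShowers: { module_type: ShowerMaker }
  }

  # Two trigger paths that share the makeHits module.
  trackPath:  [makeHits, makeTracks]
  showerPath: [makeHits, makeShowers]

  trigger_paths: [trackPath, showerPath]
}
```

Under the guarantees listed above, makeHits would run only once per event even though it appears on both paths, and within trackPath it would run before makeTracks.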
Opting in to multi-threaded processing
Multi-threaded event processing is not enabled automatically. To benefit from it, users must configure the scheduler to run with more than one schedule and/or more than one thread. In addition, the libraries and modules they use should be implemented in a way that supports multi-threaded execution. Configuring the scheduler is discussed below; structuring your code to support multi-threading is discussed on the Module threading types page.
art's TBB scheduler is initialized from parameters that the user may specify in the 'services.scheduler' table of their configuration (see footnote 1):
- num_threads: Maximum number of threads TBB is allowed to use when executing its tasks. The default value is 1. For HTCondor batch jobs, if the specified number of threads exceeds the requested number of CPUs, num_threads is set to the requested number of CPUs. Please see here for details.
- num_schedules: Number of events to process concurrently. The default value is 1.
- stack_size: The stack size (in bytes) that the TBB scheduler uses for its threads. The default is 10 MB, which closely approximates the stack size of the main thread (see footnote 2).
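As an illustration of the scheduler parameters described above, a configuration might contain a fragment like the following. This is a sketch only: the values are arbitrary examples rather than recommendations, and the stack size is written out as 10 * 1024 * 1024 bytes on the assumption that this is what is meant by 10 MB.

```
services: {
  scheduler: {
    num_threads:   4         # upper limit on threads TBB may use (example value)
    num_schedules: 4         # events processed concurrently (example value)
    stack_size:    10485760  # per-thread stack size in bytes (assumed 10 * 1024 * 1024)
  }
}
```

Setting num_schedules equal to num_threads, as here, is just one possible choice; the two parameters can be set independently.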
Explicitly specified values for num_threads and num_schedules will be overwritten if any of the program options below are used.
1 For users who would like to invoke TBB algorithms within their own code or use ROOT's implicit multi-threading facilities, please see the guidance here.
2 The default stack size TBB specifies is 1 MB, which can be inadequate for various workflows.
art provides program options to set the num_schedules and num_threads configurations from the command line:
- --nschedules: sets num_schedules to the specified value.
- --nthreads: sets num_threads to the specified value. A value of '0' sets the number of threads to the maximum number of hardware threads on the system. The maximum is determined by TBB and is typically the number returned by running 'getconf _NPROCESSORS_ONLN' at the command line. Note that any adjustment of the number of threads in a batch context takes place after this maximum has been determined.
- -j, --parallelism: sets both num_schedules and num_threads to the value specified. The interpretation of '0' described for the --nthreads option also applies here. It is an error to specify this option together with either --nschedules or --nthreads.
Using any of the above program options overwrites any previously specified values for the corresponding configuration parameters.
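The following table gives command-line examples and the corresponding values of num_threads and num_schedules.
[Table: command-line examples with the resulting num_threads and num_schedules; one row gives nproc for both values.]
The value of nproc is the smaller of:
(a) the value TBB determines as the maximum number of threads, or
(b) the number of requested CPUs, if in an HTCondor batch job.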
The following links refer to presentations that have been given in various forums regarding multi-threaded design and considerations. Although the overall picture described in the presentations is consistent with the current implementation, specific details (e.g. code examples) may be different compared to what has been implemented. Please email email@example.com for any clarifications.