RangeSet merging is very slow for concatenated files
Tingjun from DUNE reports that reading an art/ROOT file produced with
version 2.03.00 takes hours when the input file is the result of a concatenation job over 400 input files.
See attached file.
Fix exorbitant time taken to merge/collapse RangeSet information (resolves issue #13765).
The previous implementation accommodated situations that were
not to be encouraged, namely the following set of EventRanges:
In the previous implementation, such a set of EventRanges was sorted
(by the weak-ordering criterion) and then collapsed to one
EventRange, namely [1,11).
However, the above situation is ill-defined in the art context since
the collection of events corresponding to a given product/auxiliary
must be unique. In other words, a RangeSet is not meant to contain
duplicate events. It is permissible for two separate RangeSets to
contain duplicate events since they are separate entities. But the
previous implementation was trying to support a case where a single
RangeSet was internally inconsistent.
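
To illustrate why disallowing duplicate events makes collapsing cheap, here is a minimal sketch. It is not the art implementation: `Range`, `is_disjoint_and_sorted`, and `collapse` are hypothetical names, and `Range` is only a stand-in for a half-open EventRange [begin, end). The sketch assumes the ranges are already sorted and contain no duplicate events, so merging only has to compare each range with its immediate predecessor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an EventRange: the half-open interval [begin, end).
struct Range {
  std::uint32_t begin;
  std::uint32_t end;
};

// Returns true if the ranges are sorted and no event appears in two ranges.
inline bool is_disjoint_and_sorted(std::vector<Range> const& ranges)
{
  for (std::size_t i = 1; i < ranges.size(); ++i) {
    if (ranges[i].begin < ranges[i - 1].end) {
      return false; // overlap or out of order: internally inconsistent
    }
  }
  return true;
}

// Collapses adjacent ranges, e.g. [1,3) followed by [3,6) becomes [1,6).
// One linear pass suffices because overlapping input is rejected up front.
inline std::vector<Range> collapse(std::vector<Range> const& ranges)
{
  assert(is_disjoint_and_sorted(ranges));
  std::vector<Range> result;
  for (auto const& r : ranges) {
    if (r.begin == r.end) {
      continue; // an empty range contributes no events
    }
    if (!result.empty() && result.back().end == r.begin) {
      result.back().end = r.end; // extend the previous range
    } else {
      result.push_back(r);
    }
  }
  return result;
}
```

With this sketch, the illustrative input [1,3), [3,6), [8,11) collapses to [1,6), [8,11), while an input such as [1,6), [3,11) is rejected as inconsistent rather than silently merged.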
#2 Updated by Kyle Knoepfel almost 4 years ago
- Status changed from Assigned to Resolved
- % Done changed from 0 to 100
The bottleneck was due to an implementation that tried to accommodate a broader concept than what the
RangeSet should have supported. By clarifying the intent of the
RangeSet object, and with some function refactoring, reading in the file now takes on the order of a few seconds (1.8 seconds for us, using the profile build, in a job that does nothing but read in the file).