Feature #7005

Improve performance of FindOne / FindMany construction.

Added by Christopher Green about 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Urgent
Category: I/O
Start date: 09/12/2014
% Done: 100%
Scope: Internal
Experiment: LArSoft
SSI Package: art
Description

Per informal reports from LArSoft / MicroBooNE, FindMany and friends are unacceptably slow for large association collections.

Two algorithms are in operation for constructing the data within these smart query objects; the one invoked in most use cases contains a nested loop and may be amenable to improvement.
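
To make the performance issue concrete, here is a hypothetical sketch (not art's actual implementation) contrasting a nested-loop construction, whose work grows as the product of the reference-collection size R and the association-list length A, with a single-pass construction that indexes the references first. All names in it are illustrative.

    // Hypothetical sketch, not art's code: contrast a nested-loop
    // association lookup with a single-pass, indexed one.
    #include <cstddef>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    using Ref  = std::size_t;           // stand-in for a reference (e.g. track) key
    using Data = std::size_t;           // stand-in for an associated (e.g. hit) key
    using Assn = std::pair<Ref, Data>;  // one (reference, data) association

    // Nested loop: for each of the R references, scan all A associations.
    // Work grows like R * A.
    std::vector<std::vector<Data>>
    buildNested(std::vector<Ref> const& refs, std::vector<Assn> const& assns)
    {
      std::vector<std::vector<Data>> result(refs.size());
      for (std::size_t i = 0; i != refs.size(); ++i) {
        for (auto const& a : assns) {
          if (a.first == refs[i]) { result[i].push_back(a.second); }
        }
      }
      return result;
    }

    // Single pass: index the references once, then walk the association
    // list once.  Work grows like R + A (average case).
    std::vector<std::vector<Data>>
    buildIndexed(std::vector<Ref> const& refs, std::vector<Assn> const& assns)
    {
      std::unordered_map<Ref, std::size_t> index;
      for (std::size_t i = 0; i != refs.size(); ++i) { index.emplace(refs[i], i); }
      std::vector<std::vector<Data>> result(refs.size());
      for (auto const& a : assns) {
        auto const it = index.find(a.first);
        if (it != index.end()) { result[it->second].push_back(a.second); }
      }
      return result;
    }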


Related issues

Related to art - Support #7298: FindManyP does not maintain the order of associated objects (Closed, 11/06/2014)

History

#1 Updated by Christopher Green about 6 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100

Of the two algorithms, one (not the slow one) made assumptions that narrowed its applicability beyond what the automatic algorithm-selection system could detect; there were therefore cases (which we do not believe have been triggered in real life) in which that algorithm would silently produce the wrong answer.

In order to test the slow algorithm, I created a pair of classes, arttest::Track and arttest::Hit, each of which is a struct containing only an ID. I then wrote a module, source:test/Integration/FindManySpeedTestProducer_module.cc, which creates n tracks and randomly associates a Poisson mean of m hits with each track, drawn from a pool of 100K hits (a schematic sketch of the generation logic follows the timing discussion below). I ran tests with the old and a new algorithm, three trials per generated test file; the results were as follows (each pair is CPU time and real time, respectively, in seconds):

Trial       Old (CPU s, real s)    New (CPU s, real s)

1000-3000   (15.5666, 15.573)      (1.16482, 1.1664)
            (15.4327, 15.4402)     (1.13283, 1.13343)
            (16.2355, 16.2431)     (1.13483, 1.13558)

3000-3000   (137.725, 137.789)     (10.0525, 10.0616)
            (167.467, 167.541)     (3.49847, 3.50282)
            (149.809, 149.871)     (3.3065, 3.30913)

1000-9000   (46.8009, 46.8236)     (3.72843, 3.73299)
            (47.1158, 47.1385)     (3.57146, 3.57676)
            (57.3653, 57.3899)     (3.78742, 3.79325)

For each trial label, the first number is the number of tracks; the second is the Poisson mean number of hits associated with each track.

Discounting the outlier in the first trial of the new algorithm for 3000-3000 (I did not have exclusive use of the machine), one can see that the old algorithm takes far longer than the new one, and that its time grows roughly as the product of the number of tracks and the total number of associations: compare 3000-3000 with 1000-9000, which have the same total number of associations but differ by a factor of 3 in both the number of tracks and the time taken. The new algorithm is, by contrast, approximately linear in the number of associations.
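
For concreteness, here is a minimal, framework-independent sketch of the random generation step described above (n tracks, a Poisson mean of m hits per track, drawn from a 100K-hit pool). This is an illustration only, not the code of FindManySpeedTestProducer_module.cc; all names in it are hypothetical.

    // Illustrative sketch of the test-data generation, not the module's code.
    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    struct Track { std::size_t id; };  // ID-only, like arttest::Track
    struct Hit   { std::size_t id; };  // ID-only, like arttest::Hit

    // Associate a Poisson(mean = m) number of hits, drawn uniformly from a
    // pool of nHits hits, with each of nTracks tracks; returns the
    // (track index, hit index) association pairs.
    std::vector<std::pair<std::size_t, std::size_t>>
    makeAssociations(std::size_t nTracks, double m, std::size_t nHits,
                     std::mt19937& gen)
    {
      std::poisson_distribution<std::size_t> hitsPerTrack(m);
      std::uniform_int_distribution<std::size_t> pickHit(0, nHits - 1);
      std::vector<std::pair<std::size_t, std::size_t>> assns;
      for (std::size_t t = 0; t != nTracks; ++t) {
        std::size_t const n = hitsPerTrack(gen);
        for (std::size_t i = 0; i != n; ++i) {
          assns.emplace_back(t, pickHit(gen));
        }
      }
      return assns;
    }

    // e.g. makeAssociations(1000, 3000.0, 100000, gen) produces input of
    // the same shape as the 1000-3000 trial.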

While making these changes, the second (new) construction algorithm was generalized significantly, so that it now covers all the cases for which the other algorithm was (sometimes erroneously) selected, all the cases covered by the older, slow algorithm, and more besides. In addition to art::Handle, art::View, and collections of art::Ptr, an art::Find{One,Many}{,P} can now be constructed with a collection of pointers (or, more usually, const pointers) as its reference collection. The interface of art::View was expanded in order to allow this new functionality.
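
As a rough sketch of the construction modes just described (not code from the repository: the module label "maker", the variable names, and the surrounding context are assumed, an art::Event& e is in scope, and header locations for these templates have moved between art releases):

    // Sketch only; assumes track/hit products created by a module labeled
    // "maker", and the ID-only arttest test classes described above.
    art::Handle<std::vector<arttest::Track>> th;
    e.getByLabel("maker", th);

    // From an art::Handle (previously supported):
    art::FindManyP<arttest::Hit> fromHandle(th, e, "maker");

    // From a collection of art::Ptr (previously supported):
    std::vector<art::Ptr<arttest::Track>> ptrs;
    // ... fill ptrs from th ...
    art::FindManyP<arttest::Hit> fromPtrs(ptrs, e, "maker");

    // New with this change: a collection of (const) pointers may serve as
    // the reference collection.
    std::vector<arttest::Track const*> bare;
    for (auto const& t : *th) { bare.push_back(&t); }
    art::FindManyP<arttest::Hit> fromBarePtrs(bare, e, "maker");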

The improved code and expanded functionality have been provided in commit 4da91b70efde2bfeb08733886d670ad536128ea4.

#2 Updated by Wesley Ketchum almost 6 years ago

Can you show the results for the flipped cases? 3000-1000 and 9000-1000. That was the case that was most painful to me: having my large hit collection, and then wanting the smaller associated track collection.

#3 Updated by Christopher Green almost 6 years ago

Wes, it is important to note that in all of these tests the hit collection is the same size: 100K. The two numbers represent the number of tracks in the event and the Poisson mean number of hits per track.

That said, here are the results you requested:

Trial       Old (CPU s, real s)    New (CPU s, real s)

3000-1000   (46.7119, 46.7319)     (1.12283, 1.12325)
            (45.7171, 45.7351)     (1.14982, 1.15148)
            (46.7419, 46.7616)     (1.10583, 1.10684)

9000-1000   (416.521, 416.685)     (3.55046, 3.55508)
            (432.442, 432.649)     (3.55546, 3.56075)
            (422.974, 423.14)      (3.89441, 3.89907)

As expected from the earlier results relative to the 3000-3000 "baseline", the old algorithm is a factor of 3 faster than its own baseline for 3000-1000 and a factor of 3 slower for 9000-1000; likewise, the new algorithm is a factor of 3 faster than its baseline for 3000-1000 and a factor of 3 slower for 9000-1000. The difference in complexity between the two algorithms shows up in comparing 1000-3000 with 3000-1000: the old algorithm is a factor of 3 slower in the latter case, while the new algorithm takes the same amount of time. Comparing the two directly, in the 9000-1000 case the new algorithm is faster than the old by a factor of more than 100.

#4 Updated by Christopher Green almost 6 years ago

  • Status changed from Resolved to Closed

