'Shuffling' input module
I have several use cases for a new (to my knowledge) kind of input module that takes several art collections as input and reads them one event at a time, choosing the collection for each event at random according to predefined fractions. The net effect would be to 'shuffle' several collections into one, so that the sequential event stream is a randomly intermingled combination of the inputs. The use cases I know of are:
- Creating a 'physics ensemble' of events generated by different physics processes, in simulation of a real experiment, as a means of implementing a Mock Data challenge sample.
- Merging two simulated event collections that were generated according to different underlying distributions, so that the result behaves as a sample generated by the union of the two distributions. This allows creating new simulation samples by augmenting existing samples instead of replacing them.
Subrun information would be somewhat compromised by this process, since (for instance) the procedure cannot guarantee that every input collection is read completely. A policy for what to do when one collection reaches its end would have to be defined. These issues were discussed with the art developers, who proposed solutions.
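To make the request concrete, here is a minimal standalone sketch of the shuffling idea in plain C++, not art code. The Collection type and the shuffle_collections function are hypothetical names invented for illustration. The end-of-collection policy shown (stop as soon as the selected collection has no events left) is only one possible choice and is an assumption, not the policy proposed by the art developers.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input collection: event numbers that can
// only be read sequentially, front to back.
struct Collection {
  explicit Collection(std::vector<int> e) : events(std::move(e)) {}
  bool exhausted() const { return next >= events.size(); }
  std::vector<int> events;
  std::size_t next = 0;
};

// Returns (collection index, event number) pairs in shuffled order.
// Each step picks a collection with probability proportional to its
// weight and reads that collection's next event.
// Assumed policy: stop as soon as the picked collection is exhausted.
std::vector<std::pair<std::size_t, int>>
shuffle_collections(std::vector<Collection> inputs,
                    std::vector<double> const& weights,
                    unsigned seed)
{
  assert(!inputs.empty());
  assert(inputs.size() == weights.size());

  std::mt19937 engine{seed};
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());

  std::vector<std::pair<std::size_t, int>> out;
  while (true) {
    std::size_t const i = pick(engine);
    if (inputs[i].exhausted()) {
      break;  // assumed end-of-collection policy
    }
    out.emplace_back(i, inputs[i].events[inputs[i].next++]);
  }
  return out;
}
```

With this policy the events of each input keep their original relative order in the output, but the other inputs are generally left partly unread, which is exactly the subrun bookkeeping problem described above.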
#10 Updated by Kyle Knoepfel 11 months ago
This feature has been implemented for event products in commit art:16e65f5. The source is called the SamplingInput source, which takes events from separate datasets based on weights specified in the configuration. In addition, an art::SampledEventID data product has been created that includes not only the sampled EventID, but also the dataset from which it comes, the specified weight associated with the dataset, and the probability of sampling from that dataset given the weight relative to the sum of all weights.
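As an illustration of the last quantity, the probability of sampling from a dataset is its weight divided by the sum of all weights. The sketch below is standalone C++ and does not use the art API. The function name sampling_probabilities and the dataset names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: maps dataset name -> weight to
// dataset name -> probability, where probability = weight / sum of weights.
std::map<std::string, double>
sampling_probabilities(std::map<std::string, double> const& weights)
{
  double total = 0.0;
  for (auto const& kv : weights) {
    assert(kv.second >= 0.0);  // negative weights make no sense here
    total += kv.second;
  }
  assert(total > 0.0);  // at least one dataset must have a nonzero weight

  std::map<std::string, double> probabilities;
  for (auto const& kv : weights) {
    probabilities[kv.first] = kv.second / total;
  }
  return probabilities;
}
```

For example, weights of 1 for a dataset called "signal" and 3 for one called "background" give probabilities of 0.25 and 0.75 respectively.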
The remaining steps relate to handling SubRun products (which is nontrivial), and fleshing out the configuration options for the input source.