Feature #21229

'Shuffling' input module

Added by David Brown about 1 year ago. Updated 10 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 11/27/2018
Due date:
% Done: 100%
Estimated time: (Total: 62.00 h)
Spent time: (Total: 101.00 h)
Scope: Internal
Experiment: Mu2e
SSI Package: art
Duration:

Description

I have several use cases for a new (to my knowledge) kind of input module that takes several art collections as input and reads them one event at a time, drawing each event from a collection chosen at random according to predefined fractions. The net effect would be to 'shuffle' several collections into one, so that the sequential event stream is a randomly intermingled combination of the inputs. The use cases I know of are:

- Creating a 'physics ensemble' of events generated by different physics processes, simulating a real experiment, as a means of producing a Mock Data Challenge sample.
- Merging two simulated event collections that were generated according to different underlying distributions, so that the result behaves as a sample generated from the union of the two distributions. This allows new simulation samples to be created by augmenting existing samples instead of replacing them.

Subrun information would be somewhat compromised by this process because, for instance, the procedure could not guarantee that every input collection was read completely. A policy for what to do when one collection reaches its end would also have to be defined. These issues were discussed with the art developers, who proposed solutions.
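
To make the request concrete, here is a minimal Python sketch of the shuffling idea: pick a dataset at random according to the configured fractions, then take the next event from it. This is an illustration only, not art code. The name `shuffle_streams`, the use of integers as stand-in events, and in particular the policy of stopping as soon as the selected dataset has no events left are all assumptions made for the example; the actual end-of-input policy is one of the open questions above.

```python
import random
from typing import Dict, List, Sequence, Tuple


def shuffle_streams(
    datasets: Dict[str, Sequence[int]],
    fractions: Dict[str, float],
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Interleave events from several datasets, one event at a time.

    Each step selects a dataset with probability proportional to its
    fraction and takes that dataset's next unread event.
    """
    rng = random.Random(seed)  # seeded so the shuffle is reproducible
    names = list(datasets)
    weights = [fractions[name] for name in names]
    cursors = {name: iter(datasets[name]) for name in names}

    merged: List[Tuple[str, int]] = []
    while True:
        name = rng.choices(names, weights=weights, k=1)[0]
        try:
            event = next(cursors[name])
        except StopIteration:
            # Assumed policy: end the job when the selected dataset is exhausted.
            break
        merged.append((name, event))
    return merged


if __name__ == "__main__":
    stream = shuffle_streams(
        {"signal": range(100), "background": range(1000)},
        {"signal": 0.1, "background": 0.9},
        seed=1,
    )
    print(len(stream), stream[:5])
```

Events keep their original order within each dataset; only the interleaving between datasets is random. Note that some inputs are generally left partly unread when the loop ends, which is the Subrun bookkeeping concern raised above.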


Subtasks

Feature #21430: Determine allowed configuration for event-shuffling input source | Closed | Kyle Knoepfel

Feature #21431: Create random-sampling mechanism for reading events | Closed | Kyle Knoepfel

Feature #21432: Develop product that encapsulates (Sub)Run products from input datasets | Closed | Kyle Knoepfel

Feature #21433: Aggregate (Sub)Run products for event-shuffling input | Closed | Kyle Knoepfel

Feature #21434: Ensure graceful job completion when one dataset is exhausted | Closed | Kyle Knoepfel

Feature #21452: Enable delayed reading for SamplingInput events | Closed | Kyle Knoepfel

Feature #21715: Determine product lookup policy for sampled products | Closed | Kyle Knoepfel

History

#1 Updated by Kyle Knoepfel about 1 year ago

  • Status changed from New to Accepted

#2 Updated by Kyle Knoepfel 12 months ago

  • Target version set to 2.12.00

#3 Updated by Kyle Knoepfel 12 months ago

  • Category set to Infrastructure
  • Status changed from Accepted to Assigned
  • Assignee set to Kyle Knoepfel

#4 Updated by Kyle Knoepfel 12 months ago

  • Due date set to 11/27/2018
  • Start date changed from 10/25/2018 to 11/27/2018

due to changes in a related task: #21430

#5 Updated by Kyle Knoepfel 12 months ago

  • Due date set to 11/27/2018

due to changes in a related task: #21431

#6 Updated by Kyle Knoepfel 12 months ago

  • Due date set to 11/27/2018

due to changes in a related task: #21432

#7 Updated by Kyle Knoepfel 12 months ago

  • Due date set to 11/27/2018

due to changes in a related task: #21433

#8 Updated by Kyle Knoepfel 12 months ago

  • Due date set to 11/27/2018

due to changes in a related task: #21434

#9 Updated by Kyle Knoepfel 12 months ago

  • Due date set to 11/28/2018

due to changes in a related task: #21452

#10 Updated by Kyle Knoepfel 11 months ago

This feature has been implemented for event products in commit art:16e65f5. The source is called the SamplingInput source, which takes events from separate datasets based on weights specified in the configuration. In addition, an art::SampledEventID data product has been created that includes not only the sampled EventID, but also the dataset from which it comes, the specified weight associated with the dataset, and the probability of sampling from that dataset given the weight relative to the sum of all weights.
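
As an illustration of the bookkeeping described above, the following Python sketch computes the probability of sampling from each dataset as its weight divided by the sum of all weights, and stores it in a small record alongside the dataset name and event ID. The class `SampledEventRecord` and the function `sampling_probabilities` are hypothetical names for this example; they do not reproduce the actual layout or interface of art::SampledEventID.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SampledEventRecord:
    """Hypothetical stand-in for the information carried with a sampled event."""

    dataset: str        # dataset the event was drawn from
    event_id: int       # stand-in for the sampled EventID
    weight: float       # weight configured for that dataset
    probability: float  # weight / sum of all weights


def sampling_probabilities(weights: Dict[str, float]) -> Dict[str, float]:
    """Return each dataset's sampling probability given the configured weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    weights = {"signal": 1.0, "background": 3.0}
    probs = sampling_probabilities(weights)
    record = SampledEventRecord(
        dataset="signal",
        event_id=42,
        weight=weights["signal"],
        probability=probs["signal"],
    )
    print(record)
```

With these weights, the signal dataset is sampled with probability 0.25 and the background dataset with probability 0.75, so the weights need not sum to one.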

The remaining steps relate to handling Run and SubRun products (which is nontrivial), and fleshing out the configuration options for the input source.

#11 Updated by Kyle Knoepfel 10 months ago

  • Due date set to 01/17/2019

due to changes in a related task: #21715

#12 Updated by Kyle Knoepfel 10 months ago

  • Status changed from Assigned to Resolved

#13 Updated by Kyle Knoepfel 10 months ago

  • Status changed from Resolved to Closed

