New IDs, or one run and one subrun

Because the (sub)runs contained in the individual datasets are likely to be distinct, randomly sampling events according to the dataset probabilities would, in principle, trigger frequent and unwanted run and subrun transitions in the framework. To avoid this thrashing of beginRun, beginSubRun, endSubRun, and endRun calls, any job that uses the SamplingInput source has only one run and one subrun. The original run, subrun, and event numbers are nevertheless still available. This section focuses on the remapped event IDs and on how to access the original ones.

Even though each dataset is sampled randomly according to the specified weights, the event IDs presented to the user increase monotonically, starting from the configured run number, subrun number, and first event number. Suppose the input files contained the following events:

File          Events
signal.root   Run 14, SubRun 3, Events [9, 19)*
bkgd.root     Run 4, SubRun 0, Events [1, 6)
              Run 4, SubRun 1, Events [1, 6)

*The [n, m) notation indicates a half-open range of contiguous numbers, including 'n' but excluding 'm'.

then based on the above configuration, the events presented to the user could be:

New event ID   Dataset   Original event ID
1:0:1          bkgd      4:0:1
1:0:2          bkgd      4:0:2
1:0:3          bkgd      4:0:3
1:0:4          bkgd      4:0:4
1:0:5          signal    14:3:11
1:0:6          bkgd      4:0:5
1:0:7          bkgd      4:1:1
1:0:8          bkgd      4:1:2
1:0:9          bkgd      4:1:3
1:0:10         bkgd      4:1:4
1:0:11         bkgd      4:1:5
1:0:12         signal    14:3:12

where the triplet 'i:j:k' means an event with run i, subrun j, and event k. See the next section for how to access the original event ID.
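The remapping described above can be modeled in a few lines. The following is a simplified sketch, not art's implementation: the types ID, Dataset, and Presented and the functions append_range and sample are hypothetical, each dataset is assumed to be read strictly in file order, and stopping after max_events or at the first exhausted dataset is an assumption. Because it uses its own random number generator, it does not try to reproduce the exact sequence in the table above; it only shows how a weighted choice of dataset combines with sequential new event numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for art::EventID: (run, subrun, event).
struct ID {
  unsigned run;
  unsigned subrun;
  unsigned event;
};

// Hypothetical description of one input dataset, with its events in file order.
struct Dataset {
  std::string name;
  double weight;
  std::vector<ID> events;
  std::size_t next = 0;  // index of the next unread event
};

// One event as presented to the user: remapped ID plus its provenance.
struct Presented {
  ID new_id;
  std::string dataset;
  ID original;
};

// Appends events [begin, end) of one (run, subrun) to 'events' (half-open range).
void append_range(std::vector<ID>& events, unsigned run, unsigned subrun,
                  unsigned begin, unsigned end)
{
  for (unsigned e = begin; e < end; ++e) events.push_back(ID{run, subrun, e});
}

// Simplified model of the remapping: choose a dataset with probability
// proportional to its weight, read that dataset's next event, and label it with
// the next event number in the single configured run and subrun.
// Stopping after max_events, or when the chosen dataset is exhausted, is an assumption.
std::vector<Presented> sample(std::vector<Dataset>& datasets, ID first,
                              std::size_t max_events, std::mt19937& gen)
{
  std::vector<double> weights;
  for (auto const& d : datasets) weights.push_back(d.weight);
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());

  std::vector<Presented> out;
  unsigned event = first.event;
  while (out.size() < max_events) {
    Dataset& d = datasets[pick(gen)];
    if (d.next == d.events.size()) break;  // chosen dataset has no events left
    out.push_back(Presented{ID{first.run, first.subrun, event++}, d.name,
                            d.events[d.next++]});
  }
  return out;
}
```

For the example files, one might fill the datasets with append_range(signal.events, 14, 3, 9, 19), append_range(bkgd.events, 4, 0, 1, 6), and append_range(bkgd.events, 4, 1, 1, 6), then call sample(datasets, ID{1, 0, 1}, 12, gen). Every returned new_id has run 1 and subrun 0, and the event numbers are 1, 2, 3, ... regardless of which dataset supplied each event.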

Sampled event information

The framework provides the art::SampledEventInfo event data product that gives the user access to the original event ID and the dataset. For example:

void MyAnalyzer::analyze(art::Event const& e)  // 'override' belongs on the in-class declaration only
{
  auto const& new_id = e.id();  // EventID{1, 0, 1} for first event above

  // Retrieve art-provided product
  auto const& sampled_info = *e.getValidHandle<art::SampledEventInfo>("SamplingInput");

  auto const& original_id = sampled_info.id;           // EventID{4, 0, 1} for first event
  auto const& sampled_dataset = sampled_info.dataset;  // "bkgd" for first event
  double const weight = sampled_info.weight;           // 0.82 for first event
  double const prob = sampled_info.probability;        // 0.700855 for first event
}
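One possible use of the sampled event information is to check that the observed mix of datasets approaches the configured probabilities. The sketch below is hypothetical: EventInfo only mirrors the dataset and probability members of art::SampledEventInfo so that it compiles without art, and DatasetTally is not part of the framework.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mirror of two art::SampledEventInfo members; in an art job the
// values would be copied from the product retrieved in analyze().
struct EventInfo {
  std::string dataset;
  double probability;
};

// Hypothetical helper: counts sampled events per dataset and remembers each
// dataset's sampling probability so the observed fraction can be compared to it.
class DatasetTally {
public:
  void add(EventInfo const& info)
  {
    auto& entry = entries_[info.dataset];
    ++entry.count;
    entry.probability = info.probability;
    ++total_;
  }

  // Fraction of all tallied events that came from 'dataset' (0 if none).
  double observed_fraction(std::string const& dataset) const
  {
    auto const it = entries_.find(dataset);
    if (it == entries_.end() || total_ == 0) return 0.;
    return static_cast<double>(it->second.count) / total_;
  }

  // Configured sampling probability recorded for 'dataset'.
  double probability(std::string const& dataset) const
  {
    return entries_.at(dataset).probability;
  }

private:
  struct Entry {
    unsigned long count = 0;
    double probability = 0.;
  };
  std::map<std::string, Entry> entries_;
  unsigned long total_ = 0;
};
```

Inside analyze, one would call something like tally_.add({sampled_info.dataset, sampled_info.probability}) with the product retrieved above, and compare observed_fraction against probability at the end of the job.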

Sampled run and subrun information

Because a job that uses the SamplingInput source has only one run and one subrun, accessing the original run and subrun information from the input datasets is more involved. The framework provides two data products that describe the (sub)runs contained in the input files:

  • art::SampledRunInfo, a type alias to std::map<std::string, SampledInfo<RunID>>, and
  • art::SampledSubRunInfo, a type alias to std::map<std::string, SampledInfo<SubRunID>>

where the key to the map is the dataset name ('signal' or 'bkgd' for the above configuration). The SampledInfo<RunID> class contains the following data members:

  • ids, a std::vector<RunID> corresponding to all RunIDs present in the input file
  • weight, a double whose value is the configured weight for the dataset in question
  • probability, a double whose value is the normalized weight (or sampling probability) for the dataset in question

The representation of SampledInfo<SubRunID> is identical except that the ids member is of type std::vector<SubRunID>, corresponding to all SubRunIDs present in the input file.
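The probabilities quoted in the code examples are consistent with dividing each weight by the sum of all configured weights: with 0.35 (signal) and 0.82 (bkgd), 0.35/1.17 ≈ 0.299145 and 0.82/1.17 ≈ 0.700855. The function normalize below is a hypothetical illustration of that arithmetic, not a framework API.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <numeric>
#include <string>

// Hypothetical helper: sampling probability = weight / (sum of all weights).
std::map<std::string, double> normalize(std::map<std::string, double> const& weights)
{
  double const total = std::accumulate(
    weights.begin(), weights.end(), 0.,
    [](double sum, auto const& entry) { return sum + entry.second; });

  std::map<std::string, double> probabilities;
  for (auto const& [name, weight] : weights) probabilities[name] = weight / total;
  return probabilities;
}
```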

For the job configuration above, the SampledRunInfo and SampledSubRunInfo objects can be accessed in the following way:

void MyAnalyzer::beginRun(art::Run const& r)  // declared 'override' in the class
{
  auto const id = r.id();  // RunID{1}

  auto const& sampled_info = *r.getValidHandle<art::SampledRunInfo>("SamplingInput");
  auto const& signal_info = sampled_info.at("signal");
  auto const& bkgd_info = sampled_info.at("bkgd");

  double const signal_weight = signal_info.weight;  // 0.35
  double const bkgd_weight = bkgd_info.weight;      // 0.82

  double const signal_prob = signal_info.probability;  // 0.299145
  double const bkgd_prob = bkgd_info.probability;      // 0.700855

  // assert requires #include <cassert>
  assert(signal_info.ids.size() == 1);
  assert(signal_info.ids[0] == art::RunID{14});

  assert(bkgd_info.ids.size() == 1);
  assert(bkgd_info.ids[0] == art::RunID{4});
}

void MyAnalyzer::beginSubRun(art::SubRun const& sr)  // declared 'override' in the class
{
  auto const id = sr.id();  // SubRunID{1, 0}

  auto const& sampled_info = *sr.getValidHandle<art::SampledSubRunInfo>("SamplingInput");
  auto const& signal_info = sampled_info.at("signal");
  auto const& bkgd_info = sampled_info.at("bkgd");

  double const signal_weight = signal_info.weight;  // 0.35
  double const bkgd_weight = bkgd_info.weight;      // 0.82

  // Extra parentheses keep the comma in SubRunID{...} from splitting the assert macro
  assert(signal_info.ids.size() == 1);
  assert((signal_info.ids[0] == art::SubRunID{14, 3}));

  assert(bkgd_info.ids.size() == 2);
  assert((bkgd_info.ids[0] == art::SubRunID{4, 0}));
  assert((bkgd_info.ids[1] == art::SubRunID{4, 1}));
}

Note that MyAnalyzer's beginRun and beginSubRun overrides will be called only once for the art job.
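The examples above look up each dataset by name with at(). When the dataset names are not known in advance, one can instead iterate over the whole map. The sketch below uses hypothetical stand-ins (SubRunID, SampledInfo, SampledSubRunInfo, OriginalSubRun, list_original_subruns) that mimic the shapes described above so that it compiles without art; in a real module the same loop should work over the art::SampledSubRunInfo product, since that is a std::map keyed by dataset name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins with the same shape as art::SubRunID,
// art::SampledInfo<SubRunID> and art::SampledSubRunInfo described above.
struct SubRunID {
  unsigned run;
  unsigned subrun;
};

template <typename ID>
struct SampledInfo {
  std::vector<ID> ids;
  double weight;
  double probability;
};

using SampledSubRunInfo = std::map<std::string, SampledInfo<SubRunID>>;

// One original subrun together with the dataset it came from.
struct OriginalSubRun {
  std::string dataset;
  SubRunID id;
};

// Flattens the map into a single list, visiting every dataset instead of
// looking up hard-coded names. std::map iterates keys in sorted order.
std::vector<OriginalSubRun> list_original_subruns(SampledSubRunInfo const& info)
{
  std::vector<OriginalSubRun> result;
  for (auto const& [dataset, sampled] : info) {
    for (auto const& id : sampled.ids) result.push_back(OriginalSubRun{dataset, id});
  }
  return result;
}
```

Because std::map orders its keys, datasets appear alphabetically; for the example configuration the result is bkgd 4:0, bkgd 4:1, and signal 14:3.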