Feature #11760

Would like ability to read both data and MC files in the same job

Added by Brian Rebel over 4 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Metadata
Target version:
Start date: 02/18/2016
Due date:
% Done: 100%
Estimated time: 4.00 h
Spent time:
Scope: Internal
Experiment: NOvA
SSI Package: art
Duration:

Description

I am working on an analysis for NOvA where I would like to create a set of data products using a ResultsProducer for a variety of files produced in different conditions. The basic distinction in the production is whether the file is from data or simulation. The end result of what I want to store is the same type of data product from each type of file, data or MC.

I want to be able to do this so that I can do a fit that compares the distribution of quantities from the data to the corresponding distribution from the MC. I would like to do that without having to create a TTree outside of the framework and losing the provenance information about what went into the product.

If I simply try to read in a data and MC file in the same job, I get the following exception thrown:

---- MismatchedInputFiles BEGIN
Cannot merge file 'numu_skimmed_prod_pid_S15-05-22_fd_numi.root' due to inconsistent process histories:

I suspect other experiments would like to do something similar when they get to that point. Without such a facility, we are encouraging people to create a bunch of n-tuples and do their analysis only with those and outside of the framework.


Related issues

Blocked by art - Feature #16602: Allow multiple files with inconsistent process histories (Closed, 05/19/2017)

History

#1 Updated by Kyle Knoepfel over 4 years ago

  • Status changed from New to Accepted

A full solution to this use case is a substantial piece of work, but we will get back to you for more detailed requirements for an interim solution.

#2 Updated by Kyle Knoepfel almost 4 years ago

  • Status changed from Accepted to Assigned
  • Assignee set to Kyle Knoepfel
  • Estimated time set to 4.00 h

One of the reasons art does not allow inconsistent process histories is that doing so would introduce ambiguities when resolving art::Ptrs. In this particular case, however, it appears that you are interested specifically in Results products. If that is true, it is (we think) fairly straightforward to allow a mismatch in process histories as long as there are no events in the files. In other words, the procedure would look like:

  1. Assemble the Results products from processing events, sub-runs, and runs.
  2. Write these Results products to separate data and MC files, where both files have no events (achieved via 'dropAllEvents: true' in the output-module configuration).
  3. Read in the no-event input files, performing the manipulations you require for data and MC.

Until the stored metadata system within art is redesigned, the above is likely the simplest approach. Please let us know if this would fit your need.

#3 Updated by Brian Rebel almost 4 years ago

I think that process could work. I assume that would require there be no products in the Run or Subrun sections of the file either, is that correct?

Does the workflow you indicated rely on any change to art?

#4 Updated by Kyle Knoepfel almost 4 years ago

Run and SubRun products are not relevant for merging, so they can be empty or filled without consequence. At first blush, the only change required to art is to suppress the process-history checking for files with no events. Right now this check is performed for all files.

We are aiming to implement this feature for the next minor release of art, assuming we encounter no hiccups.

#5 Updated by Kyle Knoepfel about 3 years ago

  • Blocked by Feature #16602: Allow multiple files with inconsistent process histories added

#6 Updated by Kyle Knoepfel about 3 years ago

It turns out that implementing this along the lines we had anticipated was infeasible. We are currently addressing issue #16602, the resolution of which would allow you to run over both data and MC, with none of the restrictions introduced in note #11760-2 above.

#7 Updated by Kyle Knoepfel about 3 years ago

  • Category set to Metadata
  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100
  • SSI Package art added

This issue has been resolved as a consequence of providing feature #16602. Although it will now be possible to read both data and MC files in the same job, note that art still requires event numbers to be unique. Therefore, processing a data event with the same EventID value as an MC event will be an error.
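
As an illustration only, the uniqueness requirement could be checked before submitting a combined job. The sketch below is plain Python, not part of art: it assumes the EventIDs of each sample have already been extracted as (run, subRun, event) tuples, and the function name and sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (run, subRun, event) triple; a plain-tuple stand-in for art's EventID.
EventID = Tuple[int, int, int]


def find_duplicate_event_ids(*samples: Iterable[EventID]) -> List[EventID]:
    """Return the sorted EventIDs that occur more than once across all samples."""
    counts = Counter(eid for sample in samples for eid in sample)
    return sorted(eid for eid, n in counts.items() if n > 1)


# Hypothetical ID lists; in practice they would be extracted from the input files.
data_ids = [(1000, 1, 1), (1000, 1, 2), (1000, 2, 1)]
mc_ids = [(1, 1, 1), (1, 1, 2), (1000, 1, 2)]

duplicates = find_duplicate_event_ids(data_ids, mc_ids)
print(duplicates)  # [(1000, 1, 2)]
```

An empty result means no data event shares an EventID with an MC event, so the two samples should not trip the uniqueness requirement described above.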

#8 Updated by Kyle Knoepfel about 3 years ago

  • Target version set to 2.08.00

#9 Updated by Kyle Knoepfel about 3 years ago

  • Status changed from Resolved to Closed

