Feature #11760
Would like ability to read both data and MC files in the same job
Description
I am working on an analysis for NOvA where I would like to create a set of data products using a ResultsProducer for a variety of files produced in different conditions. The basic distinction in the production is whether the file is from data or simulation. The end result of what I want to store is the same type of data product from each type of file, data or MC.
I want to be able to do this so that I can do a fit where I compare the distribution of quantities from the data to a similar distribution for the MC. I would like to do that without having to create a TTree outside of the framework and losing the provenance information about what went into the product.
If I simply try to read in a data and MC file in the same job, I get the following exception thrown:
---- MismatchedInputFiles BEGIN
Cannot merge file 'numu_skimmed_prod_pid_S15-05-22_fd_numi.root' due to inconsistent process histories:
I suspect other experiments would like to do something similar when they get to that point. Without such a facility, we are encouraging people to create a bunch of n-tuples and do their analysis only with those and outside of the framework.
Related issues
History
#1 Updated by Kyle Knoepfel almost 5 years ago
- Status changed from New to Accepted
A full solution to this use case is a substantial piece of work, but we will get back to you for more detailed requirements for an interim solution.
#2 Updated by Kyle Knoepfel over 4 years ago
- Status changed from Accepted to Assigned
- Assignee set to Kyle Knoepfel
- Estimated time set to 4.00 h
One of the reasons art does not allow for inconsistent process histories is that doing so would cause ambiguities in resolving art::Ptrs. In this particular case, however, it appears that you are interested specifically in Results products. If that is true, it is (we think) fairly straightforward to allow a mismatch in process histories as long as there are no events in the files. In other words, the procedure would look like:
- Assemble the Results products from processing events, sub-runs, and runs.
- Write these Results products to separate data and MC files, where both files have no events (achieved via 'dropAllEvents: true' in the output-module configuration).
- Read in the no-event input files, performing the manipulations you require for data and MC.
Until the stored metadata system within art is redesigned, the above is likely the simplest approach. Please let us know if this would fit your need.
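As a rough illustration of the second step, an output-module block with 'dropAllEvents: true' might look like the sketch below. Only the dropAllEvents setting comes from the discussion above; the module label (resultsOut) and the file name are hypothetical placeholders, and the ResultsProducer configuration itself is omitted.

```
# Sketch only: "resultsOut" and the file name are hypothetical placeholders.
outputs: {
  resultsOut: {
    module_type: RootOutput
    fileName: "results_data.root"

    # Write no events, so the file carries only the non-event products.
    dropAllEvents: true
  }
}
```

A second job with the same block and a different fileName would produce the corresponding MC file.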
#3 Updated by Brian Rebel over 4 years ago
I think that process could work. I assume that would require there be no products in the Run or Subrun sections of the file either, is that correct?
Does the workflow you indicated rely on any change to art?
#4 Updated by Kyle Knoepfel over 4 years ago
Run and SubRun products are not relevant for merging, so they can be empty or filled without consequence. At first blush, the only change required to art is to suppress the process-history checking for files with no events. Right now this check is performed for all files.
We are aiming to implement this feature for the next minor release of art, assuming we encounter no hiccups.
#5 Updated by Kyle Knoepfel over 3 years ago
- Blocked by Feature #16602: Allow multiple files with inconsistent process histories added
#6 Updated by Kyle Knoepfel over 3 years ago
#7 Updated by Kyle Knoepfel over 3 years ago
- Category set to Metadata
- Status changed from Assigned to Resolved
- % Done changed from 0 to 100
- SSI Package art added
This issue has been resolved as a consequence of providing feature #16602. Although it will now be possible to read both data and MC files in the same job, note that art still requires event numbers to be unique. Therefore, processing a data event with the same EventID value as an MC event will be an error.
#8 Updated by Kyle Knoepfel over 3 years ago
- Target version set to 2.08.00
#9 Updated by Kyle Knoepfel over 3 years ago
- Status changed from Resolved to Closed