Would like the ability to read both data and MC files in the same job
I am working on an analysis for NOvA in which I would like to use a ResultsProducer to create a set of data products from a variety of files produced under different conditions. The basic distinction in the production is whether a file comes from data or from simulation. In the end, I want to store the same type of data product from each kind of file, data or MC.
I want to do this so that I can perform a fit comparing the distribution of quantities in the data to the corresponding distribution in the MC. I would like to do that without having to create a TTree outside of the framework and thereby lose the provenance information about what went into the product.
If I simply try to read a data file and an MC file in the same job, I get the following exception:
---- MismatchedInputFiles BEGIN
Cannot merge file 'numu_skimmed_prod_pid_S15-05-22_fd_numi.root' due to inconsistent process histories:
I suspect other experiments will want to do something similar when they reach that point. Without such a facility, we are encouraging people to create a bunch of n-tuples and do their analysis only with those, outside of the framework.
#2 Updated by Kyle Knoepfel almost 4 years ago
- Status changed from Accepted to Assigned
- Assignee set to Kyle Knoepfel
- Estimated time set to 4.00 h
One of the reasons art does not allow inconsistent process histories is that doing so would create ambiguities when resolving art::Ptrs. In this particular case, however, it appears that you are interested specifically in Results products. If that is true, it is (we think) fairly straightforward to allow a mismatch in process histories as long as there are no events in the files. In other words, the procedure would look like:
- Assemble the Results products from processing events, sub-runs, and runs.
- Write these Results products to separate data and MC files, where both files have no events (achieved via 'dropAllEvents: true' in the output-module configuration).
- Read in the no-event input files, performing the manipulations you require for data and MC.
Until the stored metadata system within art is redesigned, the above is likely the simplest approach. Please let us know whether this would fit your needs.
#4 Updated by Kyle Knoepfel almost 4 years ago
SubRun products are not relevant for merging, so they can be empty or filled without consequence. At first blush, the only change required to art is to suppress the process-history checking for files with no events. Right now this check is performed for all files.
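The relaxed rule can be modeled outside of art to make its effect concrete. This is a minimal sketch under stated assumptions, not art's implementation: `InputFileSummary`, its `processHistoryID` string (a stand-in for a full process history), and `checkMergeable` are hypothetical names. Following the description above, files with no events skip the comparison, while every file that does contain events must share one process history.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-file summary; this is NOT an art type.
struct InputFileSummary {
  std::string name;
  std::string processHistoryID;  // stand-in for the file's full process history
  std::size_t nEvents;           // number of events stored in the file
};

// Models the proposed rule: files without events are exempt from the
// process-history comparison; all files that do contain events must agree.
// Throws std::runtime_error (message modeled on the reported exception)
// at the first mismatch.
void checkMergeable(std::vector<InputFileSummary> const& files)
{
  std::optional<std::string> reference;  // history of the first file with events
  for (auto const& file : files) {
    if (file.nEvents == 0) {
      continue;  // e.g. a file written with 'dropAllEvents: true'
    }
    if (!reference) {
      reference = file.processHistoryID;
    }
    else if (*reference != file.processHistoryID) {
      throw std::runtime_error("Cannot merge file '" + file.name +
                               "' due to inconsistent process histories");
    }
  }
}
```

Under this model, data and MC files written without events pass the check even though their histories differ, whereas two event-bearing files with different histories reproduce the original failure.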
We are aiming to implement this feature for the next minor release of art, assuming we encounter no hiccups.
#6 Updated by Kyle Knoepfel about 3 years ago
#7 Updated by Kyle Knoepfel about 3 years ago
- Category set to Metadata
- Status changed from Assigned to Resolved
- % Done changed from 0 to 100
- SSI Package art added
This issue has been resolved as a consequence of providing feature #16602. Although it will now be possible to read both data and MC files in the same job, note that art still requires event numbers to be unique. Therefore, processing a data event with the same EventID value as an MC event will be an error.
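Because a data event and an MC event sharing an EventID is an error, it can help to check for collisions before combining the inputs. The sketch below is a standalone model, not art code: `EventKey` is a hypothetical stand-in for `art::EventID` (run, subrun, event), and `firstDuplicate` is a hypothetical helper that scans the event IDs of every input file, data and MC alike.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-in for art::EventID: the (run, subRun, event) triple.
struct EventKey {
  std::uint32_t run;
  std::uint32_t subRun;
  std::uint32_t event;

  // Lexicographic ordering so EventKey can be stored in a std::set.
  bool operator<(EventKey const& other) const
  {
    return std::tie(run, subRun, event) <
           std::tie(other.run, other.subRun, other.event);
  }
};

// Returns the first EventKey that appears more than once across all files
// (within a file or between files), or std::nullopt if every key is unique.
std::optional<EventKey> firstDuplicate(std::vector<std::vector<EventKey>> const& files)
{
  std::set<EventKey> seen;
  for (auto const& file : files) {
    for (auto const& key : file) {
      // insert(...).second is false when an equal key is already present.
      if (!seen.insert(key).second) {
        return key;
      }
    }
  }
  return std::nullopt;
}
```

A `std::set` keyed on the full triple is used because uniqueness depends on run, subrun, and event together, not on the event number alone.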