Example of product aggregation: POT summary information

A real use case where product aggregation is of benefit was raised by the NOvA experiment. NOvA's workflow is such that SubRuns are atomic across files -- i.e. a SubRun is never split across multiple art/ROOT files. However, as is common in various experiments, Runs are split across files. NOvA has a SubRun product that records the number of protons on target (POT summary). Since SubRuns are not split across files (in NOvA's case), all product retrievals of the POT summary give meaningful information. However, creating a POT-summary Run product would be problematic, since versions of art before 2.01.00 cannot easily present a Run-product value that corresponds to a complete, processed Run.

With art 2.01.00, the mechanisms are in place to present to the user a POT summary that corresponds to the complete, processed Run, even if the Run is split across many art/ROOT files [1]. See below for an illustration.

[1] The same functionality exists in 2.01.00 for SubRun products where the SubRun spans multiple art/ROOT files.
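
To make the mechanism concrete: a product participates in aggregation by providing an aggregate member function, which art invokes when combining two instances of the product whose ranges of validity are disjoint. The class below is a minimal sketch; the data members shown (totalPOT, goodPOT) are illustrative assumptions, not NOvA's actual implementation.

    // POTSummary.h -- minimal sketch of an aggregatable (Sub)Run product.
    // The data members are illustrative; NOvA's actual class differs.
    #ifndef POTSummary_h
    #define POTSummary_h

    class POTSummary {
    public:
      // Invoked by art when combining two products whose ranges of
      // validity are disjoint (e.g. run 1/subrun 0 with run 1/subrun 1).
      void aggregate(POTSummary const& other)
      {
        totalPOT += other.totalPOT;
        goodPOT += other.goodPOT;
      }

      double totalPOT{0.};  // protons on target delivered
      double goodPOT{0.};   // protons on target passing quality cuts
    };

    #endif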
 

Process 1

In the first process, four art jobs are executed independently of one another, with distinct SubRun numbers but the same Run number. In each job, a Run product is created that corresponds to the total number of protons on target (POTSummary) seen for the processed span of events. Each job writes one output file, to which the POTSummary Run product is written. This can be depicted with this diagram:

In the image above, the POTSummary Run products are labeled with the corresponding Run and SubRun numbers (r1, sr[0-3]) to emphasize that the products were produced for a given set of events/SubRuns.
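
As a sketch of what each of the four jobs might run, consider the producer below. It declares a Run-level POTSummary product and puts it into the Run at endRun. The accumulation logic is elided, and module boilerplate varies somewhat across art versions (newer releases also accept a full/fragment semantic argument to put); treat this as an illustration, not NOvA's actual module.

    // POTSummaryProducer_module.cc -- illustrative sketch.
    #include "art/Framework/Core/EDProducer.h"
    #include "art/Framework/Core/ModuleMacros.h"
    #include "art/Framework/Principal/Event.h"
    #include "art/Framework/Principal/Run.h"
    #include "fhiclcpp/ParameterSet.h"

    #include "POTSummary.h"  // the sketch from above

    #include <memory>

    class POTSummaryProducer : public art::EDProducer {
    public:
      explicit POTSummaryProducer(fhicl::ParameterSet const&)
      {
        produces<POTSummary, art::InRun>();  // declare the Run product
      }

      void produce(art::Event&) override
      {
        // Accumulate POT for each processed event (details elided).
      }

      void endRun(art::Run& r) override
      {
        // The product's range of validity is assigned by art, based on
        // the events/subruns actually processed in this job.
        r.put(std::make_unique<POTSummary>(/* accumulated totals */));
      }
    };

    DEFINE_ART_MODULE(POTSummaryProducer)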

Process 2

The second process is a concatenation step, in which the output files from process 1 ("p[0-3].root") are concatenated into a smaller set of output files ("q01.root" and "q23.root"). This is depicted as:

The products from the input files have been carried forward to the output files, along with the information identifying the span of events to which each product corresponds. In other words, each of the q*.root output files now contains two instances of a run-1 product.

Process 3

For the third process, the q*.root files from process 2 are concatenated, again carrying forward the POTSummary Run products. Upon reading q01.root, art determines that there are two POTSummary products associated with run 1. In versions of art before 2.01.00, only the first product would have been retained. In versions 2.01.00 and newer, both products are retained, and since their ranges of validity are disjoint, they are aggregated according to the behavior described here, and their ranges of validity are combined. Because this aggregation happens separately for each input file that is read, the resulting output file (r0123.root) will have two copies of the POTSummary: one corresponding to the aggregation from q01.root, and one corresponding to the aggregation from q23.root.
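
Conceptually, the combination art performs upon reading q01.root can be pictured as follows. This is an illustration of the behavior, not the framework's actual code, and the variable names are hypothetical:

    // Reading q01.root: two run-1 POTSummary products with disjoint
    // ranges of validity are found and combined, roughly equivalent to:
    POTSummary combined = pot_r1_sr0;  // product covering (r1, sr0)
    combined.aggregate(pot_r1_sr1);    // fold in the (r1, sr1) product
    // ...and the two ranges of validity are merged into a single
    // RangeSet covering run 1, subruns 0-1.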

Process 4

As in process 3, when r0123.root is read, two POTSummary products are detected; since their ranges of validity are disjoint, the products and corresponding ranges are aggregated. The final output file (final.root) will then have one POTSummary Run product that corresponds to Run 1, SubRuns 0 through 3, aggregated according to the user-defined POTSummary::aggregate function.

N.B. It is not necessary to execute process 4 before performing a Run::getByLabel on the aggregated POTSummary product. It is sufficient to have produced the file "r0123.root": whenever that file is read and a user attempts to fetch the product, the aggregated product will be presented. The purpose of this illustration is to show that if a concatenation job is performed on just the "r0123.root" input file, the output file will contain only the aggregated product, and not the two separate products present in "r0123.root".
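
Such a retrieval might look like the following analyzer fragment. This is a sketch: the module label "potsum" is a hypothetical placeholder, and with "r0123.root" as input the handle would refer to the product aggregated over Run 1, SubRuns 0 through 3.

    // POTReader_module.cc -- illustrative retrieval of the aggregated product.
    #include "art/Framework/Core/EDAnalyzer.h"
    #include "art/Framework/Core/ModuleMacros.h"
    #include "art/Framework/Principal/Event.h"
    #include "art/Framework/Principal/Handle.h"
    #include "art/Framework/Principal/Run.h"
    #include "fhiclcpp/ParameterSet.h"

    #include "POTSummary.h"  // the sketch from above

    class POTReader : public art::EDAnalyzer {
    public:
      explicit POTReader(fhicl::ParameterSet const& ps) : art::EDAnalyzer{ps} {}

      void analyze(art::Event const&) override {}

      void endRun(art::Run const& r) override
      {
        art::Handle<POTSummary> h;
        if (r.getByLabel("potsum", h)) {  // hypothetical module label
          // h->totalPOT now reflects the full aggregated range of validity.
        }
      }
    };

    DEFINE_ART_MODULE(POTReader)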

A caveat about complicated workflows

Whereas the workflow shown in the NOvA example above is fairly simple, a complicated workflow can be more difficult to reason about. For this discussion, a complicated workflow is one in which a series of input files is split into a larger set of output files at arbitrary boundaries, and the larger set of files is then condensed into a smaller set. Specifically, the complication applies if:

  • A Run product corresponding to a fragment is produced, and the input file does not contain the full Run, and the output files are not configured to switch only at Run or InputFile boundaries, or
  • A SubRun product corresponding to a fragment is produced, and the input file does not contain the full SubRun, and the output files are not configured to switch only at SubRun, Run, or InputFile boundaries.

For every workflow we have envisaged, we have found a solution that correctly interprets the Run and SubRun products corresponding to a given fragment. However, be advised that complicated workflows can create some awkward product-handling scenarios. For example, consider the following workflow:

  • A user submits 1000 grid jobs, each running an art process that creates one art/ROOT output file. Each output file contains a Run product that represents an accumulation of data over the events processed in that job.
  • The output files from the previous process are again submitted as 1000 grid jobs. This time, new products are created, and the Run product from the previous process is carried forward. In addition, output modules are permitted to switch to a new output file after some arbitrary event. However, upon switching to a new output file, there is no way to split the Run product from the previous process, so a decision must be made as to how that product is carried forward.
  • In order to avoid double counting, the carried-forward aggregated product is placed into only one output file for a given output module. This ensures that whenever the files are concatenated in a future process so that the entire Run can be seen, the product that is presented to the user corresponds to the full Run, and only the full Run--i.e. no double counting.

Remember, an aggregated product is presented to the user whenever the file that contains the product is read as input. In the situation above, it can happen that the file being processed represents only a portion of the RangeSet that was assigned to a (Sub)Run product carried forward from an earlier process. In such a situation, an attempt to retrieve the product is ill-defined--in other words, the product presented could represent a range of validity greater than what is actually contained in the currently open input file. The solution to this complication is to either:

  1. Make sure all files corresponding to a Run-fragment have been concatenated before you retrieve the product corresponding to the fragment, or
  2. Use the SummedValue<T> auxiliary class (see here) to update the value of the quantity of interest as each input file is processed, as sketched below.
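
As a sketch of option 2, a module can hold an art::SummedValue<POTSummary> and update it as each fragment is encountered; SummedValue keeps track of the ranges of validity it has already folded in, which is what prevents double counting. The example below accumulates NOvA's SubRun-level POT summary; the header path reflects my understanding of the art API, and the "potsum" label is a hypothetical placeholder--check both against your art release.

    // POTAccumulator_module.cc -- illustrative use of SummedValue<T>.
    #include "art/Framework/Core/EDAnalyzer.h"
    #include "art/Framework/Core/ModuleMacros.h"
    #include "art/Framework/Principal/Event.h"
    #include "art/Framework/Principal/Handle.h"
    #include "art/Framework/Principal/SubRun.h"
    #include "art/Framework/Principal/SummedValue.h"
    #include "fhiclcpp/ParameterSet.h"

    #include "POTSummary.h"  // the sketch from earlier

    class POTAccumulator : public art::EDAnalyzer {
    public:
      explicit POTAccumulator(fhicl::ParameterSet const& ps) : art::EDAnalyzer{ps} {}

      void analyze(art::Event const&) override {}

      void beginSubRun(art::SubRun const& sr) override
      {
        art::Handle<POTSummary> h;
        if (sr.getByLabel("potsum", h)) {  // hypothetical module label
          // update() folds in only ranges not yet seen, so a file that
          // covers part of a product's RangeSet cannot be double counted.
          totalPOT_.update(h);
        }
      }

    private:
      art::SummedValue<POTSummary> totalPOT_;
    };

    DEFINE_ART_MODULE(POTAccumulator)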