- Table of contents
- Event-level products and input-file concatenation
- Product specification and BranchIDs
- BranchID lists and process history
- BranchIDList consistency criterion
- BranchIDLists and dropping products
Event-level products and input-file concatenation¶
As mentioned on the previous page, the process in which an Event product is created is important in determining whether two input files can be concatenated. It is therefore important to describe how art distinguishes products from one another. Two key ingredients are the BranchID and the BranchIDList, which is a collection of BranchID objects. These concepts are discussed below.
Product specification and BranchIDs¶
Event-level products are declared by the statement
produces< MyProduct >("optionalInstance");
in a module constructor. In addition to the template argument (MyProduct) and the instance name, a product is also specified by the name of the process in which it was created and by the label of the module instance in which the produces call is made. A product is thus fully specified by four pieces of information:
- Class name
- Module label
- Instance name
- Process name
With these four attributes, art creates an ID number that uniquely identifies each specified product. This number, referred to as a BranchID, allows art to:
- Persist products to ROOT output files
- Store and retrieve products on disk
- Assess compatibility between ROOT input files
If any of the four attributes differ between two specified products, the assigned BranchIDs also differ, and the two products are treated as distinct.
This is intuitive for the following cases:
Case 1: Different class types¶
produces< MyProduct1 >(); // Class name corresponding to 'MyProduct1'
produces< MyProduct2 >(); // differs from that corresponding to 'MyProduct2'
Case 2: Different instance names¶
produces< MyProduct >(); // Empty instance name: ""
produces< MyProduct >("i1");
Case 3: Different module labels¶
It is also intuitive for the case where multiple instances of the same module are included in a trigger path, e.g.:
physics : {
  producers : {
    prod1: {
      module_type : MyProducer
      settings : @local::myproducer.settings1
    }
    prod2: {
      module_type : MyProducer
      settings : @local::myproducer.settings2
    }
  }
  p1: [prod1, prod2]
}
Clearly, the products created by the two instances of MyProducer must be kept separate, since the way the products are made is presumably influenced by the configuration parameters each instance receives.
This has a consequence, however, that may not be appreciated. Suppose a user executes two art processes with the following configurations:
Configuration for process A | Configuration for process B |
|
|
Note that the only difference between these two configurations is the module label: prodA vs. prodB. Even though the data associated with the products are identical, the product specifications differ, and the products thus have different BranchIDs.
Case 4: Different process names¶
Another scenario arises when the output from one process serves as input to another.
Suppose an experiment has created an EDFilter module that is meant to be included in each process. In this case, we'll assume the experiment has created a customized event-counter module that can be included in user configurations through @table:: and @sequence:: commands. The module creates a product that holds various statistics. A user decides to run two art processes in series, with the output from the first process serving as input to the second:
Configuration for process 1 | Configuration for process 2 |
|
|
For this scenario, the module label, product instance names, and product class names corresponding to the event counter are identical. The only attribute that distinguishes the event-counter data in one process from the data in the second is the process name. Hence, the process name is an important ingredient in the BranchID number calculation.
The potentially unforeseen consequence is that if two configurations are identical except for the process name, all products created for both configurations are treated as independent of each other.
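The four cases above can be summarized with a small sketch. It is not art's implementation: the ProductSpec struct, canonicalName, and toyBranchID are hypothetical names, and std::hash merely stands in for whatever function art actually uses to compute a BranchID. The underscore-joined string follows the pattern of the art::TriggerResults_TriggerResults__<process name> branch name quoted later on this page.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for the four attributes that specify a product.
struct ProductSpec {
  std::string className;
  std::string moduleLabel;
  std::string instanceName;  // may be empty
  std::string processName;
};

// Join the four attributes into a single string:
// <class>_<module label>_<instance>_<process>
std::string canonicalName(ProductSpec const& s)
{
  return s.className + "_" + s.moduleLabel + "_" + s.instanceName + "_" +
         s.processName;
}

// Toy ID derived from the joined string. Assumption: std::hash is used
// here only for illustration; it is not the function art uses.
std::size_t toyBranchID(ProductSpec const& s)
{
  return std::hash<std::string>{}(canonicalName(s));
}
```

Under these assumptions, a MyProduct made by module prodA and one made by module prodB yield different strings, as do two otherwise identical specifications created in processes with different names, so the two products would never be treated as the same.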
BranchID lists and process history¶
For each process, art keeps track of the BranchIDs corresponding to all products declared (via produces) by modules that are included on a trigger path. This set of BranchIDs, called a BranchIDList, is stored in memory and persisted to any output files. In addition, the BranchIDLists from all previous processes are stored in the ROOT output files.
For example, consider the following diagram:
The final output file out_3.root was produced through a chain of art processes, where the output from one process served as input to the next. The symbols b1 and b2 correspond to the BranchIDs produced in process 1 (p1). The BranchIDLists are persisted to all subsequent output files and placed in a master list of BranchIDList objects in the same order as the chronological process history.
Consider an alternative process history where processes 1 and 2 are reversed. The diagram would look like:
Although the union of all BranchIDs in out_3.root is the same in both process histories, art treats the order of the BranchIDLists as a meaningful representation of what the user intended. The two different out_3.root files are therefore considered to have incompatible process histories, and they cannot be concatenated. The specific criterion is discussed below.
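The difference between comparing the union of BranchIDs and comparing the ordered lists can be illustrated with a minimal sketch. The type aliases and function names are hypothetical, and BranchIDs are represented by labels such as "b1" rather than by numbers.

```cpp
#include <set>
#include <string>
#include <vector>

// Stand-in types: a BranchID is shown as a label such as "b1".
using BranchIDList = std::vector<std::string>;
using ProcessHistory = std::vector<BranchIDList>;  // one list per process, oldest first

// Order-insensitive view: the set of all BranchIDs found in the file.
std::set<std::string> unionOfIDs(ProcessHistory const& history)
{
  std::set<std::string> ids;
  for (auto const& list : history) {
    ids.insert(list.begin(), list.end());
  }
  return ids;
}

// Order-sensitive view: the lists must match position by position.
bool sameHistory(ProcessHistory const& a, ProcessHistory const& b)
{
  return a == b;
}
```

With the first two processes swapped, unionOfIDs returns the same set for both histories while sameHistory returns false, which mirrors the reason the two out_3.root files are considered incompatible.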
Filtering and TriggerResults¶
Whenever a producer or filter is included on a trigger path, art automatically creates and inserts a TriggerResults product, and its BranchID is added to the BranchIDList for the current process (see footnote 1). The product is written to the art ROOT output files so that the TriggerResults information can be used further downstream. It may therefore come as a surprise that, even though users do not explicitly create any products with the producers or filters included in a trigger path for a given process, a TriggerResults product (and therefore an art::TriggerResults_TriggerResults__<process name> Event branch) is written to the output file.
If users are not aware of this, they may be surprised to learn that a file produced with one configuration cannot be concatenated with a file produced using the same configuration plus an extra filtering stage.
1 The TriggerResults product is a crucial aspect of the SelectEvents facility supported for output streams and analyzers.
Empty BranchIDLists¶
A BranchIDList is always added to the list of BranchIDLists in a file, even if the particular output file was produced with a process that did not explicitly create any products and filtered no events. For such a process, the BranchIDList that is appended is empty. The empty list is included in the set of BranchIDLists that are persisted in the output file.
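The following sketch shows how the list contributed by a single process could be assembled. It is an illustration under stated assumptions: listForProcess is a hypothetical helper, branch names are used in place of numeric BranchIDs, and the position of the TriggerResults entry within the list is chosen arbitrarily.

```cpp
#include <string>
#include <vector>

// Stand-in type: branch names are used in place of numeric BranchIDs.
using BranchIDList = std::vector<std::string>;

// Hypothetical helper that builds the list one process contributes.
//   declared: branches declared via produces by trigger-path modules
//   hasProducerOrFilterOnPath: true if any producer or filter is on a trigger path
BranchIDList listForProcess(std::vector<std::string> const& declared,
                            bool hasProducerOrFilterOnPath,
                            std::string const& processName)
{
  BranchIDList list(declared);
  if (hasProducerOrFilterOnPath) {
    // Branch-name pattern quoted in the text above; appending it last
    // is an assumption made for this sketch.
    list.push_back("art::TriggerResults_TriggerResults__" + processName);
  }
  // The result may be empty; an empty list is still recorded for the process.
  return list;
}
```

A process with no producers or filters contributes an empty list, whereas the same process with one extra filter contributes a one-element list. The two lists differ, which is why the corresponding output files cannot be concatenated.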
BranchIDList consistency criterion¶
Having described the BranchIDList formation process, we can now more precisely define the BranchIDList consistency criterion, which was stated to be:
Each Event-level data product (including TriggerResults, automatically inserted by art for filters and producers) must have been produced in the same process for each input file, as determined by comparing the BranchIDLists for each file.
Stated more specifically, the consistency criterion is:
With the exception of the first input file, each BranchIDList in a file must be identical to the corresponding BranchIDList in the BranchIDList registry, which is the ordered union of the BranchIDLists of all previous input files and that of the current process.
The procedure is as follows:
- The BranchIDList registry is seeded with the ordered list of BranchIDList objects from the first input file.
- The BranchIDList of the current process, which is empty if no products are produced and no events are filtered, is appended to the end of the current registry.
- The BranchIDLists from each subsequent input file are compared list-by-list to those in the BranchIDList registry.
- If the lists are identical for the element indices that are common, the input file can be concatenated with the previous one.
- If a subsequent input file has more BranchIDLists than the registry, the additional lists are appended to the registry, provided the other lists are identical.
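The procedure above can be sketched as a small class. This is a toy model rather than art's registry: ToyRegistry and tryAddFile are hypothetical names, BranchIDs are represented by string labels, and a rejected file is signalled by a return value instead of an exception.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using BranchIDList = std::vector<std::string>;   // stand-in: IDs shown as labels
using BranchIDLists = std::vector<BranchIDList>; // one list per process, oldest first

// Hypothetical model of the BranchIDList registry.
class ToyRegistry {
public:
  // Seed with the lists of the first input file, then append the list
  // of the current process (possibly empty).
  ToyRegistry(BranchIDLists const& firstFile, BranchIDList const& currentProcess)
    : lists_(firstFile)
  {
    lists_.push_back(currentProcess);
  }

  // Compare a subsequent file list-by-list over the common indices.
  // On success, append any extra lists the file carries and return true.
  // On a mismatch, leave the registry untouched and return false.
  bool tryAddFile(BranchIDLists const& file)
  {
    std::size_t const common = std::min(file.size(), lists_.size());
    for (std::size_t i = 0; i < common; ++i) {
      if (file[i] != lists_[i]) {
        return false;
      }
    }
    for (std::size_t i = common; i < file.size(); ++i) {
      lists_.push_back(file[i]);
    }
    return true;
  }

  std::size_t size() const { return lists_.size(); }

private:
  BranchIDLists lists_;
};
```

A file with fewer lists than the registry is accepted as long as its lists match the leading registry entries, and a file with more lists extends the registry only after all common entries have matched.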
In what follows, we illustrate how the above procedure is carried out under various circumstances. In each of the examples, an art process ("current process") reads in three input files: a.root, b.root, and c.root. For some of the examples, the current process also creates an additional product. Each scenario shows how the BranchIDList consistency criterion is met (allowing concatenation) or violated (resulting in a thrown exception).
Examples that satisfy the criterion¶
Example 1: No new products¶
Consider this diagram:
The illustrated sequence of events is:
- The BranchIDLists from a.root seed the BranchIDList registry.
- Since the current process does not produce any new products, an empty BranchIDList is appended to the registry.
- Upon opening b.root, the BranchIDLists are compared for p1 to p3.
- Since the p1, p2, and p3 BranchIDLists are identical in b.root and in the registry, b.root can be concatenated with a.root.
- The p4 BranchIDList in b.root is appended to the registry.
- The third input file c.root contains only one BranchIDList, which is identical to the first BranchIDList in the registry; c.root can therefore be concatenated with the previous two files.
Example 2: New products from current process¶
For this scenario:
- The BranchIDLists from a.root seed the BranchIDList registry.
- The current process does produce a new product, so the corresponding BranchIDList is added to the registry.
- The BranchIDLists from both b.root and c.root are identical to their counterparts in the BranchIDList registry; they can thus be concatenated with a.root.
Example 3: New products from new input file¶
This situation is an interesting variant on the first example.
- The BranchIDLists from a.root seed the BranchIDList registry.
- The current process does not produce any new products, so an empty BranchIDList is appended to the registry.
- Since the p1, p2, and p3 BranchIDLists are identical in b.root and in the registry, b.root can be concatenated with a.root.
- The p4 BranchIDList in b.root is appended to the registry, allowing a new product to be accessed in the process (see footnote 2).
- The third input file c.root contains only one BranchIDList, which is identical to the first BranchIDList in the registry; c.root can therefore be concatenated with the previous two files.
2 Although the BranchIDList comparisons are successful in step 3, art versions older than 1.17.00 impose an extra restriction, so adding a new product in this way is not supported there. The restriction has been lifted in versions 1.17.00 and newer, which support new products introduced in this manner.
Examples that violate the criterion¶
Example 1: New products from new file¶
For this example:
- Similar reasoning as above indicates that b.root can be concatenated with a.root, as the BranchIDLists are consistent.
- However, when c.root is read, its p3 BranchIDList includes three entries, whereas the corresponding list in the registry is empty.
- The p3 lists do not agree; an exception is therefore thrown, and art attempts a graceful shutdown.
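The failing comparison can be sketched as a check that throws when a list disagrees with the registry. The function requireConsistent, the use of std::runtime_error, and the message format are all assumptions made for this illustration; the exception type art actually throws is not specified here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using BranchIDList = std::vector<std::string>;   // stand-in: IDs shown as labels
using BranchIDLists = std::vector<BranchIDList>; // one list per process, oldest first

// Hypothetical check: throw if any list that the file and the registry
// have in common differs. Indices are reported 1-based (p1, p2, ...).
void requireConsistent(BranchIDLists const& registry,
                       BranchIDLists const& file,
                       std::string const& fileName)
{
  for (std::size_t i = 0; i < file.size() && i < registry.size(); ++i) {
    if (file[i] != registry[i]) {
      throw std::runtime_error(
        fileName + ": BranchIDList p" + std::to_string(i + 1) +
        " differs from the registry (file has " +
        std::to_string(file[i].size()) + " entries, registry has " +
        std::to_string(registry[i].size()) + ")");
    }
  }
}
```

For a registry whose p3 entry is empty and a c.root whose p3 list has three entries, the check throws at the third list, while a b.root that carries only matching p1 and p2 lists passes.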
Example 2: New products from current process¶
For this scenario, the reasoning is similar to the previous example:
- In this process, another product is produced (b9), and therefore the p3 registry entry has one element.
- Because the p1 and p2 lists in b.root match those in the registry, b.root may be concatenated with a.root.
- When reading c.root, its p3 list does not agree with the registry's p3 BranchIDList.
- An exception is thrown, and art attempts a graceful shutdown.
BranchIDLists and dropping products¶
The BranchIDList is an immutable object. Even "drop" directives like:
source.inputCommands: [ "drop *_*_myInstance_process1 InEvent" ]
outputs.out1.outputCommands: [ "drop *_*_myInstance_process2 InEvent" ]
do not modify in any way the ordered set of BranchIDLists, either in a particular file (via outputCommands) or in the current process (via inputCommands). The reason for this restriction lies in the particular way art::Ptrs are dereferenced. Removing the restriction could, in principle, have disastrous consequences for how art currently works.
To be sure, the "drop" directives do prevent products from being written to a file or from being considered in the current process (see footnote 3). However, a record of their creation is preserved in the BranchIDList, solely for the purpose of ensuring the correct behavior of an art::Ptr.
The end result is that, with some exceptions, dropping products does not affect whether two files can be concatenated.
3 For inputCommands: [ "drop ..." ], not only are the specified products dropped from consideration in the process, but any products that were made from the dropped products are also removed from consideration. This is the default behavior; products that were produced via a dropped product can be retained by adjusting a job-configuration parameter. To be sure, the input file is not modified in any way, i.e. the products are not removed from the input file. For the sake of the current process, however, any dropped products, including the products that depend on them, are inaccessible. In addition, no products dropped on input (or their descendants) can be persisted to subsequent output streams.
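The distinction between dropping a product and modifying the BranchIDList can be modelled with a short sketch. ToyProcessRecord and its member functions are hypothetical and exist only to show that the recorded list stays fixed while the set of accessible products shrinks; the treatment of descendants described in footnote 3 is not modelled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using BranchIDList = std::vector<std::string>;  // stand-in: IDs shown as labels

// Hypothetical record of the products created in one process.
class ToyProcessRecord {
public:
  explicit ToyProcessRecord(BranchIDList const& list) : list_(list) {}

  // Mark a product as dropped; the recorded list is left untouched.
  void drop(std::string const& id) { dropped_.push_back(id); }

  // The record of what was created never changes.
  BranchIDList const& branchIDList() const { return list_; }

  // Products still visible after the drop directives are applied.
  BranchIDList accessible() const
  {
    BranchIDList out;
    for (auto const& id : list_) {
      if (std::find(dropped_.begin(), dropped_.end(), id) == dropped_.end()) {
        out.push_back(id);
      }
    }
    return out;
  }

private:
  BranchIDList const list_;            // immutable once constructed
  std::vector<std::string> dropped_;
};
```

Because a consistency comparison would use branchIDList() rather than accessible(), dropping a product in this model leaves the outcome of the comparison unchanged.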