Event-level products and input-file concatenation

As mentioned on the previous page, the process in which an Event product is created is important in determining whether two input files can be concatenated. It is therefore necessary to describe how art distinguishes products from one another. Two key ingredients are the BranchID and the BranchIDList, which is a collection of BranchID objects. Both concepts are discussed below.

Product specification and BranchIDs

Event-level products are declared by the statement

produces< MyProduct >("optionalInstance");

in a module constructor. In addition to the specified template argument (MyProduct) and the instance name, a product is also specified by the name of the process by which it was created, and the module label corresponding to the particular module instance in which the produces call is made. A product is thus fully specified by four pieces of information:

  1. Class name
  2. Module label
  3. Instance name
  4. Process name

With these four attributes, art creates an ID number that uniquely identifies each specified product. This number, referred to as a BranchID, allows art to:

  • Persist products to ROOT output files
  • Store and retrieve products on disk
  • Assess compatibility between ROOT input files

If any of the four attributes differ between two specified products, the assigned BranchIDs also differ, and art treats the two products as distinct.

This is intuitive for the following cases:

Case 1: Different class types

produces< MyProduct1 >(); // Class name corresponding to 'MyProduct1'
produces< MyProduct2 >(); // differs from that corresponding to 'MyProduct2'

Case 2: Different instance names

produces< MyProduct >();     // Empty instance name: "" 
produces< MyProduct >("i1");

Case 3: Different module labels

It is also intuitive for the case where multiple instances of a module are included in a trigger path, e.g.:

physics : {

   producers : {
      prod1: { 
         module_type : MyProducer 
         settings : @local::myproducer.settings1
      }
      prod2: { 
         module_type : MyProducer 
         settings : @local::myproducer.settings2
      }
   }

   p1: [prod1, prod2]
}

Clearly, the products created by the two instances of MyProducer must be kept separate as the method of producing the products is presumably influenced by the configuration parameters the module requires.

This has a consequence, however, that may not be appreciated. Suppose a user executes two art processes with the following configuration:

Configuration for process A:

process_name : process
physics : {

   producers : {
      prodA: {
         module_type : MyProducer
         settings : @local::myproducer.settings
      }
   }

   p1: [prodA]
}

Configuration for process B:

process_name : process
physics : {

   producers : {
      prodB: { 
         module_type : MyProducer 
         settings : @local::myproducer.settings
      }
   }

   p1: [prodB]
}

Note that for these two configurations, the only difference is the specified module label--prodA vs. prodB. Even though the data associated with the products are identical, the product specifications are different, and they thus have different BranchIDs.

Case 4: Different process names

Another scenario corresponds to when the output from one process serves as input to another.

Suppose an experiment has created an EDFilter module that is meant to be included in each process. In this case, we'll assume the experiment has created a customized event-counter module that can be included in user configurations through @table:: and @sequence:: commands. The module creates a product with various statistics information. A user decides to run two art processes in series, with the output from the first process serving as input to the second:

Configuration for process 1:

process_name: Process1
physics : {

   filters : {
      f1 : { ... }
      f2 : { ... }
      @table::event_counter
   }

   p1 : [f1, f2, @sequence::ec ]
   e1 : [out_stream]
}

Configuration for process 2:

process_name: Process2
physics : {

   filters : {
      f3 : { ... }
      f4 : { ... }
      @table::event_counter
   }

   p1 : [f3, f4, @sequence::ec ]

}

For this scenario, the module label, product instance names, and product class names corresponding to the event counter are identical. The only attribute that distinguishes the event-counter data in one process from the data in the second is the process name. Hence, the process name is an important ingredient in the BranchID number calculation.

The potentially unforeseen consequence is that if two configurations are identical except for the process name, all products created for both configurations are treated as independent of each other.


BranchID lists and process history

For each process, art keeps track of the BranchIDs corresponding to all declared products (via produces) belonging to modules that are included on a trigger path. This set of BranchIDs, called a BranchIDList, is stored in memory and persisted to any output files. In addition, the BranchIDLists from all previous processes are stored in the ROOT output files.

For example, consider the following diagram:

The final output file out_3.root was produced through a chain of art processes, where the output from one process served as input to the next. The symbols b1 and b2 correspond to the BranchIDs produced in process 1 (p1). The BranchIDLists are persisted to all subsequent output files and placed in a master list of BranchIDList objects in the same order as the chronological process history.

Consider an alternative process history where processes 1 and 2 are reversed. The diagram would look like:

Although the union of all BranchIDs in out_3.root is the same in both process histories, art treats the BranchIDList order to be a meaningful representation of what is intended by the user. In this regard, the two different out_3.root files would be considered to have incompatible process histories. They are examples of files that could not be concatenated together. The specific criterion is discussed below.

Filtering and TriggerResults

Whenever a producer or filter is included on a trigger path, art automatically creates and inserts a TriggerResults product, and adds its BranchID to the BranchIDList for the current process [1]. The product is written to the art ROOT output files so that the TriggerResults information can be used further downstream. It may therefore come as a surprise that, even when the producers or filters on a trigger path do not explicitly create any products in a given process, a TriggerResults product (and, therefore, an art::TriggerResults_TriggerResults__<process name> Event branch) is written to the output file.

If users are not aware of this, they may be surprised to learn that a file produced with one configuration cannot be concatenated with a file produced using the same configuration plus an extra filtering stage.

[1] The TriggerResults product is a crucial aspect of the SelectEvents facility supported for output streams and analyzers.

Empty BranchIDLists

A BranchIDList is always added to the list of BranchIDLists in a file, even if the particular output file was produced with a process that did not explicitly create any products and filtered no events. For such a process, the BranchIDList that is appended is empty. The empty list is included in the set of BranchIDLists that are persisted in the output file.


BranchIDList consistency criterion

Having described the BranchIDList formation process, we can now more precisely define the BranchIDList consistency criterion, which was stated to be:

Each Event-level data product (including TriggerResults, automatically inserted by art for filters and producers) must have been produced in the same process for each input file, as determined by comparing the BranchIDLists for each file.

Stated more specifically, the consistency criterion is:

With the exception of the first input file, each BranchIDList in a file must be identical to the corresponding BranchIDList in the BranchIDList registry, which is the ordered union of the BranchIDLists of all previous input files and that of the current process.

The procedure is as follows:

  1. The BranchIDList registry is seeded with the ordered list of BranchIDList objects from the first input file.
  2. The BranchIDList of the current process, which is empty if no products are produced and no events are filtered, is appended to the end of the current registry.
  3. The BranchIDLists from each subsequent input file are compared list-by-list to those in the BranchIDList registry.
  4. Assuming the lists are identical for the element indices that are common, then the input file can be concatenated with the previous one.
  5. If a subsequent input file has more BranchIDLists than those in the registry, the additional lists are appended to the registry if the other lists are identical.

In what follows, we illustrate how the above procedure is carried out under various circumstances. In each of the examples, an art process ("current process") reads in three input files: a.root, b.root, and c.root. For some of the examples, the current process also creates an additional product. Each scenario shows how the BranchIDList consistency criterion is met (allowing concatenation) or violated (resulting in a thrown exception).

Examples that satisfy the criterion

Example 1: No new products

Consider this diagram:

[Diagram: analysis-only process]

The illustrated sequence of events is:

  1. The BranchIDLists from a.root seed the BranchIDList registry.
  2. Since the current process does not produce any new products, an empty BranchIDList is appended to the registry.
  3. Upon opening b.root, the BranchIDLists are compared for p1 to p3.
  4. Since the p1, p2, and p3 BranchIDLists are identical in b.root and in the registry, b.root can be concatenated with a.root.
  5. The p4 BranchIDList in b.root is appended to the registry.
  6. The third input file c.root contains only one BranchIDList, which is identical with the first BranchIDList in the registry--c.root can therefore be concatenated with the previous two files.

Example 2: New products from current process

[Diagram: create-product process]

For this scenario:

  1. The BranchIDLists from a.root seed the BranchIDList registry.
  2. The current process does produce a new product, so the corresponding BranchIDList is added to the registry.
  3. The BranchIDLists from both b.root and c.root are identical with their counterparts in the BranchIDList registry--they can thus be concatenated with a.root.

Example 3: New products from new input file

This situation is an interesting variant on the first example.

  1. The BranchIDLists from a.root seed the BranchIDList registry.
  2. The current process does not produce any new products, so an empty BranchIDList is appended to the registry.
  3. Since the p1, p2, and p3 BranchIDLists are identical in b.root and in the registry, b.root can be concatenated with a.root.
  4. The p4 BranchIDList in b.root is appended to the registry, allowing a new product to be accessed in the process [2].
  5. The third input file c.root contains only one BranchIDList, which is identical to the first BranchIDList in the registry, so c.root can be concatenated with the previous two files.

[2] Although the BranchIDList comparisons are successful in step 3, art versions older than 1.17.00 impose an extra restriction, so adding a new product in this way is not supported there. This restriction has been lifted in versions 1.17.00 and newer.

Examples that violate the criterion

Example 1: New products from new file

[Diagram: analysis-only process]

For this example:

  1. Similar reasoning as above indicates that b.root can be concatenated with a.root as the BranchIDLists are consistent.
  2. However, when c.root is read, its p3 BranchIDList includes three entries, whereas the corresponding list in the registry is empty.
  3. Because the p3 lists do not agree, an exception is thrown, and art attempts a graceful shutdown.

Example 2: New products from current process

[Diagram: create-product process]

For this scenario, the reasoning is similar to the previous example:

  1. In this process, another product is produced (b9), and therefore the p3 registry entry has one element.
  2. Because the p1 and p2 lists in b.root match those in the registry, b.root may be concatenated with a.root.
  3. When reading c.root, its p3 list does not agree with the registry's p3 BranchIDList.
  4. An exception is thrown, and art attempts a graceful shutdown.

BranchIDLists and dropping products

The BranchIDList is an immutable object. Even "drop" directives like:

source.inputCommands:        [ "drop *_*_myInstance_process1 InEvent" ]
outputs.out1.outputCommands: [ "drop *_*_myInstance_process2 InEvent" ]

do not modify in any way the set of ordered BranchIDLists, either in a particular file (via outputCommands) or in the current process (via inputCommands). The reason for this restriction lies in how art::Ptrs are dereferenced. Removing it could, in principle, have disastrous consequences for how art currently works.

To be sure, the "drop" directives do prevent products from being written to a file or from being considered in the current process [3]. However, a record of their creation is preserved in the BranchIDList, solely for the purpose of ensuring the correct behavior of an art::Ptr.

The end result is that, with some exceptions, dropping products does not affect whether two files can be concatenated.

[3] With inputCommands: [ "drop ..." ], not only are the specified products dropped from consideration in the process, but so are any products that were made from them. This is the default behavior; products that were produced from a dropped product can be retained by adjusting a job-configuration parameter. The input file itself is not modified in any way, i.e. the products are not removed from it. For the sake of the current process, however, any dropped products, including the products that depend on them, are inaccessible. In addition, no products dropped on input (nor their descendants) can be persisted to subsequent output streams.