Concatenating art ROOT files

Preface

This page describes the input-file concatenation requirements for various versions of art. If you are using art 2.08.00 or newer, none of the considerations below apply: those versions place no restrictions on input-file concatenation.

After reading the information below, see here to understand how Event-level products are handled in the context of input concatenation.


Introduction

As many art users are aware, art can be run with multiple input files specified at the command line:

art -c config.fcl -s input1.root -s input2.root    # option 1
art -c config.fcl -S input_file_list.txt           # option 2

Such a command concatenates the input files, allowing a user to retrieve the products stored within them. Users can then query the data in those products for further processing, data-product creation, or analysis, and write any new products to one or more output streams. Often, the art process executes without incident. For art versions older than 2.08.00, however, there are several restrictions on which kinds of files can be read together in the same art process. Users unaware of these restrictions can encounter an error such as:

Cannot merge file '<some_rootfile>.root' due to a branch mismatch.

 Previous File    File to merge
 ==============================
 2556661413       2556661413   
 701045456        3413590908   
 3413590908

The BranchIDs above correspond to products
that were created whenever the current input files
were produced.  The lists above must be identical.
To determine which products these BranchIDs correspond to,
rerun the process that produced the input files,
enabling full debug output for the message service.
Then 'grep' the log file for messages with 'BranchID'

Contact the framework group for assistance.

Such an error message can be cryptic and frustrating.

The information presented here is meant to help users avoid errors like the one above. Specifically, the next section lists the restrictions imposed when concatenating input files. Crucial to understanding these restrictions are the concepts of BranchID, BranchIDList, and the BranchIDList registry. You are therefore encouraged to read this page so that you can better understand and anticipate the situations in which your art jobs will succeed and those in which they will not.


Concatenation consistency criteria

Before art 1.17.00

1. If two or more files serve as input files, the first file cannot have zero events.
2. For a process that creates more products, the number of processes used to create the first input file defines the maximum number of processes used to create any input file.
3. The branches in the Events, SubRuns, Runs, and ResultsTree trees in the first input file define the set of allowed branches in all input files.
4. Each Event-level data product (including TriggerResult, automatically inserted by art for filters and producers) must have been produced in the same process for each input file, as determined by comparing the BranchIDLists for each file.
5. For art processes that write output files, each input file must contain the same set of products.

art 1.17.00 - 2.07.03

1. For a process that creates more products or filters events, the number of processes used to create the first input file defines the maximum number of processes used to create any input file.
2. Each Event-level data product (including TriggerResult, automatically inserted by art for filters and producers) must have been produced in the same process for each input file, as determined by comparing the BranchIDLists for each file.
3. For art processes that write output files, each input file must contain the same set of products.

If your job configuration breaks any of these rules, art will throw an exception, the details of which depend on the particular violation.
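
As a rough illustration of the art 1.17.00 - 2.07.03 criteria, the following Python sketch compares the per-process BranchID lists of two files. It is not part of art. It assumes the lists are available as text in the file_info_dumper --branch-ids printout format shown in the next section. The function names parse_branch_ids and concatenation_problems are hypothetical, and the three checks are only one plausible reading of the rules above: "same process" is taken to mean the same position in the process list, and "same set of products" is approximated by the same set of BranchIDs.

```python
import re

# Matches lines such as " Process 2: particleSim" in the printout.
PROCESS_RE = re.compile(r"^\s*Process\s+\d+:\s*(\S+)\s*$")


def parse_branch_ids(printout):
    """Return [(process_name, {branch_id, ...}), ...] in printed order."""
    processes = []
    for line in printout.splitlines():
        match = PROCESS_RE.match(line)
        if match:
            processes.append((match.group(1), set()))
        elif line.strip().isdigit() and processes:
            # A bare integer belongs to the most recently seen process.
            processes[-1][1].add(int(line.strip()))
    return processes


def concatenation_problems(first, other):
    """List the ways `other` looks inconsistent with `first` (empty if none)."""
    problems = []

    # Rule 1 (assumed reading): the first file sets the maximum
    # number of processes allowed in any other input file.
    if len(other) > len(first):
        problems.append("more processes than the first input file")

    # Map each BranchID to the position of the process that produced it.
    pos_first = {bid: i for i, (_, ids) in enumerate(first) for bid in ids}
    pos_other = {bid: i for i, (_, ids) in enumerate(other) for bid in ids}

    # Rule 2 (assumed reading): a BranchID present in both files must
    # sit at the same process position in both.
    for bid in sorted(pos_first.keys() & pos_other.keys()):
        if pos_first[bid] != pos_other[bid]:
            problems.append(f"BranchID {bid} produced in a different process")

    # Rule 3 (approximation): both files list the same set of BranchIDs.
    if pos_first.keys() != pos_other.keys():
        problems.append("different sets of BranchIDs")

    return problems
```

Calling concatenation_problems(parse_branch_ids(first_text), parse_branch_ids(other_text)) returns an empty list when none of these checks fail. An empty list does not guarantee that art will accept the files; it only means this simplified comparison found nothing.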


Dumping the BranchIDLists object from the art/ROOT file

As described here, it is important that the BranchIDLists objects from each art/ROOT input file be consistent. To help users determine whether a given set of files is consistent, the file_info_dumper --branch-ids utility is available. The printout for a given file looks something like this:

$ file_info_dumper --branch-ids reco_1.root 
==============================
File: reco_1.root

 List of BranchIDs produced for this file.  The BranchIDs are
 grouped according to the process in which they were produced.  The
 processes are presented in chronological order; however within each process,
 the order of listed BranchIDs is not meaningful.

 Process 1: eventGen
    128974688
    2787675622

 Process 2: particleSim
    3547627907
    2665763966

 Process 3: reco
    3252151789