readNext returned run/subrun that is mismatched to "INVALID"
I'm using lariatsoft v06_30_00. When I run this command:
lar -c multiple_input_output_slice_job.fcl -s /pnfs/lariat/raw/005/098/lariat_r011078_sr001*.root
the process crashes when transitioning to the next input file, with this error:
%MSG-s ArtException: PostOpenFile 05-Apr-2017 16:19:58 CDT PostEndRun
cet::exception caught in art
---- DataCorruption BEGIN
readNext returned a SubRun run: 11078 subRun: 11 which is a mismatch to run: INVALID
---- DataCorruption END
Art has completed and will exit with status 16.
This command also fails when given a file list (lar -c _ -S files.list). However, it works fine with versions of lariatsoft prior to v06_30_00.
Can you help us figure out what the issue is?
#3 Updated by Kyle Knoepfel about 3 years ago
- Status changed from Accepted to Assigned
- Assignee set to Kyle Knoepfel
- % Done changed from 0 to 50
I have reproduced the problem, and the issue is understood.
In art 2.06.00, the pre-conditions for the user-provided readNext function were changed. These are documented in the release notes under the "Breaking changes" section, specifically "Change in art::Source template readNext pre-conditions". The change is that whenever an input file has been closed and a new one opened, the inSR pointers are null, and it is therefore the responsibility of the user to ensure that the Principal pointers are appropriately set.
In this particular case, the first file has the following event-list:
File: lariat_r011078_sr0010.root
Printing the list of Runs, SubRuns, and Events stored in the root file.
Run     SubRun  Event
11078
11078   10
11078   10      37
11078   10      38
11078   10      39
11078   10      40
Whenever that file is closed, the second file is opened, which has the following event list in it:
==============================
File: lariat_r011078_sr0011.root
Printing the list of Runs, SubRuns, and Events stored in the root file.
Run     SubRun  Event
11078
11078   11
11078   11      41
11078   11      42
11078   11      43
11078   11      44
Since the SubRun number is different in the second file, the outSR principal pointer is appropriately set by the EventBuilderInput, which your job configuration uses. However, since the Run number has not changed in the second input file, the EventBuilderInput does not assign the outR pointer, even though the inR pointer no longer exists. This is what eventually leads to the exception being thrown.
A fix would be for the author of EventBuilderInput to make sure that a new (Sub)Run principal is created whenever (a) a (Sub)Run number is encountered in readNext that differs from the cached one, or (b) a new input file is opened.
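The two conditions in the suggested fix can be sketched with a toy model. This is only an illustration under stated assumptions: ToySource, onNewFile, the simplified RunPrincipal and SubRunPrincipal structs, and the readNext signature used here are hypothetical stand-ins, not the real art or EventBuilderInput interfaces.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; these are NOT the real art principal classes.
struct RunPrincipal    { int run; };
struct SubRunPrincipal { int run; int subRun; };

// Toy model of the bookkeeping a source needs around readNext.
class ToySource {
public:
  // (b) A new input file was opened: forget the cached IDs so that the
  // next readNext call creates fresh principals.
  void onNewFile() { haveRun_ = false; haveSubRun_ = false; }

  // outR/outSR are set only when a new principal is required.
  bool readNext(int run, int subRun,
                std::unique_ptr<RunPrincipal>& outR,
                std::unique_ptr<SubRunPrincipal>& outSR)
  {
    outR.reset();
    outSR.reset();
    // (a) Compare the incoming IDs against the cached ones.
    bool const newRun = !haveRun_ || run != cachedRun_;
    bool const newSubRun = newRun || !haveSubRun_ || subRun != cachedSubRun_;
    if (newRun) {
      outR.reset(new RunPrincipal{run});
      cachedRun_ = run;
      haveRun_ = true;
    }
    if (newSubRun) {
      outSR.reset(new SubRunPrincipal{run, subRun});
      cachedSubRun_ = subRun;
      haveSubRun_ = true;
    }
    return true;
  }

private:
  bool haveRun_ = false;
  bool haveSubRun_ = false;
  int cachedRun_ = 0;
  int cachedSubRun_ = 0;
};

// Replays the two-file sequence from this issue (run 11078, subruns 10 and 11).
inline void replayIssueSequence()
{
  ToySource src;
  std::unique_ptr<RunPrincipal> outR;
  std::unique_ptr<SubRunPrincipal> outSR;

  src.onNewFile();                       // first file opened
  src.readNext(11078, 10, outR, outSR);
  assert(outR && outSR);                 // both principals created
  src.readNext(11078, 10, outR, outSR);  // same run and subrun
  assert(!outR && !outSR);               // cached principals are reused

  src.onNewFile();                       // second file opened
  src.readNext(11078, 11, outR, outSR);
  assert(outR && outSR);                 // run principal is recreated as well
}
```

Without the onNewFile reset, the last call would set only outSR, which is the situation that triggers the exception reported here.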
The art team will discuss this issue tomorrow to determine if a more helpful exception message could be generated.
#6 Updated by Kyle Knoepfel about 3 years ago
- Category set to Event Loop
- Status changed from Assigned to Resolved
- % Done changed from 50 to 100
- SSI Package art added
The checking of the Principal pointers has been strengthened, and a more helpful exception message is now provided. In the particular circumstance for this issue, the emitted error message is now:
---- LogicError BEGIN
readNext returned true but no RunPrincipal has been set, and no cached RunPrincipal exists.
This can happen if a new input file has been opened and the RunPrincipal has not been appropriately assigned.
---- LogicError END
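The condition described by this message can be illustrated with a small standalone check. This is a sketch only: requireRunPrincipal and its boolean parameters are hypothetical names chosen for illustration, std::logic_error stands in for art's own exception type, and the actual check in art may be structured differently.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper, not art's real implementation.
// Throws when readNext reported success but neither a newly set nor a
// cached RunPrincipal is available.
inline void requireRunPrincipal(bool readNextResult,
                                bool newRunSet,
                                bool cachedRunExists)
{
  if (readNextResult && !newRunSet && !cachedRunExists) {
    throw std::logic_error(
      "readNext returned true but no RunPrincipal has been set, "
      "and no cached RunPrincipal exists.");
  }
}

// Exercises the three relevant situations.
inline void demoCheck()
{
  requireRunPrincipal(true, true, false);  // new principal supplied: passes
  requireRunPrincipal(true, false, true);  // cached principal reused: passes

  bool threw = false;
  try {
    // New file opened (no cache) and readNext did not set a RunPrincipal.
    requireRunPrincipal(true, false, false);
  } catch (std::logic_error const&) {
    threw = true;
  }
  assert(threw);
}
```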
Resolved with commit art:f12ad89c.