Support #16111

readNext returned run/subrun that is mismatched to "INVALID"

Added by Will Foreman about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: High
Assignee:
Category: Event Loop
Target version:
Start date: 04/05/2017
Due date:
% Done: 100%
Estimated time:
Spent time:
Scope: Internal
Experiment: LArIAT
SSI Package: art
Duration:

Description

I'm using lariatsoft v06_30_00. When I run this command:

lar -c multiple_input_output_slice_job.fcl -s /pnfs/lariat/raw/005/098/lariat_r011078_sr001*.root

the process crashes when transitioning to the next input file, with this error:

%MSG-s ArtException: PostOpenFile 05-Apr-2017 16:19:58 CDT PostEndRun
cet::exception caught in art
---- DataCorruption BEGIN
readNext returned a SubRun run: 11078 subRun: 11 which is a mismatch to run: INVALID
---- DataCorruption END
%MSG
Art has completed and will exit with status 16.

This command also fails when given a file list (lar -c _ -S files.list). However, it works fine with versions of lariatsoft prior to v06_30_00.

Can you help us figure out what the issue is?

Thanks,
Will

History

#1 Updated by Will Foreman about 3 years ago

By the way, I am running this on a virtual machine: lariatgpvm03.fnal.gov

#2 Updated by Marc Paterno about 3 years ago

  • Status changed from New to Accepted

We will investigate at highest priority.

#3 Updated by Kyle Knoepfel about 3 years ago

  • Status changed from Accepted to Assigned
  • Assignee set to Kyle Knoepfel
  • % Done changed from 0 to 50

I have reproduced the problem, and the issue is understood.

With art 2.06.00, the pre-conditions for the user-provided readNext function were changed. These are documented in the release notes (see here) under the "Breaking changes" section, specifically "Change in art::Source template readNext pre-conditions". The change is that whenever an input file has been closed, and a new one opened, the inR and inSR pointers are null, and it is therefore the responsibility of the user to ensure that the outR and outSR Principal pointers are appropriately set.

In this particular case, the first file has the following event-list:

File: lariat_r011078_sr0010.root

Printing the list of Runs, SubRuns, and Events stored in the root file.

            Run         SubRun          Event
          11078                              
          11078             10               
          11078             10             37
          11078             10             38
          11078             10             39
          11078             10             40

When that file is closed, the second file is opened. It has the following event list:

==============================
File: lariat_r011078_sr0011.root

Printing the list of Runs, SubRuns, and Events stored in the root file.

            Run         SubRun          Event
          11078                              
          11078             11               
          11078             11             41
          11078             11             42
          11078             11             43
          11078             11             44

Since the SubRun number is different in the second file, the outSR principal pointer is appropriately set by the EventBuilderInput, which your job configuration uses. However, since the Run number has not changed in the second input file, the EventBuilderInput does not assign the outR pointer, even though the inR pointer is now null. This is what eventually leads to the exception being thrown.

A fix would be for the author of EventBuilderInput to make sure that a new (Sub)Run principal is created whenever (a) readNext encounters a (Sub)Run number that differs from the cached one, or (b) a new input file is opened.
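
As a rough illustration of conditions (a) and (b), here is a minimal, self-contained sketch. It is not the EventBuilderInput code and does not use art's headers: RunPrincipal, SubRunPrincipal, Record, SourceSketch, and onFileOpened are hypothetical stand-ins, and std::unique_ptr is used in place of the raw Principal pointers that a real art::Source detail class would assign.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for art's RunPrincipal and SubRunPrincipal.
struct RunPrincipal { int run; };
struct SubRunPrincipal { int run; int subRun; };

// One (run, subRun, event) record as the source would read it from a file.
struct Record { int run; int subRun; int event; };

class SourceSketch {
public:
  // Condition (b): when a new input file is opened, forget the cached IDs
  // so that the next readNext call creates fresh principals.
  void onFileOpened()
  {
    haveCachedRun_ = false;
    haveCachedSubRun_ = false;
  }

  // Mirrors the shape of readNext: inR/inSR are what the framework currently
  // holds (null after a file switch); outR/outSR are set only when new
  // principals are needed.
  bool readNext(RunPrincipal const* inR,
                SubRunPrincipal const* inSR,
                Record const& rec,
                std::unique_ptr<RunPrincipal>& outR,
                std::unique_ptr<SubRunPrincipal>& outSR)
  {
    // Condition (a): the run number differs from the cached one.
    // A null inR is treated the same way as a newly opened file.
    bool const needRun = !inR || !haveCachedRun_ || rec.run != cachedRun_;
    if (needRun) {
      outR = std::make_unique<RunPrincipal>(RunPrincipal{rec.run});
      cachedRun_ = rec.run;
      haveCachedRun_ = true;
    }

    // A new run always implies a new subrun.
    bool const needSubRun = needRun || !inSR || !haveCachedSubRun_ ||
                            rec.subRun != cachedSubRun_;
    if (needSubRun) {
      outSR = std::make_unique<SubRunPrincipal>(
        SubRunPrincipal{rec.run, rec.subRun});
      cachedSubRun_ = rec.subRun;
      haveCachedSubRun_ = true;
    }

    // Postcondition: a run and a subrun are available, either still held
    // by the framework or newly created here.
    assert((inR || outR) && (inSR || outSR));
    return true;
  }

private:
  int cachedRun_{0};
  int cachedSubRun_{0};
  bool haveCachedRun_{false};
  bool haveCachedSubRun_{false};
};
```

With the two files listed above, the first record of lariat_r011078_sr0011.root (run 11078, subRun 11, event 41) would get both a new run principal and a new subrun principal in this sketch, even though the run number is unchanged, because the file switch cleared the cache.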

The art team will discuss this issue tomorrow to determine if a more helpful exception message could be generated.

#4 Updated by Kyle Knoepfel about 3 years ago

  • Tracker changed from Bug to Support

#5 Updated by Will Foreman about 3 years ago

Wow, thank you! I think we can make the suggested changes in EventBuilderInput for now.

#6 Updated by Kyle Knoepfel about 3 years ago

  • Category set to Event Loop
  • Status changed from Assigned to Resolved
  • % Done changed from 50 to 100
  • SSI Package art added

The checking of the Principal pointers has been strengthened, and a more helpful exception message is now provided. In the particular circumstance for this issue, the emitted error message is now:

---- LogicError BEGIN
  readNext returned true but no RunPrincipal has been set, and no cached RunPrincipal exists.
  This can happen if a new input file has been opened and the RunPrincipal has not been appropriately assigned.
---- LogicError END
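
To illustrate the kind of check that produces a message like this, here is a small hypothetical sketch. It is not art's actual implementation (see the commit referenced below for that): RunPrincipal is a stand-in type, resolveRun is an invented helper name, and std::logic_error stands in for art's own exception class.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for art's RunPrincipal.
struct RunPrincipal { int run; };

// Hypothetical framework-side check, run after the user's readNext returns.
// outR is what readNext assigned (possibly null); cachedR is the principal
// the framework still holds (null once the previous input file is closed).
RunPrincipal const* resolveRun(bool readNextResult,
                               RunPrincipal const* outR,
                               RunPrincipal const* cachedR)
{
  if (!readNextResult) {
    return nullptr; // nothing was read, so there is nothing to validate
  }
  if (outR) {
    return outR; // the source supplied a new principal
  }
  if (cachedR) {
    return cachedR; // same run as before, still valid
  }
  throw std::logic_error(
    "readNext returned true but no RunPrincipal has been set, "
    "and no cached RunPrincipal exists.");
}
```

Checking for the missing principal right after readNext returns reports the problem at its source, rather than later as a run/subrun mismatch.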

Resolved with commit art:f12ad89c.

#7 Updated by Kyle Knoepfel about 3 years ago

  • Target version set to 1209

#8 Updated by Kyle Knoepfel about 3 years ago

  • Target version changed from 1209 to 2.07.01

#9 Updated by Kyle Knoepfel about 3 years ago

  • Status changed from Resolved to Closed

