MCReco crashes with empty event
I'm working in uboonecode with a generator module that occasionally spits out an empty event, and whenever I propagate an empty event through to MCReco, LArSoft crashes with the following error message:
cet::exception caught in art
---- EventProcessorFailure BEGIN
An exception occurred during current event processing
---- ScheduleExecutionFailure BEGIN
---- StdException BEGIN
A std::exception occurred during a call to the module MCReco/mcreco run: 1 subRun: 0 event: 217
and cannot be repropagated.
vector::_M_range_check: __n (which is 7) >= this->size() (which is 0)
---- StdException END
Exception going through path simulate
---- ScheduleExecutionFailure END
cet::exception caught in EventProcessor and rethrown
Another exception was caught while trying to clean up files after
the primary exception. We give up trying to clean up files at
this point. The description of this additional exception follows:
---- FatalRootError BEGIN
Fatal Root Error: @SUB=TTree::SetEntries
Tree branches have different numbers of entries, with 10000 maximum.
---- FatalRootError END
---- EventProcessorFailure END
I'm not sure if this is an issue with MCReco, or a problem with the event generator that doesn't make itself immediately apparent.
#1 Updated by Gianluca Petrillo almost 5 years ago
- Category set to Simulation
- Status changed from New to Accepted
To track this down, we need a way to reproduce the problem. If you are running standard_g4_uboone.fcl or similar, that would require:
- the generator input file
- information about the random seed settings, that I hope we can extract from the log of the failing job
- or the partially saved output file of the crashing job
#2 Updated by Jeremy Hewes almost 5 years ago
Sure. The FHiCL file I used to generate the events is /uboone/data/users/jhewes15/genie_nnbar/fcl/nnbar_run3.fcl. It depends on a custom version of larsim's nucleon decay generator, so if you want to recreate what I did, you'll need a fresh LArSoft installation (I used v04_25_00), plus a copy of the custom code from /uboone/app/users/jhewes15/larsoft/v04_25_00/srcs/larsim/EventGenerator/NDKGen_module.cc in your local directory.
There are also some example output files in /uboone/data/users/jhewes15/mcreco_bugtest that you can look at.
HOWEVER, I did some investigating myself and have a better handle on what's going on. The error message LArSoft prints is actually thrown by std::vector: it occurs when you call vector.at(i) with an index i greater than or equal to the vector's size. It's just an out-of-range access.
I searched the code for every call to this function and tracked down where it goes wrong: MCShowerRecoAlg.cxx, line 44. The MCRecoPart& part_v object is empty, but the code expects it not to be. Here's the debug output I got for the event that crashes:
First block of .at()... OK, let's spit out some variables and try to figure out wtf is going on.
We have shower_index_v which is a vector of integers with size 172
We loop over all values from 0 to 171 in shower_index - current value is 0
Now, the thing we're about to try is getting the value of shower_candidate, which is the value of shower_index_v at shower_index...
So that worked, and the value of shower_candidate is 7
OK, so this whole function receives pointers holding MCRecoPart and MCRecoEdep objects.
Now what happens is, we're going to try to get the particle associated with this shower_candidate.
Quick check... how many particles are inside MCRecoPart holder? Let's see! It's 0
...and then the code dies, and it never makes it to the end of my couts. I will continue to investigate, but I don't have a great understanding of what all these data holders are supposed to contain, so let me know if you can identify the issue.
#3 Updated by Gianluca Petrillo almost 5 years ago
The bug has been fixed with commit:5fccd02aa2ba0ee85484d751edc026b0db192014.
The problem was in a sub-algorithm of MC shower reconstruction that would stop immediately on an empty input, without clearing the results from the previous call.
[edit: changed the commit hash; the original push failed, now I try again]