Bug #17117

prodsingle_sbnd.fcl crashes with larsoft v06_42_00

Added by Michelle Stancari over 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Category:
Simulation
Target version:
-
Start date:
07/06/2017
Due date:
% Done:

100%

Estimated time:
Spent time:
Occurs In:
Experiment:
Co-Assignees:
Duration:

Description

Strange behavior - crashes often but not always. When it does complete a single event, there are few (<5) "can find nearest wire" errors.

dump_crash.out (96.4 KB), Michelle Stancari, 07/06/2017 03:07 PM

Related issues

Blocks LArSoft - Necessary Maintenance #17047: Floating Point Exceptions (Assigned, 06/27/2017)

Associated revisions

Revision 21168b08 (diff)
Added by Gianluca Petrillo over 2 years ago

An attempt to a solution to issue #17117.

If dx is zero (and exactly so), dE/dx is arbitrarily set to zero too.

Revision fdb42b0f (diff)
Added by Gianluca Petrillo over 2 years ago

An attempt to a solution to issue #17117.

If dx is zero (and exactly so), dE/dx is arbitrarily set to zero too.

History

#1 Updated by Dominic Brailsford over 2 years ago

Michelle Stancari wrote:

Strange behavior - crashes often but not always. When it does complete a single event, there are few (<5) "can find nearest wire" errors.

On line 2009: ERROR: 8 - Floating point divide by zero.

Have floating point exceptions been turned on?
I know there is a big campaign going on about this kind of thing right now:
https://cdcvs.fnal.gov/redmine/issues/17047#change-49641

#2 Updated by Gianluca Petrillo over 2 years ago

  • Status changed from New to Assigned
  • Assignee set to Gianluca Petrillo

Floating point exceptions should not have been globally enabled by that campaign.
If this is the cause, it's by accident.

On the other hand, the problem is not the floating point exception, but the code that is not protected against it.
My first feeling is it's a LArSoft bug.

#3 Updated by Gianluca Petrillo over 2 years ago

Relevant information: the reporter is using debug qualifiers.

#4 Updated by Gianluca Petrillo over 2 years ago

In fact, it appears Geant4 with debug qualifiers takes the initiative and enables FPEs (as shown quite clearly in the log Michelle attached).
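As a rough illustration of why enabling FPEs matters: a floating point division by zero normally only raises a status flag and the program carries on, whereas with trapping enabled (for example through the glibc extension feenableexcept()) the same operation is delivered as a signal and the job dies. The sketch below only inspects the flag, using the standard <cfenv> interface; the function name raisesDivByZero is hypothetical and nothing here is taken from Geant4 or LArSoft.

```cpp
#include <cfenv>

// Hypothetical helper: returns true if numerator/denominator raises the
// IEEE-754 divide-by-zero flag. No trap is enabled here, so the division
// itself never aborts the program; it only leaves a flag behind.
bool raisesDivByZero(double numerator, double denominator) {
  std::feclearexcept(FE_ALL_EXCEPT);

  // volatile keeps the compiler from folding the division at compile time
  volatile double n = numerator;
  volatile double d = denominator;
  volatile double quotient = n / d;
  (void) quotient;

  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

A tiny but non-zero energy divided by an exactly zero length sets the flag; once traps are on, that flag becomes the "Floating point divide by zero" error seen in the log.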

#5 Updated by Gianluca Petrillo over 2 years ago

  • Project changed from SBND code to LArSoft
  • Category changed from Reconstruction to Simulation

Bouncing to LArSoft, since it's a LArSoft bug.

#6 Updated by Gianluca Petrillo over 2 years ago

The problem stems from a Geant4 step with $dx = 0$ (and energy of the order of 10^-14 GeV). Computing $dE/dx$ in such a situation is plenty of fun.
I vaguely remember this problem having been already spotted, but I can't find any open issue about it.

The crash happens in larg4::ISCalculationSeparate::CalculateIonizationAndScintillation(), larsim:source:larsim/LArG4/ISCalculationSeparate.cxx.

#7 Updated by Gianluca Petrillo over 2 years ago

#8 Updated by Gianluca Petrillo over 2 years ago

  • % Done changed from 0 to 10
  • Occurs In v06_42_00 added

#9 Updated by Gianluca Petrillo over 2 years ago

I can reproduce the error with FPE enabled (debug qualifiers).

The Geant4 step causing the crash (based on a sample of 4 crashes) has a very small energy (~10^-14 GeV) and a very small step size (~10^-14 cm). LArSoft code does not compute dx from the step size, but from the distance between the pre-step and post-step positions, which look the same up to ~15 significant digits. It's conceivable that such small differences round to exactly 0.
The process at the end of the step is called LArVoxelReadoutScoringProcess, which is added to LArG4 physics list in larsim:source:larsim/LArG4/PhysicsList.cxx as a G4ParallelWorldScoringProcess.
I need to ask the experts what that is supposed to do.
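The rounding argument can be checked with a small sketch. It assumes double precision coordinates in cm and uses hypothetical names (Point, advanceX, stepLength) that are not LArSoft or Geant4 API. Around 200 cm, adjacent doubles are about 2.8e-14 apart, so a displacement of 1e-14 cm is lost when it is added to the coordinate, and the distance between the two positions comes out exactly zero.

```cpp
#include <cmath>

// Hypothetical stand-in for a step end point, coordinates in cm.
struct Point {
  double x, y, z;
};

// Moves a point along x by `step`; the sum is rounded to the nearest double,
// so a small enough step leaves the coordinate unchanged.
Point advanceX(Point p, double step) {
  p.x += step;
  return p;
}

// Step length computed from the two end points, not from the step size.
double stepLength(const Point& pre, const Point& post) {
  const double sx = post.x - pre.x;
  const double sy = post.y - pre.y;
  const double sz = post.z - pre.z;
  return std::sqrt(sx * sx + sy * sy + sz * sz);
}
```

For pre = {200.0, 0.0, 0.0}, stepLength(pre, advanceX(pre, 1e-14)) is exactly 0, while a step of 1e-10 cm still gives a non-zero length.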

#10 Updated by Gianluca Petrillo over 2 years ago

In the end, we can code around the issue. That dx is used to compute a "$dE/dx$", which in turn is used to compute the recombination factor to be applied. For that amount of energy, the recombination does not really change much, so the factor can be assigned a magic value of 0 (no energy) or 1 (leaving that ridiculously small energy of 10^-14 GeV, smaller than a neutrino mass, to propagate to the rest of the code).

#11 Updated by Gianluca Petrillo over 2 years ago

  • % Done changed from 10 to 50

A tentative "solution" has been coded in branch feature/gp_Issue17117 of larsim repository.
In there, if dx is exactly zero, dE/dx is also set to exactly zero. Physically, it should make a negligible difference.
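A minimal sketch of that kind of guard follows. It is not the code in the branch; the function name safeDEdx and its parameters are hypothetical and only illustrate the idea of skipping the division when dx is exactly zero.

```cpp
// Hypothetical guard: dE is the energy deposited in the step and dx the
// step length. When dx is exactly zero the ratio is arbitrarily defined
// as zero, so no division by zero is ever performed.
double safeDEdx(double dE, double dx) {
  if (dx == 0.0) {
    return 0.0;  // arbitrary choice; the energy involved is negligible
  }
  return dE / dx;
}
```

The comparison with exactly 0.0 is deliberate: any non-zero dx, however small, still goes through the ordinary division.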

#12 Updated by Gianluca Petrillo over 2 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 50 to 100

I have asked for this workaround to be merged into the next LArSoft release.

#13 Updated by Gianluca Petrillo about 2 years ago

  • Status changed from Resolved to Closed

