Bug #22339
Tons of Error Messages from OpDetBTRs
Description
It seems some CI jobs (the G4 stage in MicroBooNE, and I'm told ICARUS as well) are spending a large fraction of their time (16%) reporting error messages from the OpDetBacktracker like this:
22490: %MSG-e OpDetBacktrackerRecord: LArG4:largeant@BeginModule 09-Apr-2019 07:02:32 CDT run: 1 subRun: 0 event: 1 OpDetBacktrackerRecord.cxx:73
22491: AddTrackPhotons() trying to add to iTimePDclock #-3.02956e+06 1 photons with 5.17781e-321 MeV of energy from track ID=398
22492: %MSG
Kyle is proposing to make this error a debug message instead (see slide 2 here):
https://indico.fnal.gov/event/20409/contribution/1/material/slides/0.pdf
Since we're about to make these messages disappear, it seems worth taking a moment to make sure there isn't some other problem here that needs to be addressed.
Here's an example log:
http://dbweb5.fnal.gov:8080/LarCI/app/ns:uboone/storage/docs/2019/04/09/stdout__JnSHJCs.log
from this CI test:
http://dbweb5.fnal.gov:8080/LarCI/app/ns:uboone/build_detail/test_details?build_id=uboone_ci/2252&platform=Linux%202.6.32-754.11.1.el6.x86_64&buildtype=slf6%20e17:prof&phase=ci_tests&test=ci_g4_regression_test_uboonecode
Associated revisions
LArSoft Issue #22339: In all cases the energy should now be a useful quantity. When the Birks coefficient is used, we store the visible energy from ISCalc. When it is not used, we store the G4 TotalEnergyDeposit. The deposited energy should never be an uninitialized value in any case now.
LArSoft Issue #22339
I am revisiting this issue after a report from ProtoDUNE that they see this message often with high-energy tracks.
With the change in OpFastScintillation to write each photon to the OpDetBacktrackerRecords, the energy was being incorrectly duplicated in the BTRs in ticks with multiple photons from the same G4Step.
This was corrected in my commit last week by setting the Edeposit to 0 after the first photon was inserted, but this caused subsequent photons to be written with 0 energy, causing the error message that appeared so prolifically.
Now OpFastScintillation will instead take the total energy provided by the ISCalculator for the given step and divide that energy into equal parts for each photon to be written. This results in the same total value being written, while making sure no zero values are ever passed to the BTRs.
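For illustration, the splitting described above amounts to something like the following sketch. This is not the actual LArSoft code: the helper function, the variable names, and the exact AddScintillationPhotons() signature are assumptions; only the idea of dividing the ISCalculator energy equally among the photons comes from the commit above.

#include "lardataobj/Simulation/OpDetBacktrackerRecord.h"
#include <cstddef>

// Hypothetical helper illustrating the energy-splitting idea (not the real
// OpFastScintillation code; the AddScintillationPhotons() call is assumed).
void recordStepPhotons(sim::OpDetBacktrackerRecord& btr,
                       int trackID,
                       double tick,              // iTimePDclock of the deposit
                       double const xyz[3],      // position of the G4 step
                       double stepVisibleEnergy, // energy from the ISCalculator for this step
                       std::size_t nPhotons)
{
  if (nPhotons == 0) return;
  // Split the step energy into equal parts, one per photon: the total stored in the
  // record stays the same, and no photon is ever recorded with zero energy.
  double const energyPerPhoton = stepVisibleEnergy / static_cast<double>(nPhotons);
  for (std::size_t i = 0; i < nPhotons; ++i)
    btr.AddScintillationPhotons(trackID, tick, 1.0, xyz, energyPerPhoton);
}

With this, the sum of the per-photon energies reproduces the ISCalculator value for the step, matching the "same value being written" statement above.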
History
#1 Updated by Gianluca Petrillo almost 2 years ago
I confirm I have seen a burst of these messages on a single ICARUS event out of 5.
And it smells bad: those energy values look a lot like unassigned, random double-precision variables.
#2 Updated by Gianluca Petrillo almost 2 years ago
This was already bothering me, enough that I have now put some time into investigating it.
Commit larsim:f99b849 by Jason Stock added some optional simulation of argon saturation, and changed the type of energy recorded in the backtracking information from total to visible.
Visible energy includes saturation effects, and it is computed in larg4::ISCalculationSeparate::CalculateIonizationAndScintillation():
- if using the same scintillation yield regardless of the type of ionising particle (detinfo::LArProperties::ScintByParticleType() false) and simulating medium saturation, the visible energy is computed from the total energy, including the saturation effect;
- if using the same scintillation yield for all particles and not simulating the saturation, the visible energy is assigned 0, because "I have not made a correct implimentation of this value for anything but EMSaturation" ("I" also being Jason);
- if using a scintillation yield according to the particle type (detinfo::LArProperties::ScintByParticleType() true), the visible energy is left undefined, hence:
  - at the beginning, that load of nonsensical numbers is stored in the backtracking records;
  - after something (?) assigns a value there, it's not clear to me what happens (I did not investigate).
So in short, Jason should decide what the right behaviour is for that part of the code, and act accordingly.
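For reference, the three cases above boil down to a branch structure like the following sketch. It is only an illustration of the logic described in this comment, not the actual ISCalculationSeparate code; all the names are simplified stand-ins.

// Illustration of the three cases described above (simplified stand-in names,
// not the actual larg4::ISCalculationSeparate code).
double computeVisibleEnergy(double totalEnergyDeposit, // energy from the G4 step
                            bool scintByParticleType,  // detinfo::LArProperties::ScintByParticleType()
                            bool simulateSaturation,
                            double saturationFactor)   // stand-in for the saturation correction
{
  double visibleEnergy; // deliberately left uninitialized, to mirror the original code
  if (!scintByParticleType && simulateSaturation) {
    // case 1: visible energy computed from the total, including the saturation effect
    visibleEnergy = totalEnergyDeposit * saturationFactor;
  }
  else if (!scintByParticleType && !simulateSaturation) {
    // case 2: assigned 0 (incomplete implementation, as quoted above)
    visibleEnergy = 0.;
  }
  // case 3 (scintByParticleType == true): never assigned, so whatever happens to be
  // in that memory ends up stored in the backtracking records
  return visibleEnergy;
}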
#3 Updated by Gianluca Petrillo almost 2 years ago
An amendment: the only thing that makes the messages stop is the mercy of MessageLogger, which is configured with a limit on the number of messages.
The visible energy value (larg4::ISCalculationSeparate::fVisibleEnergyDeposition) stays unassigned all the time.
#4 Updated by Jason Stock almost 2 years ago
So, apparently replies to Redmine update emails don't go to the ticket (I don't know why I thought they did). I thought I was participating in this conversation, but since I was limited to my phone for most of the day I was doing so through e-mail replies.
Yes. I think the short-term solution is to guarantee that the energy written to the BTR is always initialized to zero.
In the long term I need to go through and make sure that each case writes a meaningful energy, but at least ensuring that a zero is written in all incomplete cases gives us something that doesn't slow things down with error logs and is easy to identify in the final reco, so people don't draw incorrect conclusions from incomplete records.
#5 Updated by Jason Stock almost 2 years ago
Jason Stock wrote:
So, apparently replies to Redmine updates don't go to the ticket (don't know why I thought they did), so I thought I was participating in this conversation. [...]
Also, I will start on this now.
#6 Updated by Jason Stock almost 2 years ago
I just got around to pushing my branch: feature/JStock_Issue22339 (commit 20a84bbcaf03f399f126000875efe881290a56fc).
This provides well-defined behaviour for all modes in OpFastScintillation.cxx.
#7 Updated by Jason Stock almost 2 years ago
- % Done changed from 0 to 100
- Category set to Simulation
#8 Updated by Jason Stock almost 2 years ago
- Status changed from New to Resolved
#9 Updated by Lynn Garren almost 2 years ago
Thanks Jason. This might make this week's larsoft release.
#10 Updated by Gianluca Petrillo almost 2 years ago
Kudos to Jason for using the issue number in the commit message; that makes it super easy to look at the commit from the ticket here.
Just a note: since 0.0 is smaller than epsilon() (see lardataobj:source:lardataobj/Simulation/OpDetBacktrackerRecord.cxx#L70), error/debug messages will still be emitted.
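In other words (a sketch of the point, not the code at that line): an energy of exactly 0.0 still satisfies a comparison against epsilon(), so the message fires.

#include <iostream>
#include <limits>

// Minimal demonstration of the note above: 0.0 is smaller than the machine epsilon,
// so a zero-energy entry still trips a guard of the form "energy < epsilon()".
int main()
{
  double const energy = 0.0;
  if (energy < std::numeric_limits<double>::epsilon())
    std::cerr << "would still emit the OpDetBacktrackerRecord error/debug message\n";
  return 0;
}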
#11 Updated by Kyle Knoepfel almost 2 years ago
Also, please remove the comment above the `MF_LOG_ERROR` line. None of the message facility logging routines intrinsically throw exceptions.
#12 Updated by Lynn Garren almost 2 years ago
- Status changed from Resolved to Feedback
Putting this on hold pending further discussion.
#13 Updated by Jason Stock almost 2 years ago
To address Gianluca's concern: it will only write zero if none of the methods above provides an energy, which should of course not happen. I would argue that photons coming from a step with 0 energy deposit are in fact worth an error message.
To address Kyle's point: yes, the comment is wrong. Will changing the comment require incrementing the data product version? If so, I would argue against doing it unless we decide we want to demote that error message; but that condition should not happen any more with these changes, and I think the error message is useful.
Thank you both for your detailed and helpful feedback!
#14 Updated by Lynn Garren almost 2 years ago
- Status changed from Feedback to Resolved
Thanks Jason. This is going in, but I will remove the comment that Kyle finds problematic. That will not affect message logging.
#15 Updated by Jason Stock almost 2 years ago
Forgot to include the issue number in my last commit.
commit fa0a966bb4d62eb77a4658436ecee1b8f1951b4b (HEAD -> feature/JStock_Issue22339_2, origin/feature/JStock_Issue22339_2)
Merge: 405ddf54 535c4374
Author: Jason Stock <jstock@dunegpvm02.fnal.gov>
Date: Fri Apr 12 12:22:25 2019 -0500
Merge commit '535c4374ecc698af9190ed48b4fdce15ae337471' into feature/JStock_Issue22339_2
This is to resolve the issues Lynn encountered making the merge work.
#16 Updated by Kyle Knoepfel over 1 year ago
- Status changed from Resolved to Closed
#17 Updated by Jason Stock over 1 year ago
- Status changed from Closed to Remission
There is still an issue here with how multiple photons from the same step are handled. When the BTRs were designed, they assumed that one step would make one deposit on one channel. This is no longer the case (each photon is now added to the BTR separately). For larger signals this issue will still persist.
A quick fix that avoids incrementing the version of the data product is to also divide the step energy by the number of photons written and add that portion with each photon, as opposed to writing the entire value with the first photon. I will put this in a feature branch now and flag this issue in the commit.
Sorry to have to re-open it, but I tried to be clever and missed a subtlety.
#18 Updated by Jason Stock over 1 year ago
I should expand a bit. With the initial fix, an individual G4Step could no longer write a 0 to the BackTrackerRecords, but a step with many photons could still write 0 for the additional photons. This did not show up in testing because I used the radiological model for my test, where the issue simply did not arise.
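As a toy illustration of the two behaviours (purely illustrative numbers, not LArSoft code): for one step producing several photons in the same tick, the initial fix attached the whole energy to the first photon and 0.0 to the rest, while the final fix splits it evenly; the recorded total is the same either way.

#include <cstdio>
#include <vector>

int main()
{
  // Toy example: one G4 step depositing 1.5 MeV of visible energy, producing
  // 4 photons in the same tick of the same OpDetBacktrackerRecord.
  double const stepEnergy = 1.5; // MeV
  int const nPhotons = 4;

  // Initial fix: full energy with the first photon, an explicit 0.0 with the rest
  // (each of those zeros triggered the error message).
  std::vector<double> perPhotonOld(nPhotons, 0.0);
  perPhotonOld[0] = stepEnergy;

  // Final fix: the step energy split evenly among the photons.
  std::vector<double> perPhotonNew(nPhotons, stepEnergy / nPhotons);

  for (int i = 0; i < nPhotons; ++i)
    std::printf("photon %d: initial fix %.3f MeV, final fix %.3f MeV\n",
                i + 1, perPhotonOld[i], perPhotonNew[i]);
  return 0;
}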
#19 Updated by Jason Stock over 1 year ago
- Status changed from Remission to Accepted
Branch feature/JStock_LarsoftIssue22339 of larsim has been made and tested against ProtoDUNE, and, as it does not change any of the values ultimately written to the event (just how they are added), I will go ahead and merge it with develop.
#20 Updated by Jason Stock over 1 year ago
- Status changed from Accepted to Assigned
#21 Updated by Jason Stock over 1 year ago
- Status changed from Assigned to Resolved
#22 Updated by Lynn Garren over 1 year ago
- Status changed from Resolved to Closed
Thanks Jason. This fix is in larsoft v08_17_00. I am closing this issue again.