Bug #18584
35t data reco broken in v06_60_00_01
Description
It appears the fcl for 35t data reco is broken in release v06_60_00_01:
dune-dev> fcldump standard_reco_dune35tdata.fcl 3
/home/dladams/dudev/dudev01/workdir/localProducts_larsoft_v06_60_00_e14_prof/dunetpc/v06_60_00_01/job/standard_reco_dune35tdata.fcl
terminate called after throwing an instance of 'cet::coded_exception<fhicl::error, &fhicl::detail::translate[abi:cxx11]>'
  what(): ---- Parse error BEGIN
  Local lookup error
  ---- Can't find key BEGIN
    dune35t_particlestitcher (at part "dune35t_particlestitcher")
  ---- Can't find key END
  at line 73, character 23, of file "/home/dladams/dudev/dudev01/workdir/localProducts_larsoft_v06_60_00_e14_prof/dunetpc/v06_60_00_01/job/standard_reco_dune35tdata.fcl"
  particlestitcher: @local::dune35t_particlestitcher
  ^
---- Parse error END
Aborted (core dumped)
Is this not included in our CI testing?
Do we no longer support 35t data reco?
Or am I looking at an obsolete fcl file?
History
#1 Updated by Christoph Alt about 3 years ago
This is the .fcl the CI test is using: https://cdcvs.fnal.gov/redmine/projects/dunetpc/repository/revisions/develop/entry/test/ci/ci_test_reco_dune35t.fcl
The CI test also reports an error for DUNE 35T reco: http://dbweb5.fnal.gov:8080/LarCI/app/ns:dune/storage/docs/2017/12/16/stdout_ig1xMxN.log
Here is the interesting part:
LArRotationalTransformationPlugin::Initialize - Plugin does not support provided LArTPC configurations
m_pLArTransformationPlugin->Initialize() return STATUS_CODE_INVALID_PARAMETER
in function: InitializePlugins
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Managers/PluginManager.cc line#: 219
m_pPandoraImpl->InitializePlugins(&xmlHandle) throw STATUS_CODE_INVALID_PARAMETER
in function: ReadSettings
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Pandora/Pandora.cc line#: 149
Failure in reading pandora settings, STATUS_CODE_INVALID_PARAMETER
PandoraApi::ReadSettings(*m_pPrimaryPandora, fullConfigFileName) throw STATUS_CODE_FAILURE
in function: ConfigurePandoraInstances
in file: /scratch/workspace/lar_ci/label_exp/SLF6/label_exp2/swarm/LArSoft/srcs/larpandora/larpandora/LArPandoraInterface/StandardPandora_module.cc line#: 109
%MSG-e BeginJob: StandardPandora:pandora@BeginJob 16-Dec-2017 11:13:01 CST ModuleBeginJob
An unknown Exception occurred in
Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=f3aef0aa34577436343317614302dbf5a4896d08
%MSG
16-Dec-2017 11:13:01 CST Closed input file "xroot://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/persistent/users/vito/ci_tests_inputfiles/DUNE35T/detsim/AntiMuonCutEvents_LSU_v2_dune35t_detsim_Reference.root"
====================================================================================================================
TimeTracker printout (sec) Min Avg Max Median RMS nEvts
====================================================================================================================
[ No processed events ]
====================================================================================================================
====================================================================================================
MemoryTracker summary (base-10 MB units used)
Peak virtual memory usage (VmPeak) : 1680.31 MB
Peak resident set size usage (VmHWM): 322.683 MB
====================================================================================================
TrigReport ---------- Event Summary ------------
TrigReport Events total = 0 passed = 0 failed = 0
TrigReport ------ Modules in End-Path: end_path ------------
TrigReport Trig Bit# Run Success Error Name
TrigReport 0 0 0 0 0 out1
TimeReport ---------- Time Summary ---[sec]----
TimeReport CPU = 0.082988 Real = 0.176515
MemReport ---------- Memory Summary ---[base-10 MB]----
MemReport VmPeak = 1680.33 VmHWM = 322.982
MultiPandoraApiImpl::DeletePandoraInstances - unable to find daughter instances associated with primary 0
%MSG-s ArtException: PostEndJob 16-Dec-2017 11:13:02 CST ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
---- Unknown BEGIN
An unknown Exception occurred in
Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=f3aef0aa34577436343317614302dbf5a4896d08
---- Unknown END
---- OtherArt END
%MSG
Art has completed and will exit with status 1.
#2 Updated by Christoph Alt about 3 years ago
I quote Tom's email here:
Hi Christoph, David,
Looks like the dune35t_particlestitcher table was removed in this commit
https://cdcvs.fnal.gov/redmine/projects/dunetpc/repository/revisions/426486a9ada8311aa8e16e8f7dc2d40a8552c394
by John (cc’d, and added to the watch list). He updated standard_reco_dune35tsim.fcl but not these others. I looked up instances of dune35t_particlestitcher in dunetpc v06_55_01:
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/pandoramodules_dune.fcl (definition, now gone)
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/reco_dune35tsim_blur.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/reco_dune35tsim_emhits.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/standard_reco_dune35tdata.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/standard_reco_dune35tdata_fasthit.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/standard_reco_dune35tsim.fcl (updated by John)
Tom
I also add the Pandora team to the watchlist (as requested by John last week).
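A lookup like the one in Tom's email can be roughed out as a text scan for `@local::` references that have no matching definition. The sketch below is only an illustration: it is not a FHiCL parser, it ignores `#include` resolution, nesting and comments, and the file contents in `example` are hypothetical abbreviations, not the real dunetpc files.

```python
import re

# Matches FHiCL-style references such as "@local::dune35t_particlestitcher".
LOCAL_REF = re.compile(r"@local::([A-Za-z_]\w*)")
# Matches a name defined at the start of a line, such as "dune35t_pandora:".
DEFINITION = re.compile(r"^\s*([A-Za-z_]\w*)\s*:", re.MULTILINE)


def undefined_local_refs(fcl_texts):
    """Map each file name to the @local names that are defined in no file.

    fcl_texts maps a file name to its text. This is a plain text scan,
    so it can miss or over-report cases a real FHiCL parser would handle.
    """
    defined = set()
    for text in fcl_texts.values():
        defined.update(DEFINITION.findall(text))
    missing = {}
    for name, text in fcl_texts.items():
        unresolved = sorted(set(LOCAL_REF.findall(text)) - defined)
        if unresolved:
            missing[name] = unresolved
    return missing


# Hypothetical, heavily abbreviated file contents for illustration only.
example = {
    "pandoramodules_dune.fcl": "dune35t_pandora: { module_type: StandardPandora }\n",
    "standard_reco_dune35tdata.fcl": (
        "pandora: @local::dune35t_pandora\n"
        "particlestitcher: @local::dune35t_particlestitcher\n"
    ),
}
print(undefined_local_refs(example))
```

With these made-up inputs the scan reports only dune35t_particlestitcher as unresolved, which mirrors the "Can't find key" error in the description.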
#3 Updated by John Marshall about 3 years ago
Hi,
Firstly, to clarify the specific issue about the dune35t_particlestitcher: this was rendered redundant in the recent Pandora changes, described in the LArSoft coordination meeting on December 5. The stitching procedure is now explicitly performed as a step within the Pandora pattern recognition.
Secondly, the (wider) issue about the associated .fcl footprint: there is now a large number of .fcl files, distributed across uboonecode and dunetpc (and sbndcode now too), that use/steer the Pandora pattern recognition. The Pandora team is not familiar with many of these and so, for this release, we explicitly asked for help from the maintainers of dunetpc and uboonecode to check a proposed, minimal set of .fcl file changes in named feature branches.
The slides associated with this request are linked below:
https://indico.fnal.gov/event/15868/contribution/3/material/slides/0.pdf
In advance of this meeting, a number of people (but not an exhaustive list) were also contacted directly by e-mail to warn about this request:
[Description of changes] The downside of doing this is that there is a significant footprint in the reconstruction .fcl files. For dunetpc it is nowhere near as involved as for uboonecode, but in tomorrow's meeting we would like to ask you (as maintainers of dunetpc) to look over the proposed changes in the feature branch feature/larpandoracontent_v03_09_00 and confirm you are OK with them, and provide any further instructions about e.g. whether there are other .fcl files to which the same/similar changes should be applied.
From the Pandora side, it looks like we should have tried even harder to make sure that everyone was able to check over the .fcl footprint. We did receive a named list of .fcl files to change from uboonecode and from ProtoDUNE, and this was helpful and appreciated. From the dunetpc side, some detailed response would clearly have helped, but timescales were, of course, being driven by the needs of the ProtoDUNE MCC10 production.
#4 Updated by Thomas Junk about 3 years ago
Sadly, I was trapped in another meeting on Dec. 5 and had to go on your slides and e-mails.
"footprint" is not the right word -- it indicates the size something takes up, and does not on its face imply maintenance. "maintenance" is the right word, or "necessary changes". I looked briefly at the changes to dune_pandora.fcl in the feature branch but did not realize all of the changes that would be needed to other fcl files. Some more explicit guidance of what changes are necessary would be of great use when making a breaking change moving forwards.
We can make those changes but we want to reduce the amount of guesswork required to catch up with breaking changes, and it takes time to do it.
Tom
#5 Updated by Thomas Junk about 3 years ago
I looked for "particlestitcher" in dunetpc's fcl files and followed John's example in job/standard_reco_dune35tsim.fcl: remove all modules labeled "particlestitcher" and "particlestitcherdc", their configurations, and replace "particlestitcher" with "pandoraTrack" and "particlestitcherdc" with "pandoraTrackdc" for fcl parameters describing input module labels. This should get the cases where the data products are produced and consumed in the same job. I updated these files:
modified: fcl/dune35t/mergeana/standard_ana_dune35t.fcl
modified: fcl/dune35t/mergeana/standard_ana_dune35t_data.fcl
modified: fcl/dune35t/reco/reco_dune35tsim_blur.fcl
modified: fcl/dune35t/reco/reco_dune35tsim_emhits.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tdata.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tdata_fasthit.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tdata_robusthit.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tsim_robusthit.fcl
and will see what the CI tests do.
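The relabeling described above could be sketched as a small text transformation. This is a hypothetical helper, not the procedure actually used: it only handles module configurations written on a single line and quoted label values, and it does not touch unquoted entries in path sequences. The parameter name TrackModuleLabelDC in the example is made up for illustration.

```python
import re

# Replacement labels taken from the comment above.
RELABEL = {
    "particlestitcher": "pandoraTrack",
    "particlestitcherdc": "pandoraTrackdc",
}
# A quoted string value that is exactly one of the old labels.
QUOTED_LABEL = re.compile(r'"(particlestitcherdc|particlestitcher)"')
# A line that configures a module under one of the old labels.
MODULE_LINE = re.compile(r"^\s*(particlestitcherdc|particlestitcher)\s*:")


def update_fcl(text):
    """Drop old stitcher module lines and relabel quoted input labels."""
    kept = []
    for line in text.splitlines():
        if MODULE_LINE.match(line):
            continue  # single-line module configuration: remove it
        kept.append(QUOTED_LABEL.sub(lambda m: '"%s"' % RELABEL[m.group(1)], line))
    return "\n".join(kept)


# Hypothetical fragment; TrackModuleLabelDC is an invented parameter name.
before = (
    'particlestitcher: @local::dune35t_particlestitcher\n'
    'TrackModuleLabel: "particlestitcher"\n'
    'TrackModuleLabelDC: "particlestitcherdc"\n'
)
print(update_fcl(before))
```

Matching the full quoted string avoids rewriting "particlestitcherdc" as "pandoraTrack" followed by a stray "dc".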
There are four fcl files mentioning particlestitcher still, but they appear to read these tracks from files and do not appear to produce them.
dune/CTree/ctree35t.fcl: TrackModuleLabel: "particlestitcher"
dune/CTree/ctreeraw35t_trigHM.fcl: TrackModuleLabel: "particlestitcher"
dune/NearlineMonitor/evd/ctreeraw35t_trigTPC.fcl: TrackModuleLabel: "particlestitcher"
dune/NearlineMonitor/evd/ctreeraw35t_trigTPC_cwpfilter.fcl: TrackModuleLabel: "particlestitcher"
This code is on its way to obsolescence, but we should keep it around for inspection if need be.
#6 Updated by Thomas Junk about 3 years ago
Unfortunately it's still broken, with the "unknown exception" Christoph posted earlier. If I use the fcl file John edited for us
lar -n 1 -c standard_reco_dune35tsim.fcl /pnfs/dune/persistent/users/vito/ci_tests_inputfiles/DUNE35T/detsim/AntiMuonCutEvents_LSU_v2_dune35t_detsim_Reference.root
I get
LArRotationalTransformationPlugin::Initialize - Plugin does not support provided LArTPC configurations
m_pLArTransformationPlugin->Initialize() return STATUS_CODE_INVALID_PARAMETER
in function: InitializePlugins
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Managers/PluginManager.cc line#: 219
m_pPandoraImpl->InitializePlugins(&xmlHandle) throw STATUS_CODE_INVALID_PARAMETER
in function: ReadSettings
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Pandora/Pandora.cc line#: 149
Failure in reading pandora settings, STATUS_CODE_INVALID_PARAMETER
PandoraApi::ReadSettings(*m_pPrimaryPandora, fullConfigFileName) throw STATUS_CODE_FAILURE
in function: ConfigurePandoraInstances
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/larpandora/v06_18_00/src/larpandora/LArPandoraInterface/StandardPandora_module.cc line#: 109
%MSG-e BeginJob: StandardPandora:pandora@BeginJob 18-Dec-2017 16:28:32 CST ModuleBeginJob
An unknown Exception occurred in
Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=c73966792610d509e3c29438cbc891d1080f4f11
%MSG
%MSG-e BeginJob: StandardPandora:pandora@BeginJob 18-Dec-2017 16:28:32 CST ModuleBeginJob
An unknown Exception occurred in
Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=c73966792610d509e3c29438cbc891d1080f4f11
%MSG
#7 Updated by Thomas Junk about 3 years ago
I think I see something close to the source of the problem, found with a debugger, using dunetpc v06_60_00_01, which depends on larsoft v06_60_00. In
/cvmfs/fermilab.opensciencegrid.org/products/larsoft/larpandoracontent/v03_09_00/source/larpandoracontent/LArPlugins/LArRotationalTransformationPlugin.cc
line 266 is a test of the U, V, and sigmaUVW angles. This gets called several times when running
lar -n 1 -c standard_reco_dune35tsim.fcl /pnfs/dune/persistent/users/vito/ci_tests_inputfiles/DUNE35T/detsim/AntiMuonCutEvents_LSU_v2_dune35t_detsim_Reference.root
m_thetaU is 0.772727072
m_thetaV is 0.797702730
as expected for the 35t which has slightly different U and V angles. The first four trips through this method return STATUS_CODE_SUCCESS, but on the fifth time around, pLArTPC->GetWireAngleV() gives 0.772727072 and pLArTPC->GetWireAngleU() gives 0.79770273, which is backwards.
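To make the mismatch concrete, here is a minimal sketch of a per-volume wire-angle comparison using the numbers quoted above. It is not the code in LArRotationalTransformationPlugin.cc; the function name, the reference-pair approach and the tolerance value are assumptions made for this illustration.

```python
# Wire angles in radians, as reported in the comment above.
THETA_U = 0.772727072
THETA_V = 0.797702730


def angles_match(theta_u, theta_v, ref_u=THETA_U, ref_v=THETA_V, tolerance=1e-6):
    """Return True if a volume's U/V wire angles agree with the reference pair.

    The tolerance is a hypothetical value chosen for this illustration.
    """
    return abs(theta_u - ref_u) <= tolerance and abs(theta_v - ref_v) <= tolerance


# A volume whose angles agree with the reference pair.
print(angles_match(0.772727072, 0.797702730))
# A volume where U and V come back interchanged.
print(angles_match(0.797702730, 0.772727072))
# Size of the U/V difference in radians.
print(abs(THETA_V - THETA_U))
```

The U and V angles differ by about 0.025 rad (roughly 1.4 degrees), so any comparison with a tight tolerance rejects a volume in which the two are interchanged.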
#8 Updated by John Marshall about 3 years ago
Hi,
Thanks for looking into this. We’ve just had a quick meeting to discuss.
The reported error is not due to an implementation issue, but a matter of our chosen reconstruction strategy not being directly applicable to DUNE35t. This is because the u and v wire angles differ, and the LArRotationalTransformationPlugin very strictly (but correctly) identifies that it is not designed for such a use case. Without a registered plugin to provide coordinate transformations (e.g. u, v -> w), the patrec cannot proceed.
There are more details below, but this motivates two alternative ways of moving forwards:
1. We have tested and pushed a larpandoracontent branch "feature/LArRotationalTransformationPlugin" to Redmine that will allow the DUNE35t reconstruction to proceed, with the understanding that the transformations will only be approximations in some drift volumes. The diff should read cleanly, and will show that the maximum allowed difference between like wire (u and v) angles between volumes is now configurable and carries a larger default value.
or
2. We decide that Pandora should no longer be included in the DUNE35t reconstruction, as the Pandora team does not have sufficient person power to maintain it. It is not an experiment that Pandora has targeted directly, but it has been able to use developments put together for e.g. MicroBooNE. Our routine testing considers tens of thousands of MicroBooNE events nightly, about ten thousand ProtoDUNE events upon demand and a few hundred DUNEFD events upon demand. It does not include DUNE35t (more detail about our tests, with Travis CI, Coverity, codecov and Valgrind, is available upon request).
Please let us know which you prefer.
As a bit more background information about the overall reconstruction strategy:
The strategy is to give all input 2D hits to a master Pandora instance. The wire angles for alternate drift volumes are interchanged (as you spotted), and the u, v hit types are also interchanged between volumes so that there are no sudden discontinuities in the u and v input “images” that are examined in the patrec stage. The LArRotationalTransformationPlugin will check the wire angles, as discussed above.
Within the master Pandora instance, hits in individual drift volumes are given to Pandora worker instances, with one worker instance per drift volume. Each drift volume is then reconstructed individually and the resulting particles are retrieved by the master instance. The master instance then examines the 3D spacepoints and performs “particle stitching” between volumes as required.
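The hand-off from master to worker instances described above can be pictured with a toy grouping step. This is only a sketch under stated assumptions: the Hit type, its fields, the volume numbering and the function name are all hypothetical and do not correspond to Pandora's actual classes or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    volume: int   # drift volume index (hypothetical numbering)
    view: str     # "u", "v" or "w"
    wire: float   # wire coordinate
    drift: float  # drift coordinate


# In volumes with interchanged wire angles, u and v hit types are swapped.
SWAP = {"u": "v", "v": "u", "w": "w"}


def assign_to_workers(hits, swapped_volumes):
    """Group hits per drift volume, interchanging u/v in the listed volumes."""
    workers = defaultdict(list)
    for hit in hits:
        view = SWAP[hit.view] if hit.volume in swapped_volumes else hit.view
        workers[hit.volume].append(Hit(hit.volume, view, hit.wire, hit.drift))
    return dict(workers)


hits = [Hit(0, "u", 1.0, 2.0), Hit(1, "u", 3.0, 4.0), Hit(1, "w", 5.0, 6.0)]
workers = assign_to_workers(hits, swapped_volumes={1})
for volume, volume_hits in sorted(workers.items()):
    print(volume, [h.view for h in volume_hits])
```

Each entry in `workers` stands in for the input to one worker instance; the stitching of the resulting particles across volumes is not modelled here.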
The reconstruction can then, typically, proceed with cosmic-ray tagging and hit removal steps, then event slicing, which identifies regions of 3D space/hits that represent individual, discrete interactions. The cosmic-ray and neutrino/beam outcomes are then evaluated for each slice, before a final, consolidated event is delivered. These remaining steps all work in the context of a spoofed, global drift volume.
Our strategy therefore "relies" on equal/opposite u and v wire angles to the vertical, but the DUNE35t detector is not so far removed from this, and we could proceed with strategy 1, above, if desired.
We hope this helps.
#9 Updated by Thomas Junk about 3 years ago
Thanks for the update. We'll bring this up at the Software meeting tomorrow.
#10 Updated by Thomas Junk about 3 years ago
Looks like the consensus is to drop pandora reconstruction for 35-ton. It wasn't used for publication analyses, and having a confusing pandora reco that is different for some drift volumes from others is less good than no pandora reco for 35-ton. I will modify fcl files to remove it, and it looks like pmtrajfit and similar modules also have to go, as they depend on pandora output.
#11 Updated by David Adams about 3 years ago
Tom, can we close this now?
#12 Updated by Thomas Junk about 3 years ago
Yes, I think we addressed this. Thanks!
#13 Updated by David Adams about 3 years ago
- Status changed from New to Closed