Bug #18584

35t data reco broken in v06_60_00_01

Added by David Adams almost 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Start date: 12/17/2017
Due date:
% Done: 0%
Estimated time:
Duration:

Description

It appears the fcl for 35t data reco is broken in release v06_60_00_01:

dune-dev> fcldump standard_reco_dune35tdata.fcl 3
/home/dladams/dudev/dudev01/workdir/localProducts_larsoft_v06_60_00_e14_prof/dunetpc/v06_60_00_01/job/standard_reco_dune35tdata.fcl

terminate called after throwing an instance of 'cet::coded_exception<fhicl::error, &fhicl::detail::translate[abi:cxx11]>'
  what():  ---- Parse error BEGIN
  Local lookup error
  ---- Can't find key BEGIN
    dune35t_particlestitcher (at part "dune35t_particlestitcher")
  ---- Can't find key END
  at line 73, character 23, of file "/home/dladams/dudev/dudev01/workdir/localProducts_larsoft_v06_60_00_e14_prof/dunetpc/v06_60_00_01/job/standard_reco_dune35tdata.fcl" 

    particlestitcher:   @local::dune35t_particlestitcher
                        ^
---- Parse error END

Aborted (core dumped)

Is this not included in our CI testing?

Do we no longer support 35t data reco?

Or am I looking at an obsolete fcl file?

History

#1 Updated by Christoph Alt almost 2 years ago

This is the .fcl the CI test is using: https://cdcvs.fnal.gov/redmine/projects/dunetpc/repository/revisions/develop/entry/test/ci/ci_test_reco_dune35t.fcl

The CI test also reports an error for DUNE 35T reco: http://dbweb5.fnal.gov:8080/LarCI/app/ns:dune/storage/docs/2017/12/16/stdout_ig1xMxN.log

Here is the interesting part:

151: LArRotationalTransformationPlugin::Initialize - Plugin does not support provided LArTPC configurations 
152: m_pLArTransformationPlugin->Initialize() return STATUS_CODE_INVALID_PARAMETER
153:     in function: InitializePlugins
154:     in file:     /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Managers/PluginManager.cc line#: 219
155: m_pPandoraImpl->InitializePlugins(&xmlHandle) throw STATUS_CODE_INVALID_PARAMETER
156:     in function: ReadSettings
157:     in file:     /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Pandora/Pandora.cc line#: 149
158: Failure in reading pandora settings, STATUS_CODE_INVALID_PARAMETER
159: PandoraApi::ReadSettings(*m_pPrimaryPandora, fullConfigFileName) throw STATUS_CODE_FAILURE
160:     in function: ConfigurePandoraInstances
161:     in file:     /scratch/workspace/lar_ci/label_exp/SLF6/label_exp2/swarm/LArSoft/srcs/larpandora/larpandora/LArPandoraInterface/StandardPandora_module.cc line#: 109
162: %MSG-e BeginJob:  StandardPandora:pandora@BeginJob 16-Dec-2017 11:13:01 CST  ModuleBeginJob
163: An unknown Exception occurred in
164: Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=f3aef0aa34577436343317614302dbf5a4896d08
165: 
166: %MSG
167: 16-Dec-2017 11:13:01 CST  Closed input file "xroot://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/persistent/users/vito/ci_tests_inputfiles/DUNE35T/detsim/AntiMuonCutEvents_LSU_v2_dune35t_detsim_Reference.root" 
168: 
169: ====================================================================================================================
170: TimeTracker printout (sec)            Min           Avg           Max         Median          RMS         nEvts   
171: ====================================================================================================================
172: [ No processed events ]
173: ====================================================================================================================
174: 
175: ====================================================================================================
176: MemoryTracker summary (base-10 MB units used)
177: 
178:   Peak virtual memory usage (VmPeak)  : 1680.31 MB
179:   Peak resident set size usage (VmHWM): 322.683 MB
180: ====================================================================================================
181: 
182: TrigReport ---------- Event  Summary ------------
183: TrigReport Events total = 0 passed = 0 failed = 0
184: 
185: TrigReport ------ Modules in End-Path: end_path ------------
186: TrigReport  Trig Bit#        Run    Success      Error Name
187: TrigReport     0    0          0          0          0 out1
188: 
189: TimeReport ---------- Time  Summary ---[sec]----
190: TimeReport CPU = 0.082988 Real = 0.176515
191: 
192: MemReport  ---------- Memory  Summary ---[base-10 MB]----
193: MemReport  VmPeak = 1680.33 VmHWM = 322.982
194: 
195: MultiPandoraApiImpl::DeletePandoraInstances - unable to find daughter instances associated with primary 0
196: %MSG-s ArtException:  PostEndJob 16-Dec-2017 11:13:02 CST ModuleEndJob
197: cet::exception caught in art
198: ---- OtherArt BEGIN
199:   ---- Unknown BEGIN
200:     An unknown Exception occurred in
201:     Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=f3aef0aa34577436343317614302dbf5a4896d08
202:   ---- Unknown END
203: ---- OtherArt END
204: %MSG
205: Art has completed and will exit with status 1.

#2 Updated by Christoph Alt over 1 year ago

I quote Tom's email here:


Hi Christoph, David,

Looks like the dune35t_particlestitcher table was removed in this commit

https://cdcvs.fnal.gov/redmine/projects/dunetpc/repository/revisions/426486a9ada8311aa8e16e8f7dc2d40a8552c394

by John (cc’d, and added to the watch list).  He updated standard_reco_dune35tsim.fcl but not these others:

I looked up instances of dune35t_particlestitcher in dunetpc v06_55_01:

/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/pandoramodules_dune.fcl    — definition, now gone
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/reco_dune35tsim_blur.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/reco_dune35tsim_emhits.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/standard_reco_dune35tdata.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/standard_reco_dune35tdata_fasthit.fcl
/cvmfs/dune.opensciencegrid.org/products/dune/dunetpc/v06_55_01/job/standard_reco_dune35tsim.fcl  — updated by John

  Tom

I also add the Pandora team to the watchlist (as requested by John last week).
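
A lookup like the one in Tom's email can be repeated for any removed table name. The following is only a minimal sketch, assuming the fcl files sit under a single directory tree; the function name find_fcl_references is hypothetical and this is a plain text search, not a FHiCL parser.

```python
import re
from pathlib import Path


def find_fcl_references(root, name):
    """Return (path, line number, stripped line) for every whole-word
    occurrence of `name` in *.fcl files found under `root`."""
    # \b keeps "dune35t_particlestitcher" from matching longer names
    # such as "dune35t_particlestitcherdc".
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    matches = []
    for path in sorted(Path(root).rglob("*.fcl")):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                matches.append((str(path), lineno, line.strip()))
    return matches
```

It could be called as find_fcl_references(job_dir, "dune35t_particlestitcher"), where job_dir is the job directory of the release being checked. Both the definition and the @local:: uses are reported, so the definition has to be told apart by eye.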

#3 Updated by John Marshall over 1 year ago

Hi,

Firstly, to clarify the specific issue about the dune35t_particlestitcher: this was rendered redundant in the recent Pandora changes, described in the LArSoft coordination meeting on December 5. The stitching procedure is now explicitly performed as a step within the Pandora pattern recognition.

Secondly, the (wider) issue about the associated .fcl footprint: there is now a large number of .fcl files, distributed across uboonecode and dunetpc (and sbndcode now too), that use/steer the Pandora pattern recognition. The Pandora team is not familiar with many of these and so, for this release, we explicitly asked for help from the maintainers of dunetpc and uboonecode to check a proposed, minimal set of .fcl file changes in named feature branches.

The slides associated with this request are linked below:
https://indico.fnal.gov/event/15868/contribution/3/material/slides/0.pdf

In advance of this meeting, a number of people (but not an exhaustive list) were also contacted directly by e-mail to warn about this request:

[Description of changes] The downside of doing this is that there is a significant footprint in the reconstruction .fcl files. For dunetpc it is nowhere near as involved as for uboonecode, but in tomorrow's meeting we would like to ask you (as maintainers of dunetpc) to look over the proposed changes in the feature branch feature/larpandoracontent_v03_09_00 and confirm you are OK with them, and provide any further instructions about e.g. whether there are other .fcl files to which the same/similar changes should be applied.

From the Pandora side, it looks like we should have tried even harder to make sure that everyone was able to check over the .fcl footprint. We did receive a named list of .fcl files to change from uboonecode and from ProtoDUNE, and this was helpful and appreciated. From the dunetpc side, some detailed response would clearly have helped, but timescales were, of course, being driven by the needs of the ProtoDUNE MCC10 production.

#4 Updated by Thomas Junk over 1 year ago

Sadly, I was trapped in another meeting on Dec. 5 and had to go by your slides and e-mails.

"footprint" is not the right word -- it indicates the size something takes up, and does not on its face imply maintenance. "maintenance" is the right word, or "necessary changes". I looked briefly at the changes to dune_pandora.fcl in the feature branch but did not realize all of the changes that would be needed to other fcl files. Some more explicit guidance of what changes are necessary would be of great use when making a breaking change moving forwards.

We can make those changes but we want to reduce the amount of guesswork required to catch up with breaking changes, and it takes time to do it.

Tom

#5 Updated by Thomas Junk over 1 year ago

I looked for "particlestitcher" in dunetpc's fcl files and followed John's example in job/standard_reco_dune35tsim.fcl: remove all modules labeled "particlestitcher" and "particlestitcherdc" along with their configurations, and replace "particlestitcher" with "pandoraTrack" and "particlestitcherdc" with "pandoraTrackdc" in fcl parameters describing input module labels. This should cover the cases where the data products are produced and consumed in the same job. I updated these files:

modified: fcl/dune35t/mergeana/standard_ana_dune35t.fcl
modified: fcl/dune35t/mergeana/standard_ana_dune35t_data.fcl
modified: fcl/dune35t/reco/reco_dune35tsim_blur.fcl
modified: fcl/dune35t/reco/reco_dune35tsim_emhits.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tdata.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tdata_fasthit.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tdata_robusthit.fcl
modified: fcl/dune35t/reco/standard_reco_dune35tsim_robusthit.fcl

and will see what the CI tests do.
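
The label substitution part of this edit can be sketched as a small text rewrite. This is only an illustration, not the script actually used: it assumes input module labels are written as quoted strings (as in TrackModuleLabel: "particlestitcher"), the names LABEL_MAP and relabel_inputs are hypothetical, and removing the module definitions themselves is left as a manual step.

```python
import re

# Old input label -> new input label, following the replacement described above.
LABEL_MAP = {
    "particlestitcher": "pandoraTrack",
    "particlestitcherdc": "pandoraTrackdc",
}

# Match only a quoted string whose entire content is one of the old labels,
# so module names and @local:: references are left untouched.
_QUOTED_LABEL = re.compile(r'"(particlestitcherdc|particlestitcher)"')


def relabel_inputs(fcl_text):
    """Return fcl_text with quoted old track labels replaced by the new ones."""
    return _QUOTED_LABEL.sub(lambda m: '"' + LABEL_MAP[m.group(1)] + '"', fcl_text)
```

Because only quoted values are rewritten, a line such as particlestitcher: @local::dune35t_particlestitcher is not changed and still has to be deleted by hand.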

There are four fcl files mentioning particlestitcher still, but they appear to read these tracks from files and do not appear to produce them.

dune/CTree/ctree35t.fcl: TrackModuleLabel: "particlestitcher"
dune/CTree/ctreeraw35t_trigHM.fcl: TrackModuleLabel: "particlestitcher"
dune/NearlineMonitor/evd/ctreeraw35t_trigTPC.fcl: TrackModuleLabel: "particlestitcher"
dune/NearlineMonitor/evd/ctreeraw35t_trigTPC_cwpfilter.fcl: TrackModuleLabel: "particlestitcher"

This code is on its way to obsolescence, but we should keep it around for inspection if need be.

#6 Updated by Thomas Junk over 1 year ago

Unfortunately it's still broken, with the "unknown exception" Christoph posted earlier. If I use the fcl file John edited for us

lar -n 1 -c standard_reco_dune35tsim.fcl /pnfs/dune/persistent/users/vito/ci_tests_inputfiles/DUNE35T/detsim/AntiMuonCutEvents_LSU_v2_dune35t_detsim_Reference.root

I get

LArRotationalTransformationPlugin::Initialize - Plugin does not support provided LArTPC configurations
m_pLArTransformationPlugin->Initialize() return STATUS_CODE_INVALID_PARAMETER
in function: InitializePlugins
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Managers/PluginManager.cc line#: 219
m_pPandoraImpl->InitializePlugins(&xmlHandle) throw STATUS_CODE_INVALID_PARAMETER
in function: ReadSettings
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/pandora/v03_07_00/Linux64bit+2.6-2.12-e14-nu-prof/pandora-v03-07-00/PandoraSDK-v03-02-00/src/Pandora/Pandora.cc line#: 149
Failure in reading pandora settings, STATUS_CODE_INVALID_PARAMETER
PandoraApi::ReadSettings(*m_pPrimaryPandora, fullConfigFileName) throw STATUS_CODE_FAILURE
in function: ConfigurePandoraInstances
in file: /scratch/workspace/build-larsoft/v06_60_00/SLF6/prof/build/larpandora/v06_18_00/src/larpandora/LArPandoraInterface/StandardPandora_module.cc line#: 109
%MSG-e BeginJob: StandardPandora:pandora@BeginJob 18-Dec-2017 16:28:32 CST ModuleBeginJob
An unknown Exception occurred in
Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=c73966792610d509e3c29438cbc891d1080f4f11

%MSG
%MSG-e BeginJob: StandardPandora:pandora@BeginJob 18-Dec-2017 16:28:32 CST ModuleBeginJob
An unknown Exception occurred in
Module type=StandardPandora, Module label=pandora, Parameter Set ID=f0fdc57a300a73d5a3406b41d89f996575fc0ee2, Process name=Reco, Release Version=v2_08_04, Main Parameter Set ID=c73966792610d509e3c29438cbc891d1080f4f11

%MSG

#7 Updated by Thomas Junk over 1 year ago

I think I see something close to the source of the problem, found with a debugger, using dunetpc v06_60_00_01, which depends on larsoft v06_60_00. In

/cvmfs/fermilab.opensciencegrid.org/products/larsoft/larpandoracontent/v03_09_00/source/larpandoracontent/LArPlugins/LArRotationalTransformationPlugin.cc

line 266 is a test of the U, V, and sigmaUVW angles. This gets called several times when running

lar -n 1 -c standard_reco_dune35tsim.fcl /pnfs/dune/persistent/users/vito/ci_tests_inputfiles/DUNE35T/detsim/AntiMuonCutEvents_LSU_v2_dune35t_detsim_Reference.root

m_thetaU is 0.772727072
m_thetaV is 0.797702730

as expected for the 35t which has slightly different U and V angles. The first four trips through this method return STATUS_CODE_SUCCESS, but on the fifth time around, pLArTPC->GetWireAngleV() gives 0.772727072 and pLArTPC->GetWireAngleU() gives 0.79770273, which is backwards.
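
To illustrate why interchanged values would trip a strict comparison, here is a toy check using the numbers above. It is not the code in LArRotationalTransformationPlugin.cc: the function name angles_match and both tolerance values are hypothetical.

```python
def angles_match(ref_u, ref_v, u, v, max_diff):
    """True if both wire angles agree with the reference values within max_diff (radians)."""
    return abs(u - ref_u) <= max_diff and abs(v - ref_v) <= max_diff


# Reference angles, as seen in the debugger for the first drift volumes.
REF_U = 0.772727072
REF_V = 0.797702730

# Same ordering as the reference: passes even with a very tight tolerance.
same = angles_match(REF_U, REF_V, 0.772727072, 0.797702730, max_diff=1e-6)

# U and V interchanged: each angle is off by about 0.025 rad.
swapped_strict = angles_match(REF_U, REF_V, 0.797702730, 0.772727072, max_diff=1e-6)
swapped_loose = angles_match(REF_U, REF_V, 0.797702730, 0.772727072, max_diff=0.03)

print(same, swapped_strict, swapped_loose)  # True False True
```

When the U and V angles are equal, interchanging them changes nothing, which is why a check of this kind would only show up for a detector like the 35t.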

#8 Updated by John Marshall over 1 year ago

Hi,

Thanks for looking into this. We’ve just had a quick meeting to discuss.

The reported error is not due to an implementation issue, but to our chosen reconstruction strategy not being directly applicable to DUNE35t. This is because the u and v wire angles differ, and the LArRotationalTransformationPlugin very strictly (but correctly) identifies that it is not designed for such a use case. Without a registered plugin to provide coordinate transformations (e.g. u, v -> w), the patrec cannot proceed.

There are more details below, but this motivates two alternative ways of moving forwards:

1. We have tested and pushed a larpandoracontent branch "feature/LArRotationalTransformationPlugin" to Redmine that will allow the DUNE35t reconstruction to proceed, with the understanding that the transformations will only be approximations in some drift volumes. The diff should read cleanly, and will show that the maximum allowed difference between like wire (u and v) angles between volumes is now configurable and carries a larger default value.

or

2. We decide that Pandora should no longer be included in the DUNE35t reconstruction, as the Pandora team does not have sufficient person power to maintain it. It is not an experiment that Pandora has targeted directly, but it has been able to use developments put together for e.g. MicroBooNE. Our routine testing considers tens of thousands of MicroBooNE events nightly, about ten thousand ProtoDUNE events upon demand and a few hundred DUNEFD events upon demand. It does not include DUNE35t (more detail about our tests, with Travis CI, Coverity, codecov and Valgrind, is available upon request).

Please let us know which you prefer.

As a bit more background information about the overall reconstruction strategy:

The strategy is to give all input 2D hits to a master Pandora instance. The wire angles for alternate drift volumes are interchanged (as you spotted), and the u, v hit types are also interchanged between volumes so that there are no sudden discontinuities in the u and v input “images” that are examined in the patrec stage. The LArRotationalTransformationPlugin will check the wire angles, as discussed above.

Within the master Pandora instance, hits in individual drift volumes are given to Pandora worker instances, with one worker instance per drift volume. Each drift volume is then reconstructed individually and the resulting particles are retrieved by the master instance. The master instance then examines the 3D spacepoints and performs “particle stitching” between volumes as required.
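
As a toy picture of the bookkeeping described in the last two paragraphs (not Pandora code), the sketch below relabels u and v in alternate volumes and then groups hits per drift volume. The hit tuple layout, the function names, and the choice that odd-numbered volumes are the interchanged ones are all assumptions made for illustration.

```python
from collections import defaultdict

# U and V are interchanged; W is left as it is.
SWAP = {"U": "V", "V": "U", "W": "W"}


def master_view(volume_id, view):
    """View label after interchange. Assumption: odd-numbered volumes are the swapped ones."""
    return SWAP[view] if volume_id % 2 == 1 else view


def group_by_volume(hits):
    """Group (volume_id, view, wire, time) hits into one list per drift volume,
    applying the U/V interchange so that the views line up across volumes."""
    per_volume = defaultdict(list)
    for volume_id, view, wire, time in hits:
        per_volume[volume_id].append((master_view(volume_id, view), wire, time))
    return dict(per_volume)


hits = [(0, "U", 10, 1.5), (1, "U", 11, 1.6), (1, "W", 3, 2.0)]
print(group_by_volume(hits))
# {0: [('U', 10, 1.5)], 1: [('V', 11, 1.6), ('W', 3, 2.0)]}
```

Each per-volume list stands in for the input to one worker instance; the stitching step performed afterwards by the master instance is not modelled here.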

The reconstruction can then, typically, proceed with cosmic-ray tagging and hit removal steps, then event slicing, which identifies regions of 3D space/hits that represent individual, discrete interactions. The cosmic-ray and neutrino/beam outcomes are then evaluated for each slice, before a final, consolidated event is delivered. These remaining steps all work in the context of a spoofed, global drift volume.

Our strategy therefore "relies" on equal/opposite u and v wire angles to the vertical, but the DUNE35t detector is not so far removed from this, and we could proceed with strategy 1, above, if desired.

We hope this helps.

#9 Updated by Thomas Junk over 1 year ago

Thanks for the update. We'll bring this up at the Software meeting tomorrow.

#10 Updated by Thomas Junk over 1 year ago

Looks like the consensus is to drop pandora reconstruction for 35-ton. It wasn't used for publication analyses, and having a confusing pandora reco that differs between drift volumes is worse than having no pandora reco for 35-ton. I will modify fcl files to remove it, and it looks like pmtrajfit and similar modules have to go as well, since they depend on pandora output.

#11 Updated by David Adams over 1 year ago

Tom, can we close this now?

#12 Updated by Thomas Junk over 1 year ago

Yes, I think we addressed this. Thanks!

#13 Updated by David Adams over 1 year ago

  • Status changed from New to Closed

