Bug #23013

Unable to read recob::Vertex position information

Added by Christopher Backhouse about 2 months ago. Updated about 1 month ago.

Status: Work in progress
Priority: Urgent
Category: Data products
Target version: -
Start date: 07/30/2019
% Done: 0%
Estimated time: 56.00 h
Experiment: DUNE

Description

I'm trying to read recob::Vertex objects out of an MCC11 file using a checkout based on v08_27_00. The position() comes back all zeros, even though I can see reasonable values if I look in the file directly with a TBrowser. Do I need to use some old release to read the MCC11 files properly? Which one?
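
What I'm doing is roughly the following (a simplified sketch, not my exact module; the "trajcluster" input tag and the free-standing function are just for illustration):

    // Sketch: read recob::Vertex objects from an art event and print their positions.
    #include "art/Framework/Principal/Event.h"
    #include "canvas/Utilities/InputTag.h"
    #include "lardataobj/RecoBase/Vertex.h"
    #include <iostream>
    #include <vector>

    void dumpVertices(art::Event const& evt)
    {
      // The input tag is an assumption for illustration.
      auto const& vertices =
        *evt.getValidHandle<std::vector<recob::Vertex>>(art::InputTag{"trajcluster"});
      for (recob::Vertex const& vtx : vertices) {
        auto const& pos = vtx.position(); // reads back as (0,0,0) when the bug is triggered
        std::cout << "ID=" << vtx.ID() << " at ("
                  << pos.X() << "," << pos.Y() << "," << pos.Z() << ")\n";
      }
    }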

Presumably something was changed in how this type stores its data. Is it possible to do this in a backwards compatible way, or add some rules to help the dictionary generation make the translation?

In the short term, could I potentially check RecoBase out and make some edit there to match how it was in MCC11?

dump_vertices.fcl (1.76 KB) - Configuration: dump vertex position of "trajcluster" data product - Gianluca Petrillo, 08/08/2019 06:39 PM
DumpVertices.log (1.71 KB) - Expected output - Gianluca Petrillo, 08/08/2019 06:46 PM

Associated revisions

Revision 82bbb16e (diff)
Added by Chris Green about 1 month ago

Per Philippe C., workaround for #23013.

Revision 809f9303 (diff)
Added by Chris Green about 1 month ago

Revert "Per Philippe C., workaround for #23013" as too flaky.

This reverts commit 82bbb16ead8e0adea0295a197b6dbdc3812e9f0b.

Revision cb28aef5 (diff)
Added by Chris Green about 12 hours ago

Fix for #23013 in conjunction with ROOT 6.18/04.

History

#1 Updated by Gianluca Petrillo about 2 months ago

Can you detail what is MCC11 in terms of LArSoft/dunetpc version, the qualifiers you are using (just in case) and provide a pointer to a DUNE input file to test the issue with?

#2 Updated by Christopher Backhouse about 2 months ago

Here is an example of the files I'm trying to use:
/pnfs/dune/tape_backed/dunepro/mcc11/protodune/mc/full-reconstructed/07/99/40/36/nu_dune10kt_1x2x6_13036856_0_20181111T180801_gen_g4_detsim_reco.root

I don't know the actual versions that were used, and the SAM metadata is not helpful.

#3 Updated by Christopher Backhouse about 2 months ago

I set things up with -qe17:prof

#4 Updated by Tingjun Yang about 2 months ago

Metadata shows the file was produced with larsoft v07_06_02.

#5 Updated by Christopher Backhouse about 2 months ago

I'm trying to run with v07_06_02 and failing with

---- LogicError BEGIN
  checkDictionaries: Retrieving list of base classes for type 'art::Assns<recob::PFParticle,larpandoraobj::PFParticleMetadata,void>' returned a nullptr.
---- LogicError END

#6 Updated by Christopher Backhouse about 1 month ago

Is there anything else I can add here? Can you reproduce the problem?

I see that the vertex position is stored with this special Double32_t type, which presumably has something to do with this. But track trajectories are stored the same way, and for some reason I am able to access those.

I tried adding lardataobj_RecoBase_dict to my MODULE_LIBRARIES, but no change.
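
For context, the coordinate type behind both the vertex position and the track trajectory points is Double32_t; a sketch of the types involved is below (reconstructed from memory, not a verbatim copy of the lardataobj headers):

    // Double32_t is a plain double in memory; ROOT only truncates it to 32 bits on disk.
    // The template argument below is what ROOT's I/O conversion rules have to handle.
    #include "Math/Point3D.h"   // ROOT::Math::PositionVector3D, ROOT::Math::Cartesian3D
    #include "RtypesCore.h"     // Double32_t

    using PointD_t   = ROOT::Math::PositionVector3D<ROOT::Math::Cartesian3D<double>>;
    using PointD32_t = ROOT::Math::PositionVector3D<ROOT::Math::Cartesian3D<Double32_t>>;

    // In C++ these two aliases name the same type (Double32_t is a typedef of double);
    // the distinction exists only in the dictionary/streamer information, which is why
    // any failure shows up at I/O time rather than at compile time.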

#7 Updated by Christopher Green about 1 month ago

  • Assignee set to Christopher Green
  • Status changed from New to Feedback
  • Category set to Data products
  • Tracker changed from Bug to Support

Please let us know exactly which packages you have checked out in your local MRB area.

It's looking very likely that you have an inconsistent build. In order to have a consistent build:
  1. Any packages you check out should be based on the release you're building against.
  2. Check out only those packages for which you need to change code and anything that depends on them that will be in the final job as configured (including plugins).
  3. Use ups depend -B to examine the dependency tree of the highest-level package whose code you're using (dunetpc, for example), and check out any packages that sit between your already-checked-out packages in that tree.

As you can see, this is not trivial. In general, avoid checking out low-level packages unnecessarily. Simply put, pre-compiled packages you are using from the central release should not be link-dependent on anything you have checked out.

#8 Updated by Christopher Backhouse about 1 month ago

The two combinations I have tried are dunetpc v08_17_00 plus larreco v08_12_01 and dunetpc v08_27_00 plus larreco v08_16_01.

I'm working with larreco and I need dunetpc just for its fcls (I suppose I could just have set up the product, but it should come to the same thing).

What exactly is the syntax for ups depend? I just get ERROR: Found no match for product 'dunetpc'

#9 Updated by Christopher Green about 1 month ago

ups depend -B dunetpc <version> [-q <quals>]

For dunetpc v08_27_00 -q+e17:+prof, the issue is that the dunetpc product depends on the larsoft umbrella product rather than the actual products it needs for linking purposes. In addition to dunetpc and larreco, then, you need:
  • larsoft
  • lareventdisplay
  • larana
  • larpandora

Note that if you set up any other products besides the ones you have mentioned (e.g. for plugins), you will also need to examine their dependencies in the same way.

#10 Updated by Christopher Backhouse about 1 month ago

This seems crazy, and I couldn't get it to work, since I wound up with some version conflict with larg4.

If I can make it so I only need larreco, can I work with only that in my local release?

#11 Updated by Christopher Green about 1 month ago

Your MRB local-products area needs to be empty (mrb zi or mrb zd). Once you have done this, you should start over from a fresh login window to avoid hysteresis.

If your checked-out packages are based on the version consistent with the dunetpc version, the packages I have suggested should be sufficient. If any of the packages you have checked out are based on develop, then all bets are off: you need a full checkout of the entire distribution.

#12 Updated by Christopher Green about 1 month ago

Addendum to answer your other question: I would be surprised if you could "make it so you only need" larreco checked out locally, since that would imply that you weren't using any plugins that used larreco in your invocation of the lar (or art) executable.

#13 Updated by Gianluca Petrillo about 1 month ago

I suspect the issue is deeper than that, Chris.
Steps to reproduce:

source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup dunetpc v08_27_00 -q e17:prof
lar -c dump_vertices.fcl -s /pnfs/dune/tape_backed/dunepro/mcc11/protodune/mc/full-reconstructed/07/99/40/36/nu_dune10kt_1x2x6_13036856_0_20181111T180801_gen_g4_detsim_reco.root -n 1

Result:
$ cat DumpVertices.log
Event run: 20000001 subRun: 0 event: 1 contains 37 vertices from 'trajcluster'

    [#0] ID=1 at (0,0,0)
    [#1] ID=6 at (0,0,0)
    [#2] ID=7 at (0,0,0)
    [#3] ID=8 at (0,0,0)
    [#4] ID=9 at (0,0,0)
    [#5] ID=10 at (0,0,0)
    [#6] ID=11 at (0,0,0)
    [#7] ID=12 at (0,0,0)
    [#8] ID=13 at (0,0,0)
    [#9] ID=14 at (0,0,0)
    [#10] ID=15 at (0,0,0)
    [#11] ID=16 at (0,0,0)
    [#12] ID=17 at (0,0,0)
    [#13] ID=18 at (0,0,0)
    [#14] ID=19 at (0,0,0)
    [#15] ID=20 at (0,0,0)
    [#16] ID=21 at (0,0,0)
    [#17] ID=22 at (0,0,0)
    [#18] ID=23 at (0,0,0)
    [#19] ID=24 at (0,0,0)
    [#20] ID=25 at (0,0,0)
    [#21] ID=26 at (0,0,0)
    [#22] ID=27 at (0,0,0)
    [#23] ID=28 at (0,0,0)
    [#24] ID=29 at (0,0,0)
    [#25] ID=30 at (0,0,0)
    [#26] ID=31 at (0,0,0)
    [#27] ID=32 at (0,0,0)
    [#28] ID=33 at (0,0,0)
    [#29] ID=34 at (0,0,0)
    [#30] ID=35 at (0,0,0)
    [#31] ID=36 at (0,0,0)
    [#32] ID=37 at (0,0,0)
    [#33] ID=38 at (0,0,0)
    [#34] ID=39 at (0,0,0)
    [#35] ID=40 at (0,0,0)
    [#36] ID=41 at (0,0,0)


In this case MRB is not even involved.
I have also verified that the distribution of vertices stored in that branch is not all at x = 0.

The configuration file I used is attached to this ticket.
And, if you have the proper credentials, the right command is actually:

lar -c dump_vertices.fcl -s root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/mcc11/protodune/mc/full-reconstructed/07/99/40/36/nu_dune10kt_1x2x6_13036856_0_20181111T180801_gen_g4_detsim_reco.root -n 1

#14 Updated by Gianluca Petrillo about 1 month ago

For reference, I attach and copy the output obtained with dunetpc v07_06_02 (e17:prof):

Event run: 20000001 subRun: 0 event: 1 contains 37 vertices from 'trajcluster'

    [#0] ID=1 at (181.339,-407.52,604.953)
    [#1] ID=6 at (177.084,-426.468,631.591)
    [#2] ID=7 at (157.644,-421.925,644.045)
    [#3] ID=8 at (162.373,-453.244,666.558)
    [#4] ID=9 at (165.406,-444.713,642.608)
    [#5] ID=10 at (163.353,-448.717,643.762)
    [#6] ID=11 at (179.386,-388.078,667.995)
    [#7] ID=12 at (160.374,-416.741,631.112)
    [#8] ID=13 at (219.356,-387.27,567.405)
    [#9] ID=14 at (25.8937,-116.629,667.516)
    [#10] ID=15 at (140.238,-430.466,648.835)
    [#11] ID=16 at (252.876,-428.053,622.011)
    [#12] ID=17 at (146.928,-425.929,653.246)
    [#13] ID=18 at (93.0671,-447.119,666.753)
    [#14] ID=19 at (174.66,-427.128,617.323)
    [#15] ID=20 at (153.53,-415.134,625.944)
    [#16] ID=21 at (173.863,-425.794,621.537)
    [#17] ID=22 at (137.551,-478.699,672.786)
    [#18] ID=23 at (162.874,-411.81,622.011)
    [#19] ID=24 at (154.099,-469.379,653.625)
    [#20] ID=25 at (85.6819,-410.342,679.97)
    [#21] ID=26 at (144.471,-431.398,669.432)
    [#22] ID=27 at (147.425,-311.718,663.684)
    [#23] ID=28 at (153.879,-429.527,643.188)
    [#24] ID=29 at (175.92,-422.338,625.364)
    [#25] ID=30 at (155.99,-462.71,653.821)
    [#26] ID=31 at (178.804,-508.16,606.204)
    [#27] ID=32 at (144.926,-429.928,672.789)
    [#28] ID=33 at (166.606,-384.345,644.045)
    [#29] ID=34 at (196.917,-476.711,570.758)
    [#30] ID=35 at (246.422,-379.552,603.815)
    [#31] ID=36 at (262.637,-367.41,512.32)
    [#32] ID=37 at (147.528,-414.735,634.278)
    [#33] ID=38 at (182.109,-417.133,614.736)
    [#34] ID=39 at (206.929,-318.773,801.711)
    [#35] ID=40 at (137.611,-340.223,730.34)
    [#36] ID=41 at (234.69,-229.216,985.243)

#15 Updated by Christopher Green about 1 month ago

  • Estimated time set to 32.00 h
  • Priority changed from Normal to Urgent
  • Status changed from Feedback to Work in progress
  • Tracker changed from Support to Bug

Now that we have a straightforward reproducer using a fully precompiled release (many thanks, Gianluca), I can confirm that this problem appeared between dunetpc v08_15_01 (last known good) and dunetpc v08_17_00 (first known bad). A first look turns up no smoking gun in LArSoft or DUNE code, but between these two versions ROOT was upgraded from v6_12_06 to v6_16_00. From a discussion with Philippe, I learned that there were many changes to the I/O rule handling code in ROOT between those versions, and that many problems introduced in 6.16/00 were fixed in 6.18/00.

I will work on attempting to reproduce the problem without LArSoft code / DUNE data to make characterizing the problem with different ROOT versions easier.

#16 Updated by Christopher Green about 1 month ago

The problem has been reproduced in vitro: please see the ROOT JIRA issue. Unfortunately, the bug is still present at the HEAD of ROOT's v6-16-00-patches, v6-18-00-patches, and master branches.

#17 Updated by Christopher Green about 1 month ago

  • Estimated time changed from 32.00 h to 56.00 h

#18 Updated by Christopher Green about 1 month ago

  • % Done changed from 0 to 100
  • Status changed from Work in progress to Resolved

The underlying ROOT bug is still being investigated. However, based on input from Philippe there is a workaround specifically for the case where my_template<double> becomes my_template<Double32_t>, because the representations are identical in memory. This workaround has been implemented with lardataobj:82bbb16, which should be part of this week's LArSoft release.
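
The "identical in memory" statement is easy to verify independently: Double32_t is an ordinary double in memory, and only its on-disk representation is reduced in precision. A minimal check of that property (this is only the property the workaround relies on, not the workaround itself):

    // Double32_t is a typedef of double; ROOT only changes how it is stored on disk.
    #include "RtypesCore.h" // Double32_t
    #include <type_traits>

    static_assert(sizeof(Double32_t) == sizeof(double), "identical size in memory");
    static_assert(std::is_same<Double32_t, double>::value, "identical type in memory");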

I will mark this issue resolved pending post-release confirmation from you. To follow the fix for the underlying ROOT problem, please see the ROOT JIRA ticket.

#19 Updated by Christopher Green about 1 month ago

  • % Done changed from 100 to 0
  • Status changed from Resolved to Work in progress

Upon further investigation, it appears that this workaround does not succeed in the cases we care about, so I am reverting the "fix." At this stage we will likely have to wait for a full fix from the ROOT team.

Apologies for the false hope.


