Bug #12424

The output of LArPandoraOutput module is not reproducible

Added by Saba Sehrish over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: External Packages
Target version: -
Start date: 04/29/2016
Due date:
% Done: 0%
Estimated time:
Spent time:
Occurs In:
Experiment: -
Co-Assignees:
Duration:

Description

I am using the LArPandoraOutput module to study the use of util::CreateAssn.
I am running a cropped version of uboone reco stage 1; the config file is attached for reference.

Two consecutive runs on the same input with the same configuration file produce different output. I am using the dumper module (from the lardata feature/gp_RecoBaseDumpers branch) to print the data products from the pandora output, and diff to compare them.
Some of the differences may appear to be rounding errors, but I expect that running the same module twice with the same configuration and the same input file gives the same results.

I am running on a Mac (Yosemite) with LArSoft v05_10_00. Gianluca ran the same version of LArSoft, with the same input file and configuration, on slf6 and reported the same behavior.

reco_uboone_pandora.fcl (728 Bytes) reco_uboone_pandora.fcl Saba Sehrish, 04/29/2016 03:16 PM
reco_uboone_pandora.fcl (728 Bytes) reco_uboone_pandora.fcl runs pandora in MicroBooNE MCC7 production configuration Gianluca Petrillo, 05/02/2016 04:39 PM
dump_pfparticles_pandoraCosmic.fcl (570 Bytes) dump_pfparticles_pandoraCosmic.fcl dumps particle flow object content on a log file Gianluca Petrillo, 05/02/2016 04:40 PM

History

#1 Updated by Gianluca Petrillo over 3 years ago

A first, quick analysis of the dump files printed in full precision and base 16 suggests that:

  • there are extensive conversions from float to double, resulting in double precision numbers with the least significant bits set to 0 (for brevity, I'll refer to these values as 0-rounded)
  • differences in "primary" quantities are due to (seemingly) rounding errors that can affect 8 bits or more; I have observed them in:
    • coordinates of space points
    • decay vertex position
    • track position and trajectory points (may affect only a few of the trajectory points)
    • seed coordinates and directions
  • all these quantities are 0-rounded, and the rounding errors are visible in the least significant non-zeroed mantissa bits
  • I have noticed no differences in double precision values that have full precision (that is, whose least significant mantissa bits are not all set to 0), except where they are derived from other values that are affected by the problem. For example, I have seen differences in track length (full 64 bit precision), but those can be attributed to differences in the trajectory points it is computed from, and those trajectory points are 0-rounded
  • most of the values are not affected, but a sizable amount are
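
The 0-rounding described above can be checked directly on the bit pattern of a value. The following is a minimal Python sketch, not part of the dumper modules: the helper names (double_bits, through_float32, is_zero_rounded, differing_bits) and the sample value are hypothetical. It assumes IEEE 754 single and double precision, where a float has 23 explicit mantissa bits and a double has 52, so a float widened to double has its 29 least significant mantissa bits set to 0.

```python
import struct


def double_bits(x: float) -> int:
    """Return the raw 64-bit pattern of a Python float (a C double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def through_float32(x: float) -> float:
    """Round x to single precision, then widen it back to double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 52 (double mantissa) - 23 (float mantissa) = 29 low bits left at zero.
LOW_BITS_MASK = (1 << 29) - 1


def is_zero_rounded(x: float) -> bool:
    """True if the 29 least significant mantissa bits of x are all 0."""
    return double_bits(x) & LOW_BITS_MASK == 0


def differing_bits(a: float, b: float) -> int:
    """Position of the highest bit that differs between a and b (0 if identical)."""
    return (double_bits(a) ^ double_bits(b)).bit_length()


value = 0.1  # hypothetical coordinate, for illustration only
widened = through_float32(value)

# Print in base 16, as was done for the dump files.
print(f"{double_bits(value):016x}  0-rounded: {is_zero_rounded(value)}")
print(f"{double_bits(widened):016x}  0-rounded: {is_zero_rounded(widened)}")
```

Applying differing_bits to the same quantity taken from two dumps gives a rough measure of how far up the mantissa a discrepancy reaches; for 0-rounded values any result above 29 means the difference lies in the bits that survived the float conversion.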

#3 Updated by Gianluca Petrillo over 3 years ago

From an empty LArSoft area with basic setup:

cd "$MRB_SOURCE" 
mrb gitCheckout lardata
mrb gitCheckout larreco
cd "${MRB_SOURCE}/lardata" 
git checkout feature/gp_RecoBaseDumpers
cd "${MRB_SOURCE}/larreco" 
git checkout feature/gp_RecoBaseDumpers
cd "$MRB_TOP" 
mrbsetenv
mrb install -j4
mkdir -p "${MRB_TOP}/job" 
curl 'https://cdcvs.fnal.gov/redmine/attachments/download/34324/reco_uboone_pandora.fcl' > "${MRB_TOP}/job/reco_uboone_pandora.fcl" 
mkdir -p "${MRB_TOP}/job" 
curl 'https://cdcvs.fnal.gov/redmine/attachments/download/34325/dump_pfparticles_pandoraCosmic.fcl' > "${MRB_TOP}/job/dump_pfparticles_pandoraCosmic.fcl" 
mkdir -p "${MRB_TOP}/input" 
scp uboonegpvm02.fnal.gov:/nashome/p/petrillo/MicroBooNE/data/LArSoft/develop/prof/input/hits/cosmic/prodcosmics_corsika_cmc_uboone_20160421T132938_gen_20160421T134852_g4_20160421T135413_detsim_20160421T140250_reco1.root "${MRB_TOP}/input" 
setup uboonecode v05_10_00 -q e9:prof
mkdir -p "${MRB_TOP}/tests/1" 
cd "${MRB_TOP}/tests/1" 
lar -c "${MRB_TOP}/job/reco_uboone_pandora.fcl" -s "${MRB_TOP}/input/prodcosmics_corsika_cmc_uboone_20160421T132938_gen_20160421T134852_g4_20160421T135413_detsim_20160421T140250_reco1.root" -n 2 >& reco_uboone_pandora.out &
mkdir -p "${MRB_TOP}/tests/2" 
cd "${MRB_TOP}/tests/2" 
lar -c "${MRB_TOP}/job/reco_uboone_pandora.fcl" -s "${MRB_TOP}/input/prodcosmics_corsika_cmc_uboone_20160421T132938_gen_20160421T134852_g4_20160421T135413_detsim_20160421T140250_reco1.root" -n 2 >& reco_uboone_pandora.out &
wait
cd "${MRB_TOP}/tests/1" 
lar -c "${MRB_TOP}/job/dump_pfparticles_pandoraCosmic.fcl" -s prodcosmics_corsika_cmc_uboone_20160421T132938_gen_20160421T134852_g4_20160421T135413_detsim_20160421T140250_reco1_*_reco1.root -n 2  >& dump_pfparticles_pandoraCosmic.out &
mkdir -p "${MRB_TOP}/tests/2" 
cd "${MRB_TOP}/tests/2" 
lar -c "${MRB_TOP}/job/dump_pfparticles_pandoraCosmic.fcl" -s prodcosmics_corsika_cmc_uboone_20160421T132938_gen_20160421T134852_g4_20160421T135413_detsim_20160421T140250_reco1_*_reco1.root -n 2  >& dump_pfparticles_pandoraCosmic.out &
wait
cd "$MRB_TOP" 
diff tests/{1,2}/DumpPFParticles.log

Tested on uboonegpvm07.fnal.gov (compilation takes forever; this should also work on uboonebuild.fnal.gov).
This uses a 1.7 GB input file in my working area from uboonegpvm02 - any file with gaushit hits will do.
Description:

  1. use feature/gp_RecoBaseDumpers from lardata and larreco to get access to the dumpers
  2. copy the FHiCL configuration files attached to this issue
  3. copy an input file from somewhere (here, the file mentioned above)
  4. run pandora twice in parallel with a MicroBooNE MCC7 configuration
  5. run the dumper on the two produced files
  6. run diff on them

#4 Updated by John Marshall over 3 years ago

Many thanks for the comprehensive instructions, which were exactly what I was looking for.

I have two pieces of (what should be) good news relating to this issue:

1. These tests showed no differences in what we call the "core" pattern recognition i.e. the assignment of 2D hits to clusters, creation of pfparticles and the pfparticle hierarchy. Instead, the differences appeared as extremely small "jitter" of the 3D spacepoints and the (recently added) output seed properties. Our previous audit of reproducibility issues concentrated on the core output, although assessment of the 3D spacepoints etc. was drawing to completion when we needed to get tag v02-07-01 (included in LArSoft v05_10_00) out of the door (see point 2, below).

2. The differences, with this test sample, seem to vanish when testing with the latest version of the larpandoracontent library (https://github.com/PandoraPFA/LArContent.git). I would appreciate it if you were able to independently confirm that the issues vanish with this up-to-date version (tip of the master branch; 6f3ff58). I strongly suspect that floating point handling changes made to the lar_content::TwoDSlidingFitResult implementation on April 19 and a spacepoint creation tool change on April 22 were instrumental (they just missed tag v02-07-01, but were implemented as the most recent part of the same reproducibility push).

Thanks again for highlighting this issue. I hope that you can confirm what I say above. We can then make a new larpandoracontent release.

Best wishes,

John

--

P.S. It seems quite fiddly to include a new version of larpandoracontent in a local test release. A brief overview of a possible approach is provided below. I'd also really appreciate any tips about how to streamline this procedure:

# Assuming a local test release, based on v05_10_00 -q e9:prof with correctly set-up local products

# Follow the Pandora build instructions, ideally as documented at
https://github.com/PandoraPFA/Documentation/blob/master/Pandora_Build_Instructions.txt

# Sadly none of the recommended approaches sits particularly nicely with the larsoft distribution strategy... Recommendation:
git clone https://github.com/PandoraPFA/PandoraPFA $MY_DIR/Pandora/Linux64bit+2.6-2.12-e9-nu-r5-prof
cd $MY_DIR/Pandora
ln -s Linux64bit+2.6-2.12-e9-nu-r5-prof/include ./include
# provide a pandora.table file in Pandora/ups
mkdir Linux64bit+2.6-2.12-e9-nu-r5-prof/build
cd Linux64bit+2.6-2.12-e9-nu-r5-prof/build
cmake -DCMAKE_MODULE_PATH=$ROOTSYS/etc/cmake -DPANDORA_MONITORING=ON ..
make -j4 install

git clone https://github.com/PandoraPFA/LArContent $MY_DIR/LArPandoraContent/Linux64bit+2.6-2.12-e9-nu-r5-prof
cd $MY_DIR/LArPandoraContent/
ln -s Linux64bit+2.6-2.12-e9-nu-r5-prof/include ./include
# provide a larpandoracontent.table file in LArPandoraContent/ups
mkdir Linux64bit+2.6-2.12-e9-nu-r5-prof/build
cd Linux64bit+2.6-2.12-e9-nu-r5-prof/build
cmake -DCMAKE_MODULE_PATH="$MY_PANDORA_PATH/Linux64bit+2.6-2.12-e9-nu-r5-prof/cmakemodules;$ROOTSYS/etc/cmake" -DPANDORA_MONITORING=ON -DPandoraSDK_DIR=$MY_PANDORA_PATH/Linux64bit+2.6-2.12-e9-nu-r5-prof/ -DPandoraMonitoring_DIR=$MY_PANDORA_PATH/Linux64bit+2.6-2.12-e9-nu-r5-prof/ -DLAR_CONTENT_LIBRARY_NAME=LArPandoraContent  ..
make -j4 install

# Now need to package-up the builds so that it "looks like" a ups product, e.g. following directories inside pandora and larpandoracontent directories:
include,  Linux64bit+2.6-2.12-e9-nu-r5-prof,  ups

# You technically only need to build larpandoracontent, but I chose to switch to local versions of all Pandora-related packages for clarity.

# One-time declaration of the new builds to ups; unless you checked out a tag above, the version numbers chosen below are ill-defined
unsetup pandora
unsetup larpandoracontent
ups declare "pandora" "v02_08_01" -f "Linux64bit+2.6-2.12" -q "e9:nu:r5:prof" -r $MY_PANDORA_PATH -m $MY_PANDORA_PATH/ups/pandora.table
ups declare "larpandoracontent" "v02_07_02" -f "Linux64bit+2.6-2.12" -q "e9:nu:r5:prof" -r $MY_LARPANDORACONTENT_PATH -m $MY_LARPANDORACONTENT_PATH/ups/larpandoracontent.table
setup pandora v02_08_01 -q e9:nu:r5:prof
setup larpandoracontent v02_07_02 -q e9:nu:r5:prof

# Now add larpandoracontent v02_07_02 to product_deps for a local build of larpandora and proceed to test.

#5 Updated by Lynn Garren over 3 years ago

OK John, we're making a larpandoracontent redmine project. The actual LArContent code is included at larpandoracontent/larpandoracontent/LArContent. The first go round is a bit kludgy and I'm not sure how it will all play out, but we'll do it this way for larsoft v05_11_00 at least.

#6 Updated by Lynn Garren over 3 years ago

  • Status changed from New to Feedback

Please see if the new larpandoracontent v02_07_02 resolves the problems.

#7 Updated by Saba Sehrish over 3 years ago

I am using LArSoft v05_11_01 and uboonecode v05_11_01, with larpandoracontent set up:
larpandoracontent v02_07_03 -f Darwin64bit+14 -q e9:prof:r5 -z /Users/ssehrish/Products

I see no difference between the output files of two consecutive runs using the same configuration and dumpers.

#8 Updated by Saba Sehrish over 3 years ago

This issue can be marked as resolved.

#9 Updated by Lynn Garren over 3 years ago

  • Status changed from Feedback to Resolved

#10 Updated by Katherine Lato over 3 years ago

  • Assignee set to Saba Sehrish

Since it's resolved, changing it from unassigned to Saba. (So it doesn't show up on unassigned lists.)

#11 Updated by Katherine Lato over 3 years ago

  • Status changed from Resolved to Closed

On 7/18/16, 3:47 PM, "Gianluca Petrillo" <> wrote:

You can also close it.


