Bug #19809

seg fault caused by dropping sumdata::RunData

Added by David Caratelli over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 04/25/2018
Due date:
% Done: 100%
Estimated time:
Spent time:
Occurs In:
Scope: Internal
Experiment: MicroBooNE
SSI Package: art
Duration:

Description

Below is the full message I sent to Kyle Knoepfel, who suggested this bug may be related to issue 19465.

Dear Kyle,

I have run into an issue in my larsoft code development which could be due to an art event. I report below the issue and conclusions I have reached with the help of Herb Greenlee.

Herb suggested I reach out to you for support. Any help you can provide would be appreciated. Thank you.

When moving my reconstruction to uboonecode version v06_74_00, all art Producer modules I use seem to seg-fault at the very end of their running, at the event.put() function call which stores newly produced objects in the event record. The full seg fault trace is at the end of this message.

To debug this I have produced a trivial producer module:
/uboone/app/users/davidc/v06_74_01/srcs/uboonecode/uboone/LArCVImageMaker/FakeVtx_module.cc

And I tried running it on two files:

- A file produced as output from a previous analyzer-only job I ran:
/pnfs/uboone/scratch/users/davidc/v06_74_00/ccincl_supera/0422/davidc_prod_reco_optfilter_bnb_ccinclusive_v13_mcc8_nonempty/7042693_1/LArSoft.root

- A file from production (definition: prod_reco_optfilter_bnb_ccinclusive_v13_mcc8)
/pnfs/uboone/data/uboone/reconstructed/prod_v06_26_01_13/MCC8.9-CCinclusive/bnb/00/00/61/57/PhysicsRun-2016_5_3_9_10_16-0006157-00049_20160503T183153_bnb_20160503T205449_merged_20171205T015536_reco1_20171205T020349_reco2_20171206T025146_merged_20180406T144350_ubxsec.root

The first fails, while the second succeeds. This is puzzling to me since the first was produced by running an analyzer-only job, which to my knowledge does not modify the event record. The error message (a seg fault trace from gdb) is reported at the end of this message.

These failures did not show up in v06_67_01 (I did not test any version between v06_67_01 and v06_74_00).

The files that fail had a call to drop the "sumdata::RunData" product in the event record. If that product is NOT dropped, the error disappears. Herb suggests this may be related to a bug in art 2.10.x.
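
For context, a drop of this kind is usually expressed through the outputCommands parameter of the RootOutput module. The fragment below is only a sketch of what such a configuration might look like: the output label out1 and the file name are hypothetical, and the actual configuration used to produce the failing file is not reproduced in this report.

```
# Hypothetical output configuration (label and file name are assumptions).
outputs: {
  out1: {
    module_type: RootOutput
    fileName: "LArSoft.root"
    # Keep everything, then drop all sumdata::RunData products.
    outputCommands: [ "keep *_*_*_*",
                      "drop sumdata::RunData_*_*_*" ]
  }
}
```

Removing the "drop" entry from outputCommands corresponds to the case above in which the product is not dropped and the error disappears.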

A tar ball of my localProducts area is below:
/uboone/app/users/davidc/v06_74_01/local.tar

Thank you,
David C.

Program received signal SIGSEGV, Segmentation fault.
art::Principal::getForOutput(art::ProductID, bool) const () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Principal/Principal.cc:434
434    /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Principal/Principal.cc: No such file or directory.
    in /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Principal/Principal.cc
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 expat-2.0.1-13.el6_8.x86_64 freetype-2.3.11-15.el6_6.1.x86_64 glibc-2.12-1.209.el6_9.2.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-42z1.el6_7.x86_64 libICE-1.0.6-1.el6.x86_64 libSM-1.2.1-2.el6.x86_64 libX11-1.6.4-3.el6.x86_64 libXau-1.0.6-4.el6.x86_64 libXdamage-1.1.3-4.el6.x86_64 libXext-1.3.2-2.1.el6.x86_64 libXfixes-5.0.3-1.el6.x86_64 libXxf86vm-1.1.3-2.1.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 libcurl-7.19.7-53.el6_9.x86_64 libdrm-2.4.65-2.el6.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libssh2-1.4.2-2.el6_7.1.x86_64 libuuid-2.17.2-12.18.el6.x86_64 libxcb-1.12-4.el6.x86_64 mesa-dri-drivers-11.0.7-4.el6.x86_64 mesa-libGL-11.0.7-4.el6.x86_64 mesa-libGLU-11.0.7-4.el6.x86_64 ncurses-libs-5.7-4.20090207.el6.x86_64 nspr-4.13.1-1.el6.x86_64 nss-3.28.4-4.el6_9.x86_64 nss-softokn-freebl-3.14.3-23.el6_7.x86_64 nss-util-3.28.4-1.el6_9.x86_64 openldap-2.4.40-6.el6_7.x86_64 openssl-1.0.1e-48.sl6_8.4.x86_64 pcre-7.8-7.el6.x86_64 xz-libs-4.999.9-0.5.beta.20091007git.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) where
#0  art::Principal::getForOutput(art::ProductID, bool) const () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Principal/Principal.cc:434
#1  0x00007ffff6f73f55 in void art::RootOutputFile::fillBranches<(art::BranchType)0>(art::Principal const&, std::vector<art::ProductProvenance, std::allocator<art::ProductProvenance> >*) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/IO/Root/RootOutputFile.cc:855
#2  0x00007ffff6f6bca4 in art::RootOutputFile::writeOne(art::EventPrincipal const&) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/IO/Root/RootOutputFile.cc:549
#3  0x00007fffe433e42e in art::RootOutput::write(art::EventPrincipal&) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/IO/Root/RootOutput_module.cc:313
#4  0x00007ffff6c8fecb in art::OutputModule::doWriteEvent(art::EventPrincipal&) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Core/OutputModule.cc:178
#5  0x00007ffff6c5c1a1 in art::EndPathExecutor::writeEvent(art::EventPrincipal&) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Core/EndPathExecutor.cc:133
#6  0x00007ffff75f4b38 in art::EventProcessor::writeEvent() () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/EventProcessor/EventProcessor.cc:905
#7  0x00007ffff75f726d in void art::EventProcessor::process<(art::Level)4>() () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/EventProcessor/EventProcessor.cc:433
#8  0x00007ffff75f74df in art::EventProcessor::runToCompletion() () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/EventProcessor/EventProcessor.cc:452
#9  0x00007ffff7d9bb62 in art::run_art_common_(fhicl::ParameterSet const&, art::detail::DebugOutput) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Art/run_art.cc:256
#10 0x00007ffff7d9db65 in art::run_art(int, char**, boost::program_options::options_description&, cet::filepath_maker&, std::vector<std::unique_ptr<art::OptionsHandler, std::default_delete<art::OptionsHandler> >, std::allocator<std::unique_ptr<art::OptionsHandler, std::default_delete<art::OptionsHandler> > > >&&, art::detail::DebugOutput&&) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/src/art/Framework/Art/run_art.cc:129
#11 0x00007ffff7d990ba in artapp(int, char**) () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/build-Linux64bit+2.6-2.12-e15-prof/art/Framework/Art/artapp.cc:54
#12 0x000000000040158c in main () at /scratch/workspace/art-release-build/SLF6/prof/build/art/v2_10_03/build-Linux64bit+2.6-2.12-e15-prof/art/Framework/Art/lar.cc:8
(gdb)

Related issues

Related to art - Bug #19465: Branches created for dropped products (Closed, 03/22/2018)

History

#1 Updated by Kyle Knoepfel over 1 year ago

  • Description updated
  • Status changed from New to Assigned
  • Assignee set to Kyle Knoepfel

#2 Updated by Kyle Knoepfel over 1 year ago

  • Related to Bug #19465: Branches created for dropped products added

#3 Updated by Kyle Knoepfel over 1 year ago

  • Category set to Infrastructure
  • Status changed from Assigned to Resolved
  • Target version set to 2.11.02
  • % Done changed from 0 to 100
  • Occurs In 2.11.01 added
  • SSI Package art added

This issue has been fixed with commit art:fabc493.

Various mistakes in stored art metadata have persisted over the last several versions. Unfortunately, this means that some output files are larger than they need to be, and it can also mean that one occasionally encounters this kind of error. We apologize for this inconvenience, and we will produce art 2.11.02 that incorporates the bug fix.

#4 Updated by Kyle Knoepfel over 1 year ago

  • Status changed from Resolved to Closed

