Bug #9193

Segmentation violation when multiple files with run/subrun products are merged

Added by Kyle Knoepfel over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Urgent
Assignee:
Category: I/O
Target version:
Start date: 06/12/2015
Due date:
% Done: 100%
Estimated time:
Spent time:
Occurs In:
Scope: Internal
Experiment: -
SSI Package: art
Duration:

Description

We have been able to verify MicroBooNE's observation of a segmentation violation within art. This can be reproduced using the following commands (on woof):

cd ~knoepfel/scratch/build-art/
art -c test1_t.fcl -n 10
art -c test2_t.fcl -n 10
art -c cfg_stripped.fcl -n 20 -s test1.root test2.root

This is of urgent priority.


Related issues

Related to LArSoft - Bug #9108: Processing multiple files (Closed, 06/11/2015)

Related to cet-is - Support #9642: Odd segfault when running nova concat_files job (Closed, 07/16/2015)

Associated revisions

Revision 7858c2ed
Added by Christopher Green over 4 years ago

Merge tag 'v1_14_03'

Tag for release with fix for issue #9193.

History

#1 Updated by Paul Russo over 4 years ago

OK, I think I've found out why the RootOutputFile selected items list holds twice as many selected output products as it should.

I have inserted clears for the OutputModule keptProducts_ selected items lists to make sure we completely rebuild the list when a reselection gets triggered instead of adding on to the old list.

Note that we were already doing this clear when rebuilding the RootOutputFile's copy, but we were not doing it for the OutputModule's copy.

You can pick up the code change from woof:/home/russo/work/art_om_kept_products_fix in file art/Framework/Core/OutputModule.cc, or apply this patch:

diff --git art/Framework/Core/OutputModule.cc art/Framework/Core/OutputModule.cc
index 891deb6..563e3ef 100644
--- art/Framework/Core/OutputModule.cc
+++ art/Framework/Core/OutputModule.cc
@@ -74,6 +74,9 @@ selectProducts(FileBlock const& fb)
   preSelectProducts(fb);
   groupSelector_.initialize(groupSelectorRules_,
                             ProductMetaData::instance().productList());
+  for (auto& val : keptProducts_) {
+    val.clear();
+  }
   // TODO: See if we can collapse keptProducts_ and groupSelector_ into a
   // single object. See the notes in the header for GroupSelector
   // for more information.
@@ -129,6 +132,9 @@ doBeginJob()
   //selectProducts();
   groupSelector_.initialize(groupSelectorRules_,
                             ProductMetaData::instance().productList());
+  for (auto& val : keptProducts_) {
+    val.clear();
+  }
   for (auto const& val : ProductMetaData::instance().productList()) {
     BranchDescription const& bd = val.second;
     if (bd.transient()) {
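
To illustrate why the added clears matter, here is a minimal, self-contained sketch of the accumulate-on-reselect behavior. It is not the art code: ToyOutputModule, keptAfterTwoSelections, the clearFirst switch, and the product names are all hypothetical, and the selection rule is reduced to "keep everything".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an output module's per-branch-type selection
// lists. This is NOT the real art::OutputModule.
class ToyOutputModule {
public:
  static constexpr std::size_t kBranchTypes = 3;

  // Rebuild the selection from the full product list. With
  // clearFirst == false, entries from the previous selection are kept and
  // the new ones are appended, which mimics the behavior before the patch.
  void selectProducts(std::vector<std::string> const& productList,
                      bool clearFirst)
  {
    if (clearFirst) {
      for (auto& val : keptProducts_) {
        val.clear();
      }
    }
    for (auto const& name : productList) {
      // Toy rule (assumption): keep every product in the first list.
      keptProducts_[0].push_back(name);
    }
  }

  std::size_t keptEventProducts() const { return keptProducts_[0].size(); }

private:
  std::array<std::vector<std::string>, kBranchTypes> keptProducts_;
};

// Number of kept entries after selecting twice, as would happen when
// opening a second input file triggers a reselection.
inline std::size_t keptAfterTwoSelections(bool clearFirst)
{
  ToyOutputModule om;
  std::vector<std::string> const products{"hits", "tracks"};
  om.selectProducts(products, clearFirst);
  om.selectProducts(products, clearFirst);
  return om.keptEventProducts();
}
```

In this sketch, keptAfterTwoSelections(false) yields 4 entries for 2 products, while keptAfterTwoSelections(true) yields 2, which matches the doubling described above.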

#2 Updated by Paul Russo over 4 years ago

I don't think this will fix the crash at the end of the job due to accessing deleted BranchDescription data, but this problem does need to be fixed.

#3 Updated by Kyle Knoepfel over 4 years ago

  • Description updated (diff)

#4 Updated by Kyle Knoepfel over 4 years ago

  • Description updated (diff)

#5 Updated by Kyle Knoepfel over 4 years ago

  • Subject changed from Segmentation violation with multiple files with run/subrun products are merged to Segmentation violation when multiple files with run/subrun products are merged

#6 Updated by Christopher Green over 4 years ago

  • Related to Bug #9108: Processing multiple files added

#7 Updated by Christopher Green over 4 years ago

  • Category set to I/O
  • Status changed from New to Assigned
  • Assignee set to Kyle Knoepfel
  • Target version set to 1.14.03

#8 Updated by Kyle Knoepfel over 4 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100

The fix for this bug has been implemented. The short summary:

  • The original segmentation fault resulted from accessing an invalid memory location. The stale memory location was also present in art 1.13, but it was never accessed.
  • In fixing the invalid memory access, a design weakness of the product retrieval system was exposed: the BranchDescription class had a presence flag, intended to indicate whether a product was retrievable from a ROOT input file or was being produced in the current process. Although a suboptimal design choice, this presence flag had an unambiguous meaning in art 1.13. Once secondary input file reading was introduced with art 1.14, the flag became conceptually ill-defined.
  • The above realization necessitated a modest redesign of the product retrieval system. With this fix, the product presence is now determined on a per-file basis, and the invalid memory access is no longer possible as the objects populating the necessary registry persist for the duration of the process.
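
As a rough illustration of the per-file idea (not the actual art implementation), the sketch below keeps presence information keyed by file instead of storing a single flag on a shared description object. PerFilePresence, makeExample, and the file and branch names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical aliases, for illustration only.
using FileName = std::string;
using BranchName = std::string;

// Records which branches each input file actually contains. A single
// "present" flag per branch cannot express that a product is absent from
// the primary file but available in a secondary one; a per-file table can.
class PerFilePresence {
public:
  void recordFile(FileName const& file, std::set<BranchName> branches)
  {
    present_[file] = std::move(branches);
  }

  // Returns false for unknown files as well as for missing branches.
  bool isPresent(FileName const& file, BranchName const& branch) const
  {
    auto const it = present_.find(file);
    return it != present_.end() && it->second.count(branch) != 0;
  }

private:
  // Owned by this object, so lookups stay valid for its whole lifetime.
  std::map<FileName, std::set<BranchName>> present_;
};

// Builds a small example with one primary and one secondary file.
inline PerFilePresence makeExample()
{
  PerFilePresence p;
  p.recordFile("primary.root", {"hits"});
  p.recordFile("secondary.root", {"hits", "rawdigits"});
  return p;
}
```

Here the same branch name can be present in one file and absent in another, which is the distinction a single flag on BranchDescription could not express once secondary input files existed.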

An overwhelming majority of the time spent on this issue was analysis work: determining the sources of, and possible resolutions for, the problems mentioned above.

Implemented with commit: art:862d737631c3c6e94ba0cee1a27745013809646c.

#9 Updated by Christopher Green over 4 years ago

  • Status changed from Resolved to Closed

#10 Updated by Kyle Knoepfel over 4 years ago

  • Related to Support #9642: Odd segfault when running nova concat_files job added

