Bug #10623

Crash reading multiple files

Added by Herbert Greenlee almost 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: Urgent
Assignee:
Category: I/O
Target version:
Start date: 10/23/2015
Due date:
% Done: 100%
Estimated time:
Spent time:
Occurs In:
Scope: Internal
Experiment: MicroBooNE
SSI Package: art
Duration:

Description

We have a crash when reading multiple files. The fcl file is extremely simple, just RootInput and RootOutput (for merging files). Find the complete fcl file below.

I talked to Marc and Kyle, and they diagnosed the problem as being due to our files having a subrun data product (POTSummary) that was never read into memory. They suggested a workaround of adding a module to read this subrun data product, which I have not tested yet. I did test another workaround, dropping the POTSummary data product, and that worked.

#include "services_microboone.fcl" 

process_name: Copy

services:
{
  scheduler:    { defaultExceptions: false }    # Make all uncaught exceptions fatal.
  message:      @local::standard_warning
  FileCatalogMetadata:  @local::art_file_catalog_data
}

source:
{
  module_type: RootInput
  maxEvents:  10        # Maximum number of events to read
}

physics:
{
 #define the output stream, there could be more than one if using filters 
 stream1:  [ out1 ]
 #end_paths is a keyword and contains the paths that do not modify the art::Event, 
 #ie analyzers and output streams.  these all run simultaneously
 end_paths:     [ stream1 ]  
}

outputs:
{

 out1:
 {

   module_type: RootOutput
   fileName:    "%ifb_%tc_merged.root" 
   dataTier:    "raw" 
   compressionLevel: 3

   FCMDPlugins: [

     { plugin_type:   "FileCatalogMetadataRawDigit" 
       RawDigitLabel: "digitcopy" 
     }

    ]

 }

}
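
The tested workaround of dropping the POTSummary product might be configured along the following lines. This is only a sketch under stated assumptions, not the configuration that was actually used: it assumes the drop is applied on input through the RootInput `inputCommands` parameter, and that the product's class is `sumdata::POTSummary`. The wildcards stand for any module label, instance name, and process name.

```
source:
{
  module_type: RootInput
  maxEvents:  10
  # Assumed drop rule (not from the original report):
  # keep every product except those of type sumdata::POTSummary.
  inputCommands: [ "keep *_*_*_*",
                   "drop sumdata::POTSummary_*_*_*" ]
}
```

The same drop could presumably be expressed on the output side with the `outputCommands` parameter of RootOutput, but since the crash occurs while reading, dropping on input seems the more likely form of the workaround.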

Associated revisions

Revision 7c147db5
Added by Christopher Green almost 4 years ago

Merge tag 'v4_17_00' into develop

Release with fix for issue #10623 -- crash reading multiple files.

History

#1 Updated by Kyle Knoepfel almost 4 years ago

  • Description updated

#2 Updated by Christopher Green almost 4 years ago

  • Category set to I/O
  • Status changed from New to Assigned
  • Assignee set to Kyle Knoepfel

This report is being investigated.

#3 Updated by Marc Paterno almost 4 years ago

Email message from Herb follows:

Hello Kyle,

I now know that art issue 10623 is definitely related to my post on
art-users:

https://listserv.fnal.gov/scripts/wa.exe?A2=ind1510&L=ART-USERS&F=&S=&X=7EB9A15E9A40691DB4&Y=greenlee%40fnal.gov&P=11160

The problem is that somehow, these subrun data products were invisible.
Art couldn't read them in any circumstance. But I could see them in
TBrowser, and they caused a crash when reading across multiple input
files. Unfortunately, I don't have these specific files any more, and
newly generated ones seem to be OK.

In newly generated files, I can both read the subrun data products, and
reading across multiple files does not cause a crash, even if I neither
read nor drop the subrun data products.

The crashing files were generated by an immature state of the code, that
(I think) filled these subrun data products with a vector of
default-constructed versions containing all zeroes.

So, my immediate problem is solved. That doesn't mean art is off the hook
regarding possible bugs, since art didn't generate any diagnostics
regarding why these data products were invisible or unreadable.

Herb

On Sun, 25 Oct 2015, Kyle Knoepfel wrote:

Hi Herb,

Can you please provide two input files in which you observe this behavior? Please also let us
know which version of uboonecode/larsoft was used.

Thanks,
Kyle.

#4 Updated by Christopher Green almost 4 years ago

  • Status changed from Assigned to Feedback
  • Priority changed from Normal to Urgent

Herb, please confirm exactly which tagged versions of uboonecode can be used both to create and to read the files. If any non-version-controlled code was used, please provide that as well.

#5 Updated by Herbert Greenlee almost 4 years ago

Uboonecode v04_26_03 using only tagged code.

#6 Updated by Kyle Knoepfel almost 4 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100
  • SSI Package art added
  • SSI Package deleted ()

Thank you, Herb. We confirm the problem in the case you encountered. The segmentation violation occurred because the presence bits were set to false in the file and art was not checking the presence information appropriately. As a result, art attempted to resolve the product when it should not have.

Note that a presence bit set to false indicates an instance where a product was declared via 'produces' but not placed onto the event, subrun, or run. This checking is now done for event-level products, but it is currently not possible to do so for subrun and run products.

Implemented with art:d72f294. The fix will be included in 1.17.03.

#7 Updated by Kyle Knoepfel almost 4 years ago

  • Target version set to 1.17.03

#8 Updated by Kyle Knoepfel over 3 years ago

  • Status changed from Resolved to Closed

