Project

General

Profile

Bug #23319

Problems reading ProtoDUNE-SP raw data file with artdaq_core v3_05_02

Added by Thomas Junk 12 months ago. Updated 12 months ago.

Status:
Closed
Priority:
Urgent
Category:
Known Issues
Target version:
Start date:
09/24/2019
Due date:
% Done:

100%

Estimated time:
Experiment:
DUNE
Co-Assignees:
Duration:

Description

Here is a raw data file from ProtoDUNE-SP

root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root

Unpacking it yields many of these messages (which look harmless):

Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)

but it stops on an exception in the first event. Most of this is just art's cacheing of the exception until end job, but
the "No metadata has been stored" appears to be the useful line.

%MSG-s ArtException: PostEndJob 24-Sep-2019 13:45:15 CDT ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
---- EventProcessorFailure BEGIN
EventProcessor: an exception occurred during current event processing
---- ScheduleExecutionFailure BEGIN
Path: ProcessingStopped.
---- OtherArt BEGIN
---- InvalidRequest BEGIN
No metadata has been stored in this Fragment.cet::exception going through module
---- InvalidRequest END
---- OtherArt END
Exception going through path produce
---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- OtherArt END
%MSG

In the debugger, it is trying to decode the SSP container fragments with the ContainerPHOTON instance.

I put a breakpoint on that exception throw, and one time it was executing this part of our code. { ...

artdaq::ContainerFragment contf(cont);
for (size_t ii = 0; ii < contf.block_count(); ++ii)

though on a separate run, it stopped in a different method trying to access metadata for the FELIX container fragments
initializing metadata:

FelixFragmentBase(const artdaq::Fragment& fragment)
: meta_(*(fragment.metadata&lt;Metadata&gt;())),
}
extractEvent.fcl (143 Bytes) extractEvent.fcl FHiCL Configuration used to extract event Eric Flumerfelt, 09/25/2019 11:14 AM
np04_raw_run009701_0004_dl1.root.copied.extracted (71.8 MB) np04_raw_run009701_0004_dl1.root.copied.extracted Eric Flumerfelt, 09/25/2019 11:24 AM
valgrind.out (1.81 MB) valgrind.out Pengfei Ding, 09/26/2019 12:11 AM
valgrind2.out (1.79 MB) valgrind2.out Pengfei Ding, 09/26/2019 12:59 AM
Screen Shot 2019-09-26 at 1.12.33 PM.png (57.5 KB) Screen Shot 2019-09-26 at 1.12.33 PM.png Pengfei Ding, 09/26/2019 01:16 PM
Screen Shot 2019-09-26 at 1.11.54 PM.png (325 KB) Screen Shot 2019-09-26 at 1.11.54 PM.png Pengfei Ding, 09/26/2019 01:16 PM

Related issues

Related to artdaq - Support #23328: EventDump verbose mode unnecessary copiesClosed09/25/2019

Related to artdaq - Bug #23345: Incorrect reporting of data size from Fragments in Containers when upgradedClosed09/27/2019

Related to artdaq - Bug #23348: Memory leaks while reading artdaq Fragments from data filesClosed09/27/2019

Associated revisions

Revision 0e507ad2 (diff)
Added by Eric Flumerfelt 12 months ago

Do not attempt in-place upgrade of const Fragment. Add headerSizeBytes()
method to Fragment for use in getting the in-memory header size (in case
it differs from the current version of RawFragmentHeader). Update
ContainerFragment to index based on actual Fragment header sizes. Update
version() method to always return in-memory version of Fragment header.
Update Fragment_t tests.

Resolves #23319

Revision 8904fa9a (diff)
Added by Eric Flumerfelt 12 months ago

If the word count is changed, be sure to update the header. Part of Issue #23319

History

#1 Updated by Thomas Junk 12 months ago

A couple more runs seem to find the exception more often in the FELIX unpacker.

#2 Updated by Lynn Garren 12 months ago

Tom, what command would we need to use to replicate the problem?

#3 Updated by Thomas Junk 12 months ago

One would have to build dunetpc from the development head (we were trying to build a release and held back
due to this), and then run

lar -c RunRawDecoder.fcl root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root

David may be unwinding product_deps so that the head of develop isn't broken. I have a build on my desktop
that exhibits the problem.

#4 Updated by Lynn Garren 12 months ago

Once I have a grid proxy, I can replicate the problem on cluck. Here is the complete output:

[garren@cluck Testing]$ lar -c RunRawDecoder.fcl root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root         
%MSG-i MF_INIT_OK:  Early 24-Sep-2019 15:53:07 CDT JobSetup                                                
Messagelogger initialization complete.                                                                     
%MSG                                                                                                       
Info in <TGeoManager::Import>: Reading geometry from file: /scratch/garren/larsoft/dune/build_slf7.x86_64/dunetpc/gdml/protodune_v7.gdml                                                                              
Info in <TGeoManager::TGeoManager>: Geometry GDMLImport, Geometry imported from GDML created               
Info in <TGeoManager::SetTopVolume>: Top volume is volWorld. Master volume is volWorld                     
Info in <TGeoNavigator::BuildCache>: --- Maximum geometry depth set to 100                                 
Info in <TGeoManager::CheckGeometry>: Fixing runtime shapes...                                             
Info in <TGeoManager::CheckGeometry>: ...Nothing to fix                                                    
Info in <TGeoManager::CloseGeometry>: Counting nodes...                                                    
Info in <TGeoManager::Voxelize>: Voxelizing...                                                             
Info in <TGeoManager::CloseGeometry>: Building cache...                                                    
Info in <TGeoManager::CountLevels>: max level = 5, max placements = 1148                                   
Info in <TGeoManager::CloseGeometry>: 36176 nodes/ 5435 volume UID's in Geometry imported from GDML        
Info in <TGeoManager::CloseGeometry>: ----------------modeler ready----------------                        
%MSG-i GeometryCore:  Early 24-Sep-2019 15:53:12 CDT JobSetup                                              
New detector geometry loaded from                                                                          
        /scratch/garren/larsoft/dune/build_slf7.x86_64/dunetpc/gdml/protodune_v7.gdml                      
        /scratch/garren/larsoft/dune/build_slf7.x86_64/dunetpc/gdml/protodune_v7_nowires.gdml              

%MSG
%MSG-i GeometryCore:  Early 24-Sep-2019 15:53:12 CDT JobSetup
Sorting volumes...                                           
%MSG                                                         
%MSG-i DuneApaChannelMapAlg:  Early 24-Sep-2019 15:53:12 CDT JobSetup
Initializing channel map...                                          
%MSG                                                                 
Cryostat 0:                                                          
  15360 total channels                                               
  6 APAs                                                             
  For all identical APA:                                             
    Number of channels per APA = 2560                                
    U channels per APA = 800                                         
    V channels per APA = 800                                         
    Z channels per APA = 960                                         
    Pitch in U Plane = 0.4669                                        
    Pitch in V Plane = 0.4669                                        
    Pitch in Z Plane = 0.4792                                        
%MSG-i SimpleChannelStatusService:  Early 24-Sep-2019 15:53:12 CDT JobSetup
Loaded from configuration:                                                 
  - 0 bad channels                                                         
  - 0 noisy channels                                                       
  - largest channel ID: 15359, largest present: 15359                      
%MSG                                                                       
%MSG-i setupProvider<DetectorPropertiesStandard>:  Early  24-Sep-2019 15:53:12 CDT JobSetup
Asked to ignore 1 keys: 'InheritNumberTimeSamples'                                         
%MSG                                                                                       
PDSP Channel Map: Building RCE TPC wiremap from file protoDUNETPCChannelMap_RCE_v4.txt     
PDSP Channel Map: Building FELIX TPC wiremap from file protoDUNETPCChannelMap_RCE_v4.txt   
PDSP Channel Map: Building SSP channel map from file protoDUNESSPChannelMap_v1.txt         
%MSG-i FastCloning:  RootOutput:out1@Construction 24-Sep-2019 15:53:13 CDT  ModuleConstruction
Initial fast cloning configuration (from default): true                                       
%MSG                                                                                          
Parameters from the fcl file                                                                  
m1: 10                                                                                        
m2: 10                                                                                        
i1: 40                                                                                        
i2: 1200                                                                                      
SPESize: 25                                                                                   

24-Sep-2019 15:53:13 CDT  Initiating request to open input file "root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root"                                                                                                   
24-Sep-2019 15:53:15 CDT  Opened input file "root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root"            
%MSG-w Geometry:  BeginRun 24-Sep-2019 15:53:15 CDT run: 9701                                              
cannot find sumdata::RunData object to grab detector name                                                  
this is expected if generating MC files                                                                    
using default geometry from configuration file                                                             

%MSG
Begin processing the 1st record. run: 9701 subRun: 1 event: 2654 at 24-Sep-2019 15:53:15 CDT
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)                                                       
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
Upgrading RawFragmentHeaderV1 (const)
24-Sep-2019 15:53:22 CDT  Opened output file with pattern "%ifb_decode.root" 
24-Sep-2019 15:53:29 CDT  Closed input file "root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root" 
Malformed TimeTracker database.  The TimeEvent table is empty, but
the TimeModule table is not.  This can happen if an exception has
been thrown from a module while processing the first event.  Any
saved database file is suspect and should not be used.

TrigReport ---------- Event  Summary ------------
TrigReport Events total = 1 passed = 0 failed = 1

TrigReport ------ Modules in End-Path: end_path ------------
TrigReport  Trig Bit#        Run    Success      Error Name
TrigReport     0    0          0          0          0 out1

TimeReport ---------- Time  Summary ---[sec]----
TimeReport CPU = 10.614243 Real = 14.052576

MemReport  ---------- Memory  Summary ---[base-10 MB]----
MemReport  VmPeak = 1306.82 VmHWM = 644.043

%MSG-s ArtException:  PostEndJob 24-Sep-2019 15:53:31 CDT ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
  ---- EventProcessorFailure BEGIN
    EventProcessor: an exception occurred during current event processing
    ---- ScheduleExecutionFailure BEGIN
      Path: ProcessingStopped.
      ---- OtherArt BEGIN
        ---- InvalidRequest BEGIN
          No metadata has been stored in this Fragment.cet::exception going through module
        ---- InvalidRequest END
      ---- OtherArt END
      Exception going through path produce
    ---- ScheduleExecutionFailure END
  ---- EventProcessorFailure END
  ---- OtherArt BEGIN
    ---- FatalRootError BEGIN
      Fatal Root Error: @SUB=TTree::SetEntries
      Tree branches have different numbers of entries, eg raw::RDTimeStampraw::RawDigitvoidart::Assns_tpcrawdecoder_daq_RunRawDecoder. has 0 entries while EventAuxiliary has 50 entries.
    ---- FatalRootError END
  ---- OtherArt END
---- OtherArt END
%MSG
Art has completed and will exit with status 1.

#5 Updated by Thomas Junk 12 months ago

Yes, that's it!

#6 Updated by Lynn Garren 12 months ago

I have a test build of dunetpc, lbne_raw_data, and dune_raw_data with an untagged build of artdaq_core. This build is the same as artdaq_core v3_05_03, but with the changes in ups/product_deps needed to build for s91. This release (candidate) appears to be fine. I see no obvious complaints. However, Eric is also investigating. We will wait for the results of his tests before releasing a new artdaq_core. We have given ourselves a deadline of tomorrow afternoon.

#7 Updated by Eric Flumerfelt 12 months ago

If larsoft can process the entire file with v3_05_04, then I'm happy for now. In my own debugging, I have found an issue with how I upgrade Fragment headers in the Fragment::fragmentHeader() accessor. This code will have to be changed to avoid memory issues in the future, but this will be a more substantial change to artdaq_core than I'd like to do in a couple days' timescale.

My recommendation is to test with a few other data files from the same readout period and make sure that you don't see the following behavior:

Begin processing the 42nd record. run: 9701 subRun: 1 event: 3191 at 24-Sep-2019 17:24:27 CDT
***** Start of EventDump for event 3191 *****
TIMING fragments: 
  1) fragment ID 200 has type 5, timestamp 78462798422953032, metadata present true, and sizeBytes 80
ContainerCPUHITS fragments: 
  1) fragment ID 366 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 160 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 104
  2) fragment ID 368 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 112 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 56
  3) fragment ID 365 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 112 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 56
  4) fragment ID 356 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 5840 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 5784
  5) fragment ID 354 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 4976 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 4920
  6) fragment ID 362 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 5648 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 5592
  7) fragment ID 369 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 5536 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 5480
  8) fragment ID 355 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1584 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 1528
  9) fragment ID 358 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 112 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 56
  10) fragment ID 364 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 224 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 168
  11) fragment ID 350 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 4816 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 4760
  12) fragment ID 357 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 7584 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 7528
  13) fragment ID 367 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 7120 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 7064
  14) fragment ID 359 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 176 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 120
  15) fragment ID 353 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 112 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 56
  16) fragment ID 360 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 7424 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 7368
  17) fragment ID 363 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 160 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 104
  18) fragment ID 351 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 112 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 56
  19) fragment ID 352 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 5808 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 5752
  20) fragment ID 361 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 128 (contents: type = 11, count = 1, missing data = false)
    1) fragment type 11, timestamp 78462798422953032, metadata present true, and sizeBytes 72
ContainerFELIX fragments: 
  1) fragment ID 375 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1215064 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1215008
  2) fragment ID 386 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1228008 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1227952
  3) fragment ID 379 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1203624 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1203568
  4) fragment ID 385 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1233960 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1233904
  5) fragment ID 376 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1222832 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1222776
  6) fragment ID 389 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1248640 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1248584
  7) fragment ID 387 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1235896 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1235840
  8) fragment ID 349 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  9) fragment ID 378 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1213112 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1213056
  10) fragment ID 384 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1222752 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1222696
  11) fragment ID 374 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1226856 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1226800
  12) fragment ID 388 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1220400 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1220344
  13) fragment ID 377 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1233040 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1232984
  14) fragment ID 344 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  15) fragment ID 382 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1216728 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1216672
  16) fragment ID 373 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1207656 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1207600
  17) fragment ID 380 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1233784 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1233728
  18) fragment ID 370 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1227800 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1227744
  19) fragment ID 383 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1226296 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1226240
  20) fragment ID 372 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1249352 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1249296
  21) fragment ID 371 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1210288 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1210232
  22) fragment ID 381 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 1237568 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 1237512
  23) fragment ID 345 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  24) fragment ID 340 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  25) fragment ID 346 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  26) fragment ID 341 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  27) fragment ID 347 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  28) fragment ID 342 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  29) fragment ID 348 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
  30) fragment ID 343 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 2795232 (contents: type = 8, count = 1, missing data = false)
    1) fragment type 8, timestamp 78462798422953032, metadata present true, and sizeBytes 2795176
ContainerTPC fragments: 
  1) fragment ID 121 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 840920 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 840864
  2) fragment ID 128 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 797928 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 797872
  3) fragment ID 125 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 837632 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 837576
  4) fragment ID 127 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 780856 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 780800
  5) fragment ID 126 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 860728 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 860672
  6) fragment ID 112 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 840752 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 840696
  7) fragment ID 123 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 854648 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 854592
  8) fragment ID 108 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 836080 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 836024
  9) fragment ID 129 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 858272 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 858216
  10) fragment ID 138 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 875304 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 875248
  11) fragment ID 124 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 854768 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 854712
  12) fragment ID 114 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 847808 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 847752
  13) fragment ID 120 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 859184 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 859128
  14) fragment ID 119 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 775512 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 775456
  15) fragment ID 110 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 772528 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 772472
  16) fragment ID 122 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 862744 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 862688
  17) fragment ID 117 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 775816 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 775760
  18) fragment ID 137 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 767272 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 767216
  19) fragment ID 118 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 842200 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 842144
  20) fragment ID 132 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 822864 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 822808
  21) fragment ID 116 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 841736 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 841680
  22) fragment ID 109 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 823072 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 823016
  23) fragment ID 111 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 839000 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 838944
  24) fragment ID 133 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 829496 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 829440
  25) fragment ID 115 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 857712 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 857656
  26) fragment ID 134 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 830168 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 830112
  27) fragment ID 113 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 836392 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 836336
  28) fragment ID 136 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 834072 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 834016
  29) fragment ID 139 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 823160 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 823104
  30) fragment ID 135 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 3441032 (contents: type = 2, count = 1, missing data = false)
    1) fragment type 2, timestamp 78462798422953032, metadata present false, and sizeBytes 3440976
ContainerCTB fragments: 
  1) fragment ID 0 has type 232, timestamp 78462798422953032, metadata present true, and sizeBytes 272 (contents: type = 10, count = 5, missing data = false)
    1) fragment type 10, timestamp 78462798422619473, metadata present false, and sizeBytes 40
    2) fragment type 10, timestamp 78462798422744473, metadata present false, and sizeBytes 40
    3) fragment type 10, timestamp 78462798422869473, metadata present false, and sizeBytes 40
*** Error in `art': malloc(): memory corruption: 0x000000000b5b4c20 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82ec6)[0x7f208a4aeec6]
/lib64/libc.so.6(+0x833e6)[0x7f208a4af3e6]
/lib64/libc.so.6(__libc_memalign+0x75)[0x7f208a4b2405]
/lib64/libc.so.6(posix_memalign+0x4c)[0x7f208a4b408c]
/home/eflumerf/Desktop/artdaq-mrb-base/e19_s82_prof/build_slf7.x86_64/artdaq/lib/libartdaq_ArtModules_EventDump_module.so(_ZNK6artdaq8Fragment14fragmentHeaderEv+0x489)[0x7f2086823049]
/home/eflumerf/Desktop/artdaq-mrb-base/e19_s82_prof/build_slf7.x86_64/artdaq/lib/libartdaq_ArtModules_EventDump_module.so(_ZN6artdaq9EventDump7analyzeERKN3art5EventE+0x63a)[0x7f2086820c9a]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Core.so(_ZN3art10EDAnalyzer16analyzeWithFrameERKNS_5EventERKNS_15ProcessingFrameE+0x38)[0x7f208da45808]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Core.so(_ZN3art6detail8Analyzer7doEventERNS_14EventPrincipalERKNS_13ModuleContextERSt6atomicImES9_S9_+0x9c)[0x7f208dac2a7c]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Principal.so(_ZN3art6Worker12doWork_eventERNS_14EventPrincipalERKNS_13ModuleContextE+0x56)[0x7f208d146fe6]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Core.so(_ZN3art12WorkerInPath27runWorker_event_for_endpathERNS_14EventPrincipalE+0x444)[0x7f208dabe8c4]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Core.so(_ZN3art4Path25process_event_for_endpathERNS_14EventPrincipalE+0x713)[0x7f208da8e8b3]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Core.so(_ZN3art15EndPathExecutor13process_eventERNS_14EventPrincipalE+0x539)[0x7f208da494e9]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor17endPathRunnerTaskENS_10ScheduleIDEPN3tbb4taskE+0x29f)[0x7f208df57a0f]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3hep11concurrency10QueuedTaskIN3art20EndPathRunnerFunctorEE7executeEv+0x20)[0x7f208df7f0f0]
/home/eflumerf/products/tbb/v2019_3/Linux64bit+3.10-2.17-e19-prof/lib/libtbb.so.2(+0x26c85)[0x7f208c580c85]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE4EEEvv+0x199)[0x7f208df54209]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE3EEEvv+0x7d)[0x7f208df7f21d]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE2EEEvv+0x7d)[0x7f208df7f3dd]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE1EEEvv+0x7d)[0x7f208df7f6cd]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE0EEEvv+0x85)[0x7f208df7f865]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor15runToCompletionEv+0x19)[0x7f208df5d669]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Art.so(_ZN3art15run_art_common_ERKN5fhicl12ParameterSetE+0xccc)[0x7f208e6f657c]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Art.so(_ZN3art7run_artEiPPcRN5boost15program_options19options_descriptionERN3cet14filepath_makerEOSt6vectorISt10unique_ptrINS_14OptionsHandlerESt14default_deleteISB_EESaISE_EE+0xc62)[0x7f208e6f86b2]
/home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/lib/libart_Framework_Art.so(_Z6artappiPPc+0x360)[0x7f208e6f2d70]
art(main+0xc)[0x40127c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f208a44e545]
art[0x40133c]
======= Memory map: ========
00400000-00402000 r-xp 00000000 fd:02 4308695127                         /home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/bin/art
00601000-00602000 r--p 00001000 fd:02 4308695127                         /home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/bin/art
00602000-00603000 rw-p 00002000 fd:02 4308695127                         /home/eflumerf/products/art/v3_02_04/slf7.x86_64.e19.prof/bin/art
010cb000-10a30000 rw-p 00000000 00:00 0                                  [heap]
7f1f731c1000-7f1f731db000 rw-p 00000000 00:00 0 
7f1f731db000-7f1ff2ecb000 rw-s 00010000 fd:00 136693898                  /tmp/trace_buffer_eflumerf
7f1ff2ecb000-7f1ff2edb000 r--s 00000000 fd:00 136693898                  /tmp/trace_buffer_eflumerf
7f1ff2edb000-7f1ff6334000 rw-p 00000000 00:00 0 
7f1ff6334000-7f2076024000 rw-s 00010000 fd:00 136693898                  /tmp/trace_buffer_eflumerf
7f2076024000-7f2076034000 r--s 00000000 fd:00 136693898                  /tmp/trace_buffer_eflumerf
7f2076034000-7f207609b000 r-xp 00000000 fd:02 3233982464                 /home/eflumerf/products/canvas_root_io/v1_03_03/slf7.x86_64.e19.prof/lib/libcanvas_root_io_Dictionaries_std_dict.so
7f207609b000-7f207629b000 ---p 00067000 fd:02 3233982464                 /home/eflumerf/products/canvas_root_io/v1_03_03/slf7.x86_64.e19.prof/lib/libcanvas_root_io_Dictionaries_std_dict.so
7f207629b000-7f207629e000 r--p 00067000 fd:02 3233982464                 /home/eflumerf/products/canvas_root_io/v1_03_03/slf7.x86_64.e19.prof/lib/libcanvas_root_io_Dictionaries_std_dict.so    4) fragment type Aborted (core dumped)

#8 Updated by Thomas Junk 12 months ago

After some effort fussing with makefiles, I got an untagged artdaq_core built out of the head of develop and built corresponding dune_raw_data, lbne_raw_data, and dunetpc products. I ran RunRawDecoder.fcl on the suggested file, and the memory usage slowly crept up to 3.9 GB. It crashed on the same event with a similar message to what you saw.

Begin processing the 42nd record. run: 9701 subRun: 1 event: 3191 at 25-Sep-2019 00:45:56 CDT
%MSG-w MissingData:  CRTRawDecoder:crtrawdecoder@BeginModule  25-Sep-2019 00:46:02 CDT run: 9701 subRun: 1 event: 3191
No artdaq::Fragments produced by InputTag:  label = daq, instance = ContainerCRT in this event, so not doing anything.

 %MSG
*** Error in `lar': malloc(): memory corruption: 0x000000001897ca20 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82e16)[0x7ff4487b2e16]
/lib64/libc.so.6(+0x83336)[0x7ff4487b3336]
/lib64/libc.so.6(__libc_memalign+0x75)[0x7ff4487b5c15]
/lib64/libc.so.6(posix_memalign+0x4c)[0x7ff4487b789c]
/dune/app/users/trj/splitter10/localProducts_larsoft_v08_31_00_e17_prof/dunetpc/v08_31_00/slf7.x86_64.e17.prof/lib/libCRTRawDecoder_module.so(_ZNK6artdaq8Fragment14fragmentHeaderEv+0x385)[0x7ff434d6bb95]
/dune/app/users/trj/splitter10/localProducts_larsoft_v08_31_00_e17_prof/dune_raw_data/v1_17_36/slf7.x86_64.e17.nu.s91.prof/lib/libdune-raw-data_Overlays.so(_ZN4dune11CTBFragmentC2ERKN6artdaq8FragmentE+0x1b)[0x7ff4349fb28b]
/dune/app/users/trj/splitter10/localProducts_larsoft_v08_31_00_e17_prof/dunetpc/v08_31_00/slf7.x86_64.e17.prof/lib/libPDSPCTBRawDecoder_module.so(_ZN17PDSPCTBRawDecoder16_process_CTB_AUXERKN6artdaq8FragmentE+0x20)[0x7ff433ea4780]
/dune/app/users/trj/splitter10/localProducts_larsoft_v08_31_00_e17_prof/dunetpc/v08_31_00/slf7.x86_64.e17.prof/lib/libPDSPCTBRawDecoder_module.so(_ZN17PDSPCTBRawDecoder7produceERN3art5EventE+0x7b4)[0x7ff433ea5bb4]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Core.so(_ZN3art10EDProducer16produceWithFrameERNS_5EventERKNS_15ProcessingFrameE+0x38)[0x7ff44bf5e9a8]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Core.so(_ZN3art6detail8Producer7doEventERNS_14EventPrincipalERKNS_13ModuleContextERSt6atomicImES9_S9_+0x61)[0x7ff44bff5581]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Principal.so(_ZN3art6Worker9runWorkerERNS_14EventPrincipalERKNS_13ModuleContextE+0x28e)[0x7ff44baab4de]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Principal.so(+0x3e415)[0x7ff44baad415]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Principal.so(+0x3e5fc)[0x7ff44baad5fc]
/cvmfs/larsoft.opensciencegrid.org/products/tbb/v2019_3/Linux64bit+3.10-2.17-e17-prof/lib/libtbb.so.2(+0x270e6)[0x7ff44a8840e6]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE4EEEvv+0x175)[0x7ff44c262775]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE3EEEvv+0x7d)[0x7ff44c29181d]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE2EEEvv+0x7d)[0x7ff44c29198d]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor7processILNS_5LevelE1EEEvv+0x7d)[0x7ff44c291c5d]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_EventProcessor.so(_ZN3art14EventProcessor15runToCompletionEv+0xc5)[0x7ff44c26f6d5]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Art.so(_ZN3art15run_art_common_ERKN5fhicl12ParameterSetERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_6detail16ModuleKeyAndTypeESt4lessISA_ESaISt4pairIKSA_SC_EEE+0xd45)[0x7ff44ca0d1c5]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Art.so(_ZN3art7run_artEiPPcRN5boost15program_options19options_descriptionERN3cet14filepath_makerEOSt6vectorISt10unique_ptrINS_14OptionsHandlerESt14default_deleteISB_EESaISE_EE+0xe65)[0x7ff44ca0f925]
/cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/lib/libart_Framework_Art.so(_Z6artappiPPc+0x35e)[0x7ff44ca08cde]
lar(main+0xc)[0x40123c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff448752495]
lar[0x4012fc]
======= Memory map: ========
00400000-00402000 r-xp 00000000 00:38 24237173                           /cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/bin/lar
00601000-00602000 r--p 00001000 00:38 24237173                           /cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/bin/lar
00602000-00603000 rw-p 00002000 00:38 24237173                           /cvmfs/larsoft.opensciencegrid.org/products/art/v3_02_06/slf7.x86_64.e17.prof/bin/lar
00ab0000-6d1c3000 rw-p 00000000 00:00 0                                  [heap]
7ff376a0a000-7ff3f07ff000 rw-p 00000000 00:00 0 
7ff3f0824000-7ff3f6c77000 rw-p 00000000 00:00 0 
7ff3f995d000-7ff414c7b000 rw-p 00000000 00:00 0 
7ff414c7b000-7ff414cba000 r-xp 00000000 00:38 24987440                   /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libSmatrix.so
7ff414cba000-7ff414eba000 ---p 0003f000 00:38 24987440                   /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libSmatrix.so
7ff414eba000-7ff414ebb000 r--p 0003f000 00:38 24987440                   /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libSmatrix.so
7ff414ebb000-7ff414ebc000 rw-p 00040000 00:38 24987440                   /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libSmatrix.so
7ff414ebc000-7ff414ec7000 rw-p 00000000 00:00 0 
7ff414ec7000-7ff4150c2000 r-xp 00000000 00:38 25093968                   /cvmfs/larsoft.opensciencegrid.org/products/lardataobj/v08_05_00/slf7.x86_64.e17.prof/lib/liblardataobj_RecoBase_TrackingDicts_dict.so
7ff4150c2000-7ff4152c2000 ---p 001fb000 00:38 25093968                   /cvmfs/larsoft.opensciencegrid.org/products/lardataobj/v08_05_00/slf7.x86_64.e17.prof/lib/liblardataobj_RecoBase_TrackingDicts_dict.so
7ff4152c2000-7ff4152cb000 r--p 001fb000 00:38 25093968                   /cvmfs/larsoft.opensciencegrid.org/products/lardataobj/v08_05_00/slf7.x86_64.e17.prof/lib/Aborted

#9 Updated by Thomas Junk 12 months ago

I ran the same file readin job lar -c RunRawDecoder.fcl root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root

but with

dunetpc v08_28_01

which sets up

artdaq_core v3_04_20

and it completed successfully. It also didn't have any obvious memory leaks, with 1.1 GB resident memory for the whole run.

#10 Updated by Eric Flumerfelt 12 months ago

  • Priority changed from Normal to Urgent
  • Assignee set to Eric Flumerfelt
  • Status changed from New to Resolved
  • Category set to Known Issues

I have committed a fix to artdaq-core:bugfix/23319_Fragment_fragmentHeaderReturnType. I will try and get this changeset reviewed and approved today.

#11 Updated by Eric Flumerfelt 12 months ago

  • % Done changed from 0 to 100

#12 Updated by Eric Flumerfelt 12 months ago

  • Related to Support #23328: EventDump verbose mode unnecessary copies added

#13 Updated by Eric Flumerfelt 12 months ago

For testing, I have extracted the problematic event from the data file:

art -c extractEvent.fcl --nskip 41 -n 1 -o np04_raw_run009701_0004_dl1.root.copied.extracted np04_raw_run009701_0004_dl1.root.copied

Running art -c verboseRawEventDump.fcl np04_raw_run009701_0004_dl1.root.copied.extracted fails on artdaq-core's develop branch and succeeds on the bugfix branch. (Setup artdaq to get verboseRawEventDump.fcl in FHICL_FILE_PATH)

#14 Updated by Thomas Junk 12 months ago

I built artdaq_core from the branch artdaq-core:bugfix/23319_Fragment_fragmentHeaderReturnType and now I see different
problems reading ProtoDUNE-SP raw data. The memory leak appears to still be there -- an interactive process starts at about 1 GB
and inflates in size to 5 GB running over the 50 events in the file. I am also getting empty data vectors after our DAQ group's
unpacking software runs.

I ran 10 events with valgrind --leak-check=yes, and here's one of the suspicious blocks

==1274== Mismatched free() / delete / delete []
==1274==    at 0x4C2AF2D: free (vg_replace_malloc.c:530)
==1274==    by 0x395B80CC: ~QuickVec (QuickVec.hh:368)
==1274==    by 0x395B80CC: ~Fragment (Fragment.hh:85)
==1274==    by 0x395B80CC: _Destroy<artdaq::Fragment> (stl_construct.h:98)
==1274==    by 0x395B80CC: __destroy<artdaq::Fragment*> (stl_construct.h:108)
==1274==    by 0x395B80CC: _Destroy<artdaq::Fragment*> (stl_construct.h:137)
==1274==    by 0x395B80CC: _Destroy<artdaq::Fragment*, artdaq::Fragment> (stl_construct.h:206)
==1274==    by 0x395B80CC: ~vector (stl_vector.h:434)
==1274==    by 0x395B80CC: ~Wrapper (Wrapper.h:86)
==1274==    by 0x395B80CC: art::Wrapper<std::vector<artdaq::Fragment, std::allocator<artdaq::Fragment> > >::~Wrapper() (Wrapper.h:86)
==1274==    by 0x5E1B67E: art::Group::removeCachedProduct() (Group.cc:196)
==1274==    by 0x5E1D910: art::Principal::removeCachedProduct(art::ProductID) const (Principal.cc:485)
==1274==    by 0x2236AA0A: bool art::DataViewImpl::removeCachedProduct<std::vector<artdaq::Fragment, std::allocator<artdaq::Fragment> > >(art::Handle<std::vector<artdaq::Fragment, std::allocator<artdaq::Fragment> > >&) const (DataViewImpl.h:938)
==1274==    by 0x2235A998: PDSPTPCRawDecoder::_rceProcContNCFrags(art::Handle<std::vector<artdaq::Fragment, std::allocator<artdaq::Fragment> > >, unsigned long&, bool, art::Event&, std::vector<raw::RawDigit, std::allocator<raw::RawDigit> >&, std::vector<raw::RDTimeStamp, std::allocator<raw::RDTimeStamp> >&, art::Assns<raw::RawDigit, raw::RDTimeStamp, void>&, art::PtrMaker<raw::RawDigit>&, art::PtrMaker<raw::RDTimeStamp>&) (PDSPTPCRawDecoder_module.cc:498)
==1274==    by 0x2235B0A8: PDSPTPCRawDecoder::_processRCE(art::Event&, std::vector<raw::RawDigit, std::allocator<raw::RawDigit> >&, std::vector<raw::RDTimeStamp, std::allocator<raw::RDTimeStamp> >&, art::Assns<raw::RawDigit, raw::RDTimeStamp, void>&, art::PtrMaker<raw::RawDigit>&, art::PtrMaker<raw::RDTimeStamp>&) (PDSPTPCRawDecoder_module.cc:388)
==1274==    by 0x2235B7CC: PDSPTPCRawDecoder::produce(art::Event&) (PDSPTPCRawDecoder_module.cc:323)
==1274==    by 0x58DF9A7: art::EDProducer::produceWithFrame(art::Event&, art::ProcessingFrame const&) (EDProducer.cc:88)
==1274==    by 0x5976580: art::detail::Producer::doEvent(art::EventPrincipal&, art::ModuleContext const&, std::atomic<unsigned long>&, std::atomic<unsigned long>&, std::atomic<unsigned long>&) (Producer.cc:121)
==1274==    by 0x5E2B4DD: art::Worker::runWorker(art::EventPrincipal&, art::ModuleContext const&) (Worker.cc:591)
==1274==    by 0x5E2D414: operator() (Worker.cc:569)
==1274==    by 0x5E2D414: void hep::concurrency::SerialTaskQueueChain::runFunc<art::(anonymous namespace)::RunWorkerFunctor>(art::(anonymous namespace)::RunWorkerFunctor const&) (SerialTaskQueueChain.h:83)
==1274==  Address 0x41ca8a00 is 0 bytes inside a block of size 797,320 alloc'd
==1274==    at 0x4C2AAF8: operator new[](unsigned long) (vg_replace_malloc.c:423)
==1274==    by 0x1321AEAC: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libRIO.so)
==1274==    by 0x130FAADA: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libRIO.so)
==1274==    by 0x13078E57: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libRIO.so)
==1274==    by 0x148BD23C: TBranchElement::ReadLeavesCollectionMember(TBuffer&) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libTree.so)
==1274==    by 0x148B605C: TBranch::GetEntry(long long, int) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libTree.so)
==1274==    by 0x148C008C: TBranchElement::GetEntry(long long, int) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libTree.so)
==1274==    by 0x148BFEE8: TBranchElement::GetEntry(long long, int) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libTree.so)
==1274==    by 0x148BFEE8: TBranchElement::GetEntry(long long, int) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04/Linux64bit+3.10-2.17-e17-prof/lib/libTree.so)
==1274==    by 0x141731CA: art::input::getEntry(TBranch*, long long) (getEntry.cc:24)
==1274==    by 0x19CC4D60: operator()<long long int> (RootDelayedReader.cc:162)
==1274==    by 0x19CC4D60: art::RootDelayedReader::getProduct_(art::Group const*, art::ProductID, art::RangeSet&) const (RootDelayedReader.cc:170)
==1274==    by 0x69BF221: art::DelayedReader::getProduct(art::Group const*, art::ProductID, art::RangeSet&) const (DelayedReader.cc:19)
==1274== 

#15 Updated by Pengfei Ding 12 months ago

I tested dunetpc develop on the raw file np04_raw_run009701_0004_dl1.root.

-bash-4.1$ git describe --tags #v08_31_00-10-gb947dc829

It sets up artdaq_core v3_04_20. The raw decoder job completed successfully with resident memory usage not going above 1.18GB.

I am now moving to artdaq_core v3_05_02 and then the bugfix branch.

#16 Updated by Pengfei Ding 12 months ago

Tested with artdaq_demo v3_06_01 (which sets up artdaq_core v3_05_03). Confirmed the error saw by Eric.

Checked out artdaq_core develop and rebuilt the demo, seeing the same memory error.

Checked the feature branch with the bugfix. I am no longer seeing the error. Going to test on a few other raw files from the same run.

    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 844384
  24) fragment ID 111 has type 232, timestamp 78462798793767839, and sizeBytes 842768 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 842712
  25) fragment ID 109 has type 232, timestamp 78462798793767839, and sizeBytes 843528 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 843472
  26) fragment ID 136 has type 232, timestamp 78462798793767839, and sizeBytes 857400 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 857344
  27) fragment ID 135 has type 232, timestamp 78462798793767839, and sizeBytes 3441032 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 3440976
  28) fragment ID 138 has type 232, timestamp 78462798793767839, and sizeBytes 897992 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 897936
  29) fragment ID 112 has type 232, timestamp 78462798793767839, and sizeBytes 780024 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 779968
  30) fragment ID 122 has type 232, timestamp 78462798793767839, and sizeBytes 800632 (contents: type = 2, count = 1, missing data = 0)
    1) fragment type 2, timestamp 78462798793767839, and sizeBytes 800576
TIMING fragments:
  1) fragment ID 200 has type 5, timestamp 78462798793767839, and sizeBytes 80
***** End of EventDump for event 3303 *****
PRINCIPAL TYPE: Event
PROCESS NAME | MODULE_LABEL.. | PRODUCT INSTANCE NAME | DATA PRODUCT TYPE............ | PRODUCT FRIENDLY TYPE | SIZE
DAQ......... | daq........... | Container............ | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ...0
DAQ......... | daq........... | TIMING............... | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ...1
DAQ......... | daq........... | ContainerCPUHITS..... | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ..20
DAQ......... | daq........... | ContainerFELIX....... | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ..30
DAQ......... | daq........... | ContainerTPC......... | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ..30
DAQ......... | TriggerResults | ..................... | art::TriggerResults.......... | art::TriggerResults.. | ...-
DAQ......... | daq........... | ContainerCTB......... | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ...1
DAQ......... | daq........... | ContainerPHOTON...... | std::vector<artdaq::Fragment> | artdaq::Fragments.... | ..23

Total products (present, not present): 38 (8, 30).

PRINCIPAL TYPE: Run
PROCESS NAME | MODULE_LABEL | PRODUCT INSTANCE NAME | DATA PRODUCT TYPE.................... | PRODUCT FRIENDLY TYPE.... | SIZE
DAQ......... | BuildInfo... | DuneArtdaq........... | std::vector<artdaq::PackageBuildInfo> | artdaq::PackageBuildInfos | ...4

Total products (present, not present): 1 (1, 0).

25-Sep-2019 14:29:12 CDT  Closed input file "np04_raw_run009701_0004_dl1.root" 

TrigReport ---------- Event  Summary ------------
TrigReport Events total = 50 passed = 50 failed = 0

TrigReport ------ Modules in End-Path: end_path ------------
TrigReport  Trig Bit#        Run    Success      Error Name
TrigReport     0    0         50         50          0 evtDump
TrigReport     0    0         50         50          0 printBuildInfo
TrigReport     0    0         50         50          0 out1

TimeReport ---------- Time  Summary ---[sec]----
TimeReport CPU = 11.260950 Real = 11.271888

MemReport  ---------- Memory  Summary ---[base-10 MB]----
MemReport  VmPeak = 783.954 VmHWM = 424.706

Art has completed and will exit with status 0

#17 Updated by Pengfei Ding 12 months ago

Tested on another two raw files in the same run. Confirmed that the memory error no longer presents.

np04_raw_run009701_0003_dl12.root
np04_raw_run009701_0005_dl1.root

#18 Updated by Thomas Junk 12 months ago

Regarding the RCE unpacker failing with the bugfixed version, I spotted something in dune-raw-data/Overlays/RceFragment.cc

dune::RceFragment::RceFragment(artdaq::Fragment const& afrag) :
_data_ptr((uint64_t const*)(afrag.dataBeginBytes() + 12))

{
_init();
}

That hardcoded 12 in there was added by user np04 daq two years ago. It might have changed now that the headers are rewritten.

#19 Updated by Pengfei Ding 12 months ago

Got segfault when running the raw decoder. I am using:

dunetpc develop HEAD
artdaq_core develop HEAD (includes the bugfix)

-bash-4.2$ ups active|grep dunetpc
artdaq_core       v3_05_04        -f Linux64bit+3.10-2.17 -q e17:prof:s91    -z /home/dingpf/dunetpc_dev/localProducts_larsoft_v08_31_00_e17_prof
dunetpc           v08_31_00       -f Linux64bit+3.10-2.17 -q e17:prof        -z /home/dingpf/dunetpc_dev/localProducts_larsoft_v08_31_00_e17_prof
-bash-4.2$ pwd
/home/dingpf/dunetpc_dev
-bash-4.2$ cd srcs/dunetpc/
-bash-4.2$ git describe --tags
v08_31_00-10-gb947dc82
-bash-4.2$ cd ../artdaq_core/
-bash-4.2$ git describe --tags
v3_05_04
-bash-4.2$ git branch
* develop
  master

Trying to run the raw decoder with

valgrind --log-file=valgrind2.out --suppressions=${ROOTSYS}/etc/valgrind-root.supp lar -c  RunRawDecoder.fcl  np04_raw_run009701_0004_dl1.root

I got the segfault once starting processing the first event. Valgrind showed the backtrace as the following:

==244591==    by 0x70390E5: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:521)
==244591==  Address 0x116c17f7153d9e5 is not stack'd, malloc'd or (recently) free'd
==244591==
==244591==
==244591== Process terminating with default action of signal 11 (SIGSEGV)
==244591==  General Protection Fault
==244591==    at 0x20914A3B: upgrade (RawFragmentHeaderV1.hh:230)
==244591==    by 0x20914A3B: artdaq::Fragment::fragmentHeader() const (Fragment.hh:1255)
==244591==    by 0x20B69800: dataBegin (Fragment.hh:1105)
==244591==    by 0x20B69800: dataBeginBytes (Fragment.hh:599)
==244591==    by 0x20B69800: dune::RceFragment::RceFragment(artdaq::Fragment const&) (RceFragment.cc:18)
==244591==    by 0x2201F81C: PDSPTPCRawDecoder::_process_RCE_nticksvf(artdaq::Fragment const&, std::vector<unsigned long, std::allocator<unsigned long> >&) (PDSPTPCRawDecoder_module.cc:583)
==244591==    by 0x22020393: PDSPTPCRawDecoder::_rceProcContNCFrags(art::Handle<std::vector<artdaq::Fragment, std::allocator<artdaq::Fragment> > >, unsigned long&, bool, art::Event&, std::vector<raw::RawDigit, std::allocator<raw::RawDigit> >&, std::vector<raw::RDTimeStamp, std::allocator<raw::RDTimeStamp> >&, art::Assns<raw::RawDigit, raw::RDTimeStamp, void>&, art::PtrMaker<raw::RawDigit>&, art::PtrMaker<raw::RDTimeStamp>&) (PDSPTPCRawDecoder_module.cc:484)
==244591==    by 0x22021008: PDSPTPCRawDecoder::_processRCE(art::Event&, std::vector<raw::RawDigit, std::allocator<raw::RawDigit> >&, std::vector<raw::RDTimeStamp, std::allocator<raw::RDTimeStamp> >&, art::Assns<raw::RawDigit, raw::RDTimeStamp, void>&, art::PtrMaker<raw::RawDigit>&, art::PtrMaker<raw::RDTimeStamp>&) (PDSPTPCRawDecoder_module.cc:388)
==244591==    by 0x2202172C: PDSPTPCRawDecoder::produce(art::Event&) (PDSPTPCRawDecoder_module.cc:323)
==244591==    by 0x58DF9A7: art::EDProducer::produceWithFrame(art::Event&, art::ProcessingFrame const&) (EDProducer.cc:88)
==244591==    by 0x5976580: art::detail::Producer::doEvent(art::EventPrincipal&, art::ModuleContext const&, std::atomic<unsigned long>&, std::atomic<unsigned long>&, std::atomic<unsigned long>&) (Producer.cc:121)
==244591==    by 0x5E2B4DD: art::Worker::runWorker(art::EventPrincipal&, art::ModuleContext const&) (Worker.cc:591)
==244591==    by 0x5E2D414: operator() (Worker.cc:569)
==244591==    by 0x5E2D414: void hep::concurrency::SerialTaskQueueChain::runFunc<art::(anonymous namespace)::RunWorkerFunctor>(art::(anonymous namespace)::RunWorkerFunctor const&) (SerialTaskQueueChain.h:83)
==244591==    by 0x5E2D5FB: operator() (SerialTaskQueueChain.h:69)
==244591==    by 0x5E2D5FB: hep::concurrency::QueuedTask<void hep::concurrency::SerialTaskQueueChain::passDown<art::(anonymous namespace)::RunWorkerFunctor>(unsigned int, art::(anonymous namespace)::RunWorkerFunctor const&)::{lambda()#1}>::execute() (SerialTaskQueue.h:87)
==244591==    by 0x70390E5: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:521)
==244591==

#20 Updated by Thomas Junk 12 months ago

I checked the definition of dataBeginBytes() in Fragment.hh and indeed it has the header and metadata offsets in it already.
I don't know what the 12 is.

#21 Updated by Pengfei Ding 12 months ago

I built lbne_raw_data and dune_raw_data with the new artdaq_core v3_05_04 and tested the raw decoder job. It now completes successfully.

My working directory is on mu2edaq01.fnal.gov at /home/dingpf/dunetpc_dev

I had to make a few changes in lbne_raw_data's CMakeList.txt and product_deps to make things built.

-bash-4.2$ cd lbne_raw_data/
-bash-4.2$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index c0daa54..32153e9 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -47,9 +47,11 @@ cet_report_compiler_flags()

 # these are minimum required versions, not the actual product versions
 find_ups_product( cetbuildtools v3_03_00 )
-find_ups_product( artdaq_core v1_00_00 )
+find_ups_product( artdaq_core v3_05_03 )
 find_ups_product( art v1_06_00 )
+find_ups_root()
 find_ups_product( cetlib v1_03_00 )
+find_ups_product( TRACE v3_14_02 )
 find_ups_boost( v1_50_0 filesystem
                         program_options
                         unit_test_framework
@@ -69,7 +71,7 @@ include(CetTest)
 add_subdirectory(lbne-raw-data)

 # testing
-add_subdirectory(test)
+#add_subdirectory(test)

 # # tools
 # add_subdirectory(tools)
diff --git a/ups/product_deps b/ups/product_deps
index 255fdb0..27ab3db 100644
--- a/ups/product_deps
+++ b/ups/product_deps
@@ -23,7 +23,7 @@ table_fragment_end

 #
 product                version         optional
-artdaq_core    v3_05_02        nu
+artdaq_core    v3_05_04        nu
 art_root_io    v1_00_09        nu
 cetbuildtools  v7_11_00        nu      only_for_build
 end_product_list
-bash-4.2$ git describe --tags
v1_04_37
-bash-4.2$ cd ../dune_raw_data/; git describe --tags
v1_17_36
-bash-4.2$ git diff
diff --git a/ups/product_deps b/ups/product_deps
index a8dd95e..15df7f9 100644
--- a/ups/product_deps
+++ b/ups/product_deps
@@ -31,7 +31,7 @@ artdaq_core   v3_01_04        online_monitoring
 art            v2_07_03        online_monitoring
 TRACE          v3_13_04        online_monitoring
 cetbuildtools  v5_08_01        online_monitoring       only_for_build
-artdaq_core    v3_05_02        nu
+artdaq_core    v3_05_04        nu
 art            v3_02_06        nu
 cetbuildtools  v7_11_00        nu      only_for_build
 dunepdsprce     v1_1_0

#22 Updated by Pengfei Ding 12 months ago

Tom has just deployed lbne_raw_data v1_04_38 and dune_raw_data v1_17_37 to cvmfs.

I tested the dunetpc develop HEAD with these new packages together with artdaq_core v3_05_04 and confirmed that the raw decoder job succeeds.

#23 Updated by Thomas Junk 12 months ago

Thanks much!

#24 Updated by Thomas Junk 12 months ago

By "succeeds" are there messages saying "no valid nticks to check, discarding data"

#25 Updated by Pengfei Ding 12 months ago

Yeah, there are. "Succeeds" meant "art exit with 0" to me.

%MSG
Begin processing the 49th record. run: 9701 subRun: 1 event: 3291 at 25-Sep-2019 23:33:20 CDT
%MSG-w _process_RCE::  PDSPTPCRawDecoder:tpcrawdecoder@BeginModule  25-Sep-2019 23:33:20 CDT run: 9701 subRun: 1 event: 3291 PDSPTPCRawDecoder_module.cc:495
 No valid nticks to check.  Discarding Event.
%MSG
%MSG-w MissingData:  CRTRawDecoder:crtrawdecoder@BeginModule  25-Sep-2019 23:33:22 CDT run: 9701 subRun: 1 event: 3291
No artdaq::Fragments produced by InputTag:  label = daq, instance = ContainerCRT in this event, so not doing anything.

%MSG
Begin processing the 50th record. run: 9701 subRun: 1 event: 3303 at 25-Sep-2019 23:33:22 CDT
%MSG-w _process_RCE::  PDSPTPCRawDecoder:tpcrawdecoder@BeginModule  25-Sep-2019 23:33:22 CDT run: 9701 subRun: 1 event: 3303 PDSPTPCRawDecoder_module.cc:495
 No valid nticks to check.  Discarding Event.
%MSG
%MSG-w MissingData:  CRTRawDecoder:crtrawdecoder@BeginModule  25-Sep-2019 23:33:24 CDT run: 9701 subRun: 1 event: 3303
No artdaq::Fragments produced by InputTag:  label = daq, instance = ContainerCRT in this event, so not doing anything.

%MSG
25-Sep-2019 23:33:24 CDT  Closed output file "np04_raw_run009701_0004_dl1_decode.root" 
25-Sep-2019 23:33:24 CDT  Closed input file "../dunetpc_dev/np04_raw_run009701_0004_dl1.root" 

================================================================================================================================
TimeTracker printout (sec)                        Min           Avg           Max         Median          RMS         nEvts
================================================================================================================================
Full event                                      2.01078       2.06494       2.34265       2.04768      0.0588257       50
--------------------------------------------------------------------------------------------------------------------------------
source:RootInput(read)                        0.000202785   0.000922186    0.033572     0.000232691    0.0046648       50
produce:timingrawdecoder:TimingRawDecoder     0.000103779   0.000164541   0.00178855    0.000113652   0.000240787      50
produce:ssprawdecoder:SSPRawDecoder            0.0212439     0.0303392     0.0402175     0.0298901    0.00348955       50
produce:tpcrawdecoder:PDSPTPCRawDecoder         1.93098       1.97091       2.1284        1.95585      0.0434832       50
produce:crtrawdecoder:CRTRawDecoder           0.000118519   0.000149353   0.000482879   0.000137552   5.09082e-05      50
produce:ctbrawdecoder:PDSPCTBRawDecoder       0.000145811   0.000219731    0.0011621    0.00016682    0.000186377      50
produce:beamevent:BeamEvent                   4.8839e-05    7.49702e-05   0.000848576   5.33945e-05   0.000113245      50
[art]:TriggerResults:TriggerResultInserter    1.4424e-05    1.77472e-05   6.9798e-05    1.6082e-05    7.96976e-06      50
end_path:out1:RootOutput                       3.88e-06     5.57404e-06   1.3557e-05     4.795e-06    2.35868e-06      50
end_path:out1:RootOutput(write)                0.0423392     0.0618354     0.238629      0.0568301     0.0265925       50
================================================================================================================================

TrigReport ---------- Event  Summary ------------
TrigReport Events total = 50 passed = 50 failed = 0

TrigReport ------ Modules in End-Path: end_path ------------
TrigReport  Trig Bit#        Run    Success      Error Name
TrigReport     0    0         50         50          0 out1

TimeReport ---------- Time  Summary ---[sec]----
TimeReport CPU = 104.598304 Real = 105.201557

MemReport  ---------- Memory  Summary ---[base-10 MB]----
MemReport  VmPeak = 5394.98 VmHWM = 4745.63

Art has completed and will exit with status 0.

#26 Updated by Tingjun Yang 12 months ago

Pengfei Ding wrote:

Yeah, there are. "Succeeds" meant "art exit with 0" to me.

[...]

Thanks. Memory usage seems high.

#27 Updated by Pengfei Ding 12 months ago

Attached file contains the valgrind output with

valgrind --leak-check=yes --tool=memcheck --track-origins=yes --num-callers=50 --log-file=valgrind.out --suppressions=${ROOTSYS}/etc/valgrind-root.supp lar -c  RunRawDecoder.fcl  np04_raw_run009701_0004_dl1.root -n 2

Peak resident memory usage was 2994.43 MB.

I have another valgrind job running over the full 50 events in the raw file (expect to finish in ~40 minutes.

#28 Updated by Pengfei Ding 12 months ago

Attached the valgrind output for the full 50 events in the file. (Possibly lost ~4.07GB. )

The issue is likely to be in line 961-962 of dunetpc/dune/Protodune/singlephase/RawDecoding/PDSPTPCRawDecoder_module.cc

==223821== LEAK SUMMARY:
==223821==    definitely lost: 5,745,989 bytes in 38,317 blocks
==223821==    indirectly lost: 137,621 bytes in 164 blocks
==223821==      possibly lost: 4,068,422,379 bytes in 54,396 blocks
==223821==    still reachable: 266,349,295 bytes in 176,181 blocks
==223821==                       of which reachable via heuristic:
==223821==                         newarray           : 95,704 bytes in 187 blocks
==223821==                         multipleinheritance: 6,768 bytes in 9 blocks
==223821==         suppressed: 1,365,831 bytes in 15,879 blocks
==223821== Reachable blocks (those to which a pointer was found) are not shown.
==223821== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==223821==
==223821== For counts of detected and suppressed errors, rerun with: -v
==223821== ERROR SUMMARY: 268772 errors from 217 contexts (suppressed: 487514 from 1738)

#29 Updated by Thomas Junk 12 months ago

Thanks for the news; that's what I saw in a test and thought I had just built dune_raw_data from the bugfix branch incorrectly. Looks like there will need to be some debugging, comparing a working job with this one side by side. The no valid nticks to check message comes from the RCE decoder.

#30 Updated by Thomas Junk 12 months ago

I spotted the trouble with failing to unpack the RCE fragments. There's a check on the integrity of the fragment that needs to look at the header and the trailer, and it was shifted relative to a hardcoded offset, which, when updated, allowed the integrity check to pass. I'll ask Eric what mixture of exposed offset methods is supported to get the size I need to do the check more robustly against changes in the future.

There's still a memory leak however.

#31 Updated by Thomas Junk 12 months ago

Commenting out the removeCachedProduct calls doesn't change the memory leak.

#32 Updated by Thomas Junk 12 months ago

I commented out the _processFELIX call and it still leaks, just more slowly.

#33 Updated by Thomas Junk 12 months ago

I trimmed the raw decoder module down to just two getByLabel calls, one for RCE container fragments, the other for FELIX container fragments. It runs very quickly, but still leaks memory.

#34 Updated by Thomas Junk 12 months ago

And replacing getByLabel calls with getValidHandle calls changes nothing -- leaks memory very quickly. No artdaq_core calls involved, just art.

#35 Updated by Pengfei Ding 12 months ago

Thomas Junk wrote:

And replacing getByLabel calls with getValidHandle calls changes nothing -- leaks memory very quickly. No artdaq_core calls involved, just art.

Do you have a backtrace of using getValidHandle to compare with getByLabel? The common downstream call might indicate the origin of the memory leak.

#36 Updated by Pengfei Ding 12 months ago

Tried to run the Raw Decoder with memory debugging in DDT. It got segfault because "address not mapped to object". See attached screenshots for details. Problem seems to be with line 24 of getEntry.cc in art_root_io package.

#37 Updated by Kyle Knoepfel 12 months ago

I have started to investigate.

#38 Updated by Thomas Junk 12 months ago

Here's my super-stripped-down PDSPTPCRawDecoder_module.cc:

/dune/app/users/trj/splitter10/srcs/dunetpc/dune/Protodune/singlephase/RawDecoding/PDSPTPCRawDecoder_module.cc

which just has the two getByLabel calls and makes an empty RawDigit collection in the event (maybe even that can go, but it's a producer module so I figured it would have to produce something). I was going to rename it and put it on Tingjun's test branch. But it would be good to see if memory leaks happen when reading in other data products in other modules.

#39 Updated by Eric Flumerfelt 12 months ago

  • Related to Bug #23345: Incorrect reporting of data size from Fragments in Containers when upgraded added

#40 Updated by Pengfei Ding 12 months ago

  • Status changed from Resolved to Reviewed

#41 Updated by Kyle Knoepfel 12 months ago

Is this issue truly resolved? Or is there still an inexplicable memory growth?

#42 Updated by Tingjun Yang 12 months ago

This issue is not resolved. There is still a memory leak somewhere when you call getByLabel or getValidHandle.

#43 Updated by Kyle Knoepfel 12 months ago

Then its status should be changed. Can somebody from artdaq place it back into Assigned state?

#44 Updated by Eric Flumerfelt 12 months ago

I would say that the memory issue is completely separate from the original issue. It might be a good idea at this point to open a new issue (especially as it seems to be affecting a different package) and re-post the conversation from this issue.

#45 Updated by Kyle Knoepfel 12 months ago

For the sake of the users, can you create that new issue, Eric, and relate it to this one? I am debugging the memory growth. The significant growth definitely occurs while attempting to read artdaq::Fragments from disk--valgrind points to substantial memory leaks during that phase.

#46 Updated by Eric Flumerfelt 12 months ago

  • Related to Bug #23348: Memory leaks while reading artdaq Fragments from data files added

#47 Updated by Eric Flumerfelt 12 months ago

Discussion/debugging of the memory issue continues in Issue #23348.

#48 Updated by Eric Flumerfelt 12 months ago

  • Target version set to artdaq_core v3_05_07
  • Status changed from Reviewed to Closed


Also available in: Atom PDF