Support #25512

Memory footprint increase in ProtoDUNE reconstruction job since v09_16_00

Added by Tingjun Yang 2 months ago. Updated 8 days ago.

Status: Assigned
Priority: Normal
Assignee: Kyle Knoepfel
Category: -
Target version: -
Start date: 02/12/2021
Due date:
% Done: 0%
Estimated time:
Experiment: -
Co-Assignees:
Duration:
Description

Dear LArSoft experts,

I noticed the memory footprint for ProtoDUNE data reconstruction job increased from 1.9 GB to 2.4 GB since larsoft v09_16_00:
https://dbweb8.fnal.gov:8443/LarCI/app/ns:dune/build_detail/test_details?build_id=dune_ci/10901&platform=Linux%203.10.0-1160.11.1.el7.x86_64&buildtype=slf7%20e19:prof&phase=ci_tests&test=ci_datareco_regression_test_protoDUNEsp

Could this be related to the new art version 3.06?

Thanks,
Tingjun

CI_datareco.png (46.9 KB), Tingjun Yang, 02/12/2021 04:22 PM
mem.db (88 KB), Vito Di Benedetto, 02/12/2021 04:46 PM
mem_v09_15_00.db (88 KB), Vito Di Benedetto, 02/15/2021 12:50 PM

History

#1 Updated by Kyle Knoepfel 2 months ago

  • Status changed from New to Feedback

Hi Tingjun, the first job in that CI test shows (in the stderr link):

[Many lines snipped]

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 4409.25 MB
  Peak resident set size usage (VmHWM): 2448.32 MB
  Details saved in: 'mem.db'
====================================================================================================

It would be very helpful if you could provide us with that mem.db file that is generated in the job.
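Not stated in the thread, but for reference: mem.db should be a plain SQLite database written by the MemoryTracker service, so its contents can be listed directly with the sqlite3 command-line tool. A minimal sketch:

  # Sketch only: inspect the MemoryTracker database without knowing its schema.
  sqlite3 mem.db ".tables"    # list the tables MemoryTracker filled
  sqlite3 mem.db ".schema"    # show the columns of each table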

#2 Updated by Vito Di Benedetto 2 months ago

The mem.db file is attached. It is not among the Jenkins artifacts stored by default, but for the file associated with the most recent CI build we still have a way to access it.

#3 Updated by Kyle Knoepfel 2 months ago

  • Assignee set to Kyle Knoepfel
  • Status changed from Feedback to Assigned

Thanks, Vito.

#4 Updated by Vito Di Benedetto 2 months ago

I'm attaching mem_v09_15_00.db; this is the mem.db file generated by the same job when running under LArSoft v09_15_00.
The associated CI dashboard with logs for this regression test under v09_15_00 is available here.

#5 Updated by Kyle Knoepfel 2 months ago

Vito, this is helpful, thanks. Based on the DB files you provided, I can verify that the workflow above uses about 500 MB more memory before the first module is even constructed. This probably implies that one of the services takes significantly more memory upon construction. I will have to do some profiling to figure out which one.
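A minimal sketch of how the two databases could be compared without knowing the schema, assuming both attachments are plain SQLite files (this is not a command quoted in the thread):

  # Sketch: dump both MemoryTracker databases to plain SQL text and diff them;
  # the per-service/per-module rows that differ should stand out.
  sqlite3 mem_v09_15_00.db .dump > mem_v09_15_00.sql
  sqlite3 mem.db           .dump > mem_v09_16_00.sql
  diff mem_v09_15_00.sql mem_v09_16_00.sql | less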

#6 Updated by Tingjun Yang 2 months ago

Thanks, Kyle, for the investigation. I have added David and Tom to the watch list.

#7 Updated by Thomas Junk 2 months ago

I am also running massif on that CI test, with v09_15_00 and v09_16_00 side by side.

#8 Updated by Thomas Junk 2 months ago

Weird! I am not sure how to interpret this. Massif reports only about 1.5 GB of heap, but MemoryTracker shows much more usage. Could something big be going on the stack?

  v09_15_00

  MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 4732.01 MB
  Peak resident set size usage (VmHWM): 3445.19 MB
  Details saved in: 'mem.db'

from Massif:

    GB
1.503^             ##                                                         
     |             #                                                          
     |             #                                                          
     |          :  #                                                         :
     |          :::#                                                         @
     |         @:: #                                                         @
     |         @:: #           :::@@@   ::@:::::@@:: ::::::::   ::::::   ::: @
     |         @:: #        :::: :@ @:::: @: : :@ : :: : : ::  :: ::: @:::: :@
     |         @:: # :   :: :: : :@ @:: : @: : :@ : :: : : :::::: ::: @: :: :@
     |         @:: # :  :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |        :@:: # :@@:: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |       ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |     ::::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |     : ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |  :::: ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |  : :: ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     |  : :: ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     | @: :: ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     | @: :: ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
     | @: :: ::@:: # :@ :: ::: : :@ @:: : @: : :@ : :: : : ::: :: ::: @: :: :@
   0 +----------------------------------------------------------------------->Ti
     0                                                                   1.483

  v09_16_00

  MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 5738.24 MB
  Peak resident set size usage (VmHWM): 4171.99 MB
  Details saved in: 'mem.db'

from Massif:

    GB
1.536^             #                                                          
     |            :#                                                          
     |            :#                                                          
     |          :::#                                                          
     |          : :#                                                         :
     |          : :#                                                         :
     |          : :#         :::::::::::::@@:::::::  :::::::::: :@@::::::::: :
     |          : :#         ::: : :: ::::@ ::: : ::::: : :: :  :@ ::: :::: @@
     |          : :#:: :  @  ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |         :: :#:  :::@@@::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |         :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |     :::::: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |     : : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |   ::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |   ::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |  :::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |  :::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |  :::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |  :::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
     |  :::: : :: :#: ::: @@ ::: : :: ::::@ ::: : :: :: : :: : ::@ ::: :::: @@
   0 +----------------------------------------------------------------------->Ti
     0                                                                   1.476

#9 Updated by Thomas Junk 2 months ago

Running massif with --stacks=yes lists the stack sizes as very small (tens of kilobytes). Running the job interactively with v09_16_00 without valgrind gives:

MemoryTracker summary (base-10 MB units used)

Peak virtual memory usage (VmPeak)  : 3597.85 MB
Peak resident set size usage (VmHWM): 2217.58 MB

so a lot of the memory reported above is valgrind's own.
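For reference, the --stacks=yes run was presumably along these lines (the exact command is not quoted in the thread; the input-file placeholder is mine):

  # Sketch: ask massif to track stack usage in addition to the heap.
  valgrind --tool=massif --stacks=yes \
    lar -n 1 -c ci_test_datareco_protoDUNEsp.fcl <input_raw_data_file.root>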

#10 Updated by Thomas Junk 2 months ago

I ran this to get massif to do full memory profiling:

valgrind --tool=massif --pages-as-heap=yes lar -n 1 -c ci_test_datareco_protoDUNEsp.fcl xroot://fndca1.fnal.gov:1095/pnfs/fnal.gov/usr/dune/persistent/stash/ContinuousIntegration/protoDUNEsp/datareco/np04_raw_run005809_0004_dl6.root 

 v09_15_00 

    GB
3.208^                      #                                                 
     |                      #                                                 
     |                      #                                                 
     |                      #:@@:::::::::::::::::::::@::::@::::::@::@:@::@::::
     |           ::::::@::::#:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |           : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |           : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |         @:: ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |     @@::@ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |    :@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |    :@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |   ::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |:::::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
     |: :::@@: @ : ::: @    #:@ :: :: :: ::: :::: :::@::::@::::::@::@:@::@   :
   0 +----------------------------------------------------------------------->Ti
     0                                                                   1.568

Number of snapshots: 92

The peak snapshot has 3.444 GB total ("pages as heap" includes code and stack).

 v09_16_00 

    GB
3.600^                      #                                                 
     |                      #                                                 
     |                      #                                                @
     |           ::::       #::::::::::::::::::::::::::::::::::::::::::::::::@
     |           : ::@@:::::#:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |           : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |           : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |     @ @:::: ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |    :@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |    :@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |   ::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |:::::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
     |: :::@:@:: : ::@ :    #:: ::: ::: : :: :::: :: : ::: :: :: ::::: ::::  @
   0 +----------------------------------------------------------------------->Ti
     0                                                                   1.564

Number of snapshots: 58

The peak snapshot has 3.865 GB total ("pages as heap").

At the peak snapshot, I looked for differences that showed up early in the job. I quote the peak snapshot below because the allocations may not have completed during the early snapshots, or the snapshots may have been taken at slightly different stages of initialization.

-------------------------
ROOT's use of cling and clang seems to have taken about 120 MB more space:

v09_16_00

| ->06.25% (241,664,000B) 0x3DD207C1: llvm::sys::fs::mapped_file_region::mapped_file_region(int, llvm::sys::fs::mapped_file_region::mapmode, unsigned long, unsigned long, std::error_code&) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_06a/Linux64bit+3.10-2.17-e19-p383b-prof/lib/libCling.so)

which is called by

clang::CompilerInstance::loadModule

in v09_15_00, it's only

| ->03.49% (120,057,856B) 0x2EFC2E38: llvm::sys::fs::mapped_file_region::mapped_file_region(int, llvm::sys::fs::mapped_file_region::mapmode, unsigned long, unsigned long, std::error_code&) (in /cvmfs/larsoft.opensciencegrid.org/products/root/v6_18_04d/Linux64bit+3.10-2.17-e19-prof/lib/libCling.so)

-------------------------

Loading shared libraries takes about another 90 MB:

v09_16_00

| ->29.79% (1,151,578,112B) 0x40064B6: _dl_map_object_from_fd (in /usr/lib64/ld-2.17.so)

which is called from

 cet::LibraryManager::getSymbolByPath_

while in v09_15_00, it's only

| ->30.79% (1,060,454,400B) 0x40064B5: _dl_map_object_from_fd (in /usr/lib64/ld-2.17.so)

--------------------------

ToolBasedRawDigitPrepService::prepare takes another 34 MB:

v09_16_00

   2832:| |   | | |                   ->09.56% (339,738,624B) 0x1D73C3ED: ToolBasedRawDigitPrepService::prepare(detinfo::DetectorClocksData const&, std::map<unsigned int, AdcChannelData, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, AdcChannelData> > >&, std::vector<recob::Wire, std::allocator<recob::Wire> >*, WiredAdcChannelDataMap*) const (ToolBasedRawDigitPrepService_service.cc:206)

v09_15_00

6352:| |   | | |   ->10.15% (306,184,192B) 0x1CC14BBC: ToolBasedRawDigitPrepService::prepare(detinfo::DetectorClocksData const&, std::map<unsigned int, AdcChannelData, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, AdcChannelData> > >&, std::vector<recob::Wire, std::allocator<recob::Wire> >*, WiredAdcChannelDataMap*) const (ToolBasedRawDigitPrepService_service.cc:206)

--------------------------

Calls to new from jsoncpp are also somewhat bigger now. Here are the first ones in the snapshots, accounting for about 33 MB more:

v09_16_00:

   52:| |   | | ->04.88% (124,780,544B) 0x61EA19DB: __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<Json::Value::CZString const, Json::Value> > >::allocate(unsigned long, void const*) (in /cvmfs/larsoft.opensciencegrid.org/products/jsoncpp/v1_7_7e/Linux64bit+3.10-2.17-e19-prof/lib/libjsoncpp.so.1.7.7)

v09_15_00:

  191:| |   | | ->04.33% (91,226,112B) 0x46B6A9DA: __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<Json::Value::CZString const, Json::Value> > >::allocate(unsigned long, void const*) (in /cvmfs/larsoft.opensciencegrid.org/products/jsoncpp/v1_7_7d/Linux64bit+3.10-2.17-e19-prof/lib/libjsoncpp.so.1.7.7)

------------------------------

That accounts for about 277 MB of the roughly 400 MB difference, so there is more to root out.

The full profiling outputs are in

/dune/data/users/trj/proftest/v915_thirdtry/msprint915.txt

and

/dune/data/users/trj/proftest/v916_thirdtry/msprint916.txt
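These look like ms_print renderings of the massif output; a sketch of how such a text file would be produced (the massif.out suffix carries the actual process id of the run):

  # Sketch: massif writes massif.out.<pid>; ms_print renders the ASCII graph
  # and the allocation tree quoted in the comments above.
  ms_print massif.out.<pid> > msprint916.txt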

#11 Updated by Thomas Junk 2 months ago

I found another one: libopenblas takes 168 MB when its dynamic library is loaded.

| ->06.57% (167,772,160B) 0x219335F3: alloc_mmap (in /cvmfs/larsoft.opensciencegrid.org/products/openblas/v0_3_10a/Linux64bit+3.10-2.17-e19/lib/libopenblas.so.0.3)
| | ->06.57% (167,772,160B) 0x21933B51: blas_memory_alloc (in /cvmfs/larsoft.opensciencegrid.org/products/openblas/v0_3_10a/Linux64bit+3.10-2.17-e19/lib/libopenblas.so.0.3)
| |   ->05.25% (134,217,728B) 0x21934200: blas_thread_init (in /cvmfs/larsoft.opensciencegrid.org/products/openblas/v0_3_10a/Linux64bit+3.10-2.17-e19/lib/libopenblas.so.0.3)
| |   | ->05.25% (134,217,728B) 0x21710190: gotoblas_init (in /cvmfs/larsoft.opensciencegrid.org/products/openblas/v0_3_10a/Linux64bit+3.10-2.17-e19/lib/libopenblas.so.0.3)
| |   |   ->05.25% (134,217,728B) 0x400F9C2: _dl_init (in /usr/lib64/ld-2.17.so)
| |   |     ->05.25% (134,217,728B) 0x401459D: dl_open_worker (in /usr/lib64/ld-2.17.so)
| |   |       ->05.25% (134,217,728B) 0x400F7D3: _dl_catch_error (in /usr/lib64/ld-2.17.so)
| |   |         ->05.25% (134,217,728B) 0x4013B8A: _dl_open (in /usr/lib64/ld-2.17.so)
| |   |           ->05.25% (134,217,728B) 0x9389FAA: dlopen_doit (in /usr/lib64/libdl-2.17.so)
| |   |             ->05.25% (134,217,728B) 0x400F7D3: _dl_catch_error (in /usr/lib64/ld-2.17.so)
| |   |               ->05.25% (134,217,728B) 0x938A5AC: _dlerror_run (in /usr/lib64/libdl-2.17.so)
| |   |                 ->05.25% (134,217,728B) 0x938A040: dlopen@@GLIBC_2.2.5 (in /usr/lib64/libdl-2.17.so)
| |   |                   ->05.25% (134,217,728B) 0x786855A: cet::LibraryManager::get_lib_ptr(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (LibraryManager.cc:209)

openblas is not mentioned in the v09_15_00 valgrind output.
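Not something tried in this thread, but a possible mitigation to test: OpenBLAS sizes its per-thread buffers in blas_thread_init when the library is loaded, and the thread count can be capped with an environment variable. A sketch (the input-file placeholder is mine):

  # Sketch of a possible mitigation (untested here): cap the OpenBLAS thread
  # pool so blas_thread_init allocates fewer per-thread buffers at dlopen time.
  export OPENBLAS_NUM_THREADS=1
  lar -n 1 -c ci_test_datareco_protoDUNEsp.fcl <input_raw_data_file.root>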

#12 Updated by Kyle Knoepfel 2 months ago

Tom, this is a heroic amount of effort you've put into this. I'd like to discuss your findings with the SciSoft team on Monday.

#13 Updated by Thomas Junk 2 months ago

By all means, go ahead!

#14 Updated by Tingjun Yang 8 days ago

Hi Kyle,

I am curious about the status of this issue. The increase in memory footprint is actually having an impact on ProtoDUNE production, since more slots need to be requested to accommodate the memory usage.

Thanks,
Tingjun
