Support #13450

Memory usage in merging art files

Added by Tingjun Yang about 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/04/2016
Due date:
% Done: 100%
Estimated time:
Spent time:
Scope:
Experiment: DUNE
SSI Package: art
Duration:

Description

Dear Experts,

I have a question about memory usage when merging art files. I have attached the FHiCL file mergeartfiles.fcl.
When I run:

lar -c mergeartfiles.fcl /pnfs/dune/scratch/users/tjyang/v05_09_01/reco/AntiMuonCutEvents_LSU_dune35t/*/AntiMuonCutEvents_LSU_dune35t_*root -n 1000
which merges 10 files, I see the following in the log file:
Modules with large Vsize (Mbytes)               Vsize      Δ Vsize      RSS         Δ RSS
==============================================================================================
end_path:out1:RootOutput
 [902]  run: 10000001 subRun: 48 event: 4702    1073.531        0        302.262        0
 [903]  run: 10000001 subRun: 48 event: 4703    1073.531        0        302.262        0
 [904]  run: 10000001 subRun: 48 event: 4704    1073.531        0        302.262        0
 [905]  run: 10000001 subRun: 48 event: 4705    1073.531        0        302.262        0

====================================================================================================

I think the maximum memory usage (Vsize) is about 1 GB.
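For anyone who wants to read the peak values out of a log rather than by eye, here is a minimal Python sketch. It assumes the per-event rows look exactly like the excerpt above (index, run, subRun, event, then Vsize, Δ Vsize, RSS, Δ RSS in MB). The helper name `peak_memory` is hypothetical and not part of art.

```python
import re

# Matches per-event rows of the memory summary, e.g.
#  [902]  run: 10000001 subRun: 48 event: 4702    1073.531        0        302.262        0
# The column layout is an assumption based on the excerpt in this ticket.
ROW = re.compile(
    r"\[(?P<index>\d+)\]\s+run:\s*(?P<run>\d+)\s+subRun:\s*(?P<subrun>\d+)\s+"
    r"event:\s*(?P<event>\d+)\s+(?P<vsize>-?[\d.]+)\s+(?P<dvsize>-?[\d.]+)\s+"
    r"(?P<rss>-?[\d.]+)\s+(?P<drss>-?[\d.]+)"
)

def peak_memory(log_text):
    """Return (max Vsize, max RSS) in MB over all matching rows, or None."""
    rows = [m for m in map(ROW.search, log_text.splitlines()) if m]
    if not rows:
        return None
    return (max(float(m["vsize"]) for m in rows),
            max(float(m["rss"]) for m in rows))

sample = """\
end_path:out1:RootOutput
 [902]  run: 10000001 subRun: 48 event: 4702    1073.531        0        302.262        0
 [903]  run: 10000001 subRun: 48 event: 4703    1073.531        0        302.262        0
"""
print(peak_memory(sample))
```

Lines that do not match the row pattern (headers, separators, module names) are skipped.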

When I run the same command with -n 5000 to merge 50 files, I get the following:

Modules with large Vsize (Mbytes)                Vsize      Δ Vsize      RSS         Δ RSS
===============================================================================================
end_path:out1:RootOutput
 [4902]  run: 10000001 subRun: 20 event: 1902    2278.086        0       1083.062        0
 [4903]  run: 10000001 subRun: 20 event: 1903    2278.086        0       1083.062        0
 [4904]  run: 10000001 subRun: 20 event: 1904    2278.086        0       1083.062        0
 [4905]  run: 10000001 subRun: 20 event: 1905    2278.086        0       1083.062        0

====================================================================================================

The maximum memory usage is about 2.3 GB.

When I run the same command with -n 10000 to merge 100 files, I get the following:

===============================================================================================
end_path:out1:RootOutput
 [9902]  run: 10000001 subRun: 99 event: 9802    3768.031        0       2546.383        0
 [9903]  run: 10000001 subRun: 99 event: 9803    3768.031        0       2546.383        0
 [9904]  run: 10000001 subRun: 99 event: 9804    3768.031        0       2546.383        0
 [9905]  run: 10000001 subRun: 99 event: 9805    3768.031        0       2546.383        0

====================================================================================================

The maximum memory usage is about 3.7 GB.

It seems the memory usage increases with the number of input files. I am a little surprised, since I turned fastCloning on. Is this the expected behavior? This has given me trouble on the grid.
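To put a number on the growth, the three measurements above can be turned into a per-file increment. This is a rough sketch that takes the file counts (10, 50, 100) and the final Vsize and RSS values directly from the logs quoted above. The function name is only illustrative.

```python
# (files merged, Vsize in MB, RSS in MB) at the end of each job,
# copied from the log excerpts in this description.
measurements = [
    (10, 1073.531, 302.262),
    (50, 2278.086, 1083.062),
    (100, 3768.031, 2546.383),
]

def growth_per_file(points):
    """Return (files_from, files_to, dVsize per file, dRSS per file) for consecutive jobs."""
    rates = []
    for (n0, v0, r0), (n1, v1, r1) in zip(points, points[1:]):
        dn = n1 - n0
        rates.append((n0, n1, (v1 - v0) / dn, (r1 - r0) / dn))
    return rates

for n0, n1, dv, dr in growth_per_file(measurements):
    print(f"{n0:>3} -> {n1:>3} files: Vsize +{dv:.1f} MB/file, RSS +{dr:.1f} MB/file")
```

With these numbers the Vsize grows by roughly 30 MB per additional input file in both intervals, while the RSS increment is about 20 to 30 MB per file.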

To reproduce this problem, log in to dunegpvm01 and do the following:

source /grid/fermiapp/products/dune/setup_dune.sh
setup dunetpc v06_01_00 -q e10:prof

and then run the command above with the attached fcl file.

Thanks,
Tingjun

Attachment: mergeartfiles.fcl (1.06 KB), Tingjun Yang, 08/04/2016 06:56 PM

Related issues

Related to LArSoft - Bug #13063: Memory errors and leaks (Assigned, 06/28/2016)

History

#1 Updated by Gianluca Petrillo about 4 years ago

  • Related to Bug #13063: Memory errors and leaks added

#2 Updated by Gianluca Petrillo about 4 years ago

This might be related to an observation by Paul Russo.
We have indications that the job consumes about 30 MB for each new input file.
It was not clear where the growth came from, although there were hints that it happens at the first read of each new input file.
Paul has not found conclusive evidence yet. What he saw might also be the fault of one specific module.

Note that bug report #13063 does not (yet?) contain the information on which Paul Russo based his observation.
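As a rough cross-check of the "about 30 MB per file" figure against the Vsize values reported in the description, one can fit a straight line through the three data points. This is only a sketch: it assumes the growth is linear in the number of input files, which three points cannot establish, and the function name is illustrative.

```python
# (number of merged files, final Vsize in MB), taken from the ticket description.
points = [(10, 1073.531), (50, 2278.086), (100, 3768.031)]

def fit_line(pts):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(points)
print(f"growth: {slope:.1f} MB per file, baseline: {intercept:.0f} MB")
```

The fitted slope comes out close to 30 MB per file, which is consistent with the observation quoted above, although it does not say where the memory goes.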

#3 Updated by Lynn Garren about 4 years ago

Actually, the information is in #13063 and has been there for some time. You need to read the second entry by Paul.

#4 Updated by Kyle Knoepfel about 4 years ago

The information in #13063 indicates a memory-growth issue with MicroBooNE code, not with art code. What Tingjun shows here is specific to art: he is only merging files, which requires no experiment code. Lynn, if you have the permissions (I do not), please move this issue to the art redmine project. I suspect this issue may already be resolved by updating to art 2.02.01.

#5 Updated by Lynn Garren about 4 years ago

  • Project changed from LArSoft to art

#6 Updated by Kyle Knoepfel about 4 years ago

  • Status changed from New to Assigned
  • Assignee set to Kyle Knoepfel

#7 Updated by Kyle Knoepfel about 4 years ago

  • Experiment DUNE added
  • Experiment deleted (-)

#8 Updated by Kyle Knoepfel about 4 years ago

  • Description updated (diff)

#9 Updated by Kyle Knoepfel about 4 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100
  • SSI Package art added
  • SSI Package deleted ()

I confirm the memory growth you observe using dunetpc v06_00_01 (art 2.00.03). I have also just confirmed that updating to a LArSoft version that uses the latest version of art (2.02.02) solves the memory growth issue.

#10 Updated by Kyle Knoepfel about 4 years ago

  • Status changed from Resolved to Closed
