Support #21846

Do-nothing job has VmHWM = 3.6 GB

Added by Rob Kutschke 8 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 02/05/2019
% Done: 0%
Scope: Internal
Experiment: -

Description

I am running a job that makes a sparse skim. When I read the skim back with a do-nothing art job, the VmHWM is 3.6 GB.
I would like to understand why, and how to mitigate it.

For the record, this is a 6th-generation file; most generations aggregated many files from the previous generation.
The first generation of files was produced with art v1_13_01; generations 2 through 5 were produced with art v2_06_02.
The sixth generation was made with art v2_12_00.

The files referenced below are on the mu2egpvm machines, in the directory /mu2e/app/users/kutschke/Development/HWM_issue.

The example 5th-generation file, gen5.art, has 160 events and 100 subruns.

The example 6th-generation file, gen6.art, has 16 events spread over 10,000 subruns.
To make this file, I read 100 files from the 5th generation.

I ran a do-nothing fcl file, sizeTest.fcl, on both art files. The input source is configured as:

```fhicl
source : {
  module_type               : RootInput
  compactEventRanges        : true
  delayedReadSubRunProducts : true
  delayedReadRunProducts    : true
  readParameterSets         : false
}
```
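For anyone reproducing the measurement, a minimal do-nothing job built around this source block might look like the sketch below. It is not the actual sizeTest.fcl: the process_name, the fileNames entry and the empty physics table are illustrative assumptions. With no modules configured, the job only reads the input, so its peak memory reflects what the source itself loads.

```fhicl
# Hypothetical sketch of a do-nothing job; NOT the real sizeTest.fcl.
process_name : sizeTest          # assumed name

source : {
  module_type               : RootInput
  fileNames                 : [ "gen6.art" ]   # assumed input file
  compactEventRanges        : true
  delayedReadSubRunProducts : true
  delayedReadRunProducts    : true
  readParameterSets         : false
  # maxEvents : 1                # uncomment to read only the first event
}

# No producers, filters, analyzers or output modules: art just reads the file.
physics : {}
```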

When I run this fcl file on the two art files, I get the following values of VmHWM:

File        VmHWM (MB)
gen5.art     360
gen6.art    3680

I also tried running on just one event from each file; the VmHWM did not decrease.
The VmHWM value is consistent with the maximum resident set size reported when running the job under /usr/bin/time.

For more background, I used the Mu2e utility artProductSizes to look at the on-disk size of
the top-level components of each file:

gen5.art: 16 MB total; RootFileDB 15 MB
gen6.art: 2.6 GB total; RootFileDB 1.6 GB; SubRun 1.1 GB

History

#1 Updated by Kyle Knoepfel 8 months ago

  • Status changed from New to Feedback
  • Tracker changed from Bug to Support

The issue is understood.

Although the compactEventRanges parameter has been supported since art 2.09.00 (it was added at Mu2e's request; see issue #17801), judging from the configuration history of 'gen6.art', it appears never to have been set to true. Because of this, the framework has had to calculate the event ranges from the list of events inside each file. art uses these event ranges to interpret (sub)run products correctly, and for fairly sparse files the calculated ranges can contain many holes, i.e. they break up into many small, disjoint ranges.

Setting compactEventRanges does not alleviate the issue when reading 'gen6.art' because the non-compact ranges have already been written to the file's on-disk RootFileDB, and the entire database is read into memory at once.

Although it may be possible to change how the database is read, a more pragmatic solution is for Mu2e to do one of the following:

  (1) rerun the stage that produced 'gen6.art', this time with the compactEventRanges parameter set to 'true'; or
  (2) run a job that simply forwards the contents of 'gen6.art' to a new file, 'gen7.art', with the compactEventRanges parameter set to 'true'.

If option (2) is chosen, users would simply read 'gen7.art' instead of 'gen6.art'.
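Option (2) could be sketched as the FHiCL job below: a RootInput source with compactEventRanges set to true, feeding a RootOutput module on an end path. This is a minimal sketch, not a configuration taken from this thread; the process_name, the path name e1 and the file names are assumptions. RootOutput keeps all products by default, so no outputCommands are specified.

```fhicl
# Sketch of option (2): forward gen6.art into gen7.art with compact event ranges.
# process_name, path name and file names are illustrative assumptions.
process_name : forwardToGen7

source : {
  module_type        : RootInput
  fileNames          : [ "gen6.art" ]
  compactEventRanges : true   # safe because a SubRun never spans multiple files
}

outputs : {
  out : {
    module_type : RootOutput
    fileName    : "gen7.art"
    # Default output commands keep every product, so all event, subrun and
    # run content from gen6.art is copied forward.
  }
}

physics : {
  e1        : [ out ]
  end_paths : [ e1 ]
}
```

Note that this forwarding job still reads the large RootFileDB of 'gen6.art', so it should be expected to have the same high VmHWM; only jobs that read 'gen7.art' benefit.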

The on-disk size of the RootFileDB in each file is:

File                                 RootFileDB size    RootInput configuration
gen6.art                             1.54 GiB           default
gen7_compact_ranges.art              49.2 MiB           compactEventRanges: true
gen7_compact_ranges_no_config.art    648 KiB            compactEventRanges: true, readParameterSets: false
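The third row corresponds to a forwarding job whose source also sets readParameterSets to false. Assuming the rest of the forwarding job is unchanged, its source block would look roughly like the fragment below (the file name is an assumption). Judging from the table, not reading the input's stored parameter sets keeps that configuration history out of the output's RootFileDB, which is presumably why it shrinks further, from 49.2 MiB to 648 KiB.

```fhicl
# Hypothetical source block for the "no_config" variant of the forwarding job.
source : {
  module_type        : RootInput
  fileNames          : [ "gen6.art" ]   # assumed input file
  compactEventRanges : true
  readParameterSets  : false            # do not read the parameter sets stored in the input
}
```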

Because Mu2e guarantees that a SubRun never spans multiple files, setting compactEventRanges to 'true' is safe and yields roughly 3.0 GiB of in-memory savings (1.5 GiB × 2, accounting for ROOT's buffering mechanisms) while still retaining all relevant provenance.
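The 3.0 GiB figure can be checked against the RootFileDB sizes in the table: the database shrinks from 1.54 GiB to 49.2 MiB, and the factor of 2 for ROOT's buffering is taken from the statement above. This is an approximate reconstruction of the estimate, not a measurement.

```latex
\Delta M \approx 2 \times \left(1.54\,\text{GiB} - 49.2\,\text{MiB}\right)
         \approx 2 \times 1.49\,\text{GiB}
         \approx 3.0\,\text{GiB}
```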

Please let us know if further discussion is desired.


As an aside: art has provided the product_sizes_dumper utility for some time; its implementation borrows largely from Mu2e's artProductSizes tool. If there are aspects of the art-provided tool that could be improved, please let us know; otherwise, switching to art's version would lessen your maintenance burden.

#2 Updated by Kyle Knoepfel 8 months ago

Anything else to follow up on here, Rob? Or shall I close the issue?

#3 Updated by Rob Kutschke 8 months ago

Thanks, Kyle. Please close this issue.

#4 Updated by Kyle Knoepfel 8 months ago

  • Status changed from Feedback to Closed

Thanks, Rob.