Feature #17801

Request a more compact event range representation

Added by Rob Kutschke almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: High
Assignee:
Category: I/O
Target version:
Start date: 09/29/2017
Due date:
% Done: 100%
Estimated time: 24.00 h
Spent time:
Scope: Internal
Experiment: Mu2e
SSI Package: art
Duration:

Description

Mu2e has a workflow that produces art event-data files that are very sparse skims.
Many jobs run in stage 1 and filter their output. Stage 2 aggregates many stage 1
files and filters further, and so on. After a few stages, one file contains events
from O(25,000) stage 1 files.

The first few stages were run with a very old art ( v1 series ).

When reading the very sparse skims with art v2_07_03, the memory consumption of the job explodes.
From conversations with Kyle, I understand that the memory is consumed keeping track of many
small event ranges in the input file. We would be happy with just the first and last events in
each subrun, not a detailed list of event ranges for each subrun.

We always process our data so that all events from a given subrun are in one file.
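
The requested bookkeeping can be illustrated with a minimal sketch. This is not art's implementation; the function name `compact_event_ranges`, the tuple layout, and the use of half-open ranges [begin, end) are assumptions made for illustration only.

```python
def compact_event_ranges(ranges):
    """Collapse many event ranges into one range per (run, subrun).

    `ranges` is an iterable of (run, subrun, begin, end) tuples, where
    [begin, end) is a half-open range of event numbers (an assumption
    of this sketch).
    """
    bounds = {}
    for run, subrun, begin, end in ranges:
        key = (run, subrun)
        if key in bounds:
            lo, hi = bounds[key]
            # Widen the stored range so it covers the new fragment too.
            bounds[key] = (min(lo, begin), max(hi, end))
        else:
            bounds[key] = (begin, end)
    return [(run, subrun, lo, hi)
            for (run, subrun), (lo, hi) in sorted(bounds.items())]


# Hypothetical fragmented input: a sparse skim keeps only a few events
# per subrun, so each surviving event tends to get its own small range.
fragmented = [
    (1, 0, 3, 4),
    (1, 0, 17, 18),
    (1, 0, 250, 252),
    (1, 1, 8, 9),
    (1, 1, 40, 41),
]
compacted = compact_event_ranges(fragmented)
print(compacted)  # [(1, 0, 3, 252), (1, 1, 8, 41)]
```

The number of stored entries then scales with the number of subruns rather than with the number of surviving event fragments.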

Please improve the record keeping scheme so that the large memory consumption is
mitigated.

Rob

History

#1 Updated by Kyle Knoepfel almost 3 years ago

  • Category set to I/O
  • Status changed from New to Assigned
  • Assignee set to Kyle Knoepfel
  • Target version set to 2.09.00
  • Estimated time set to 24.00 h
  • SSI Package art added

We will provide a non-default option that can be specified to enable the behavior you propose.

#2 Updated by Kyle Knoepfel almost 3 years ago

  • % Done changed from 0 to 90

This feature has been implemented with commits:


The following configuration option has been added to the RootInput source:

## If users can guarantee that SubRuns do not span multiple input
## files, the 'compactEventRanges' parameter can be set to 'true'
## to ensure the most compact representation of event-ranges associated
## with all Runs and SubRuns stored in the input file.
##
## WARNING: Enabling compact event ranges creates a history that can
##          cause file concatenation problems if a given SubRun spans
##          multiple input files.  Use with care.

compactEventRanges: false  # default
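
One plausible way to see why the warning matters is sketched below. It is an illustration under stated assumptions, not a description of art's internals: the helper names `compact` and `overlaps` and the event numbers are hypothetical, and ranges are taken to be half-open. If the surviving events of one subrun are split across two files, the detailed event lists are disjoint, but the compacted ranges can overlap, which leaves it ambiguous which file covers which part of the subrun.

```python
def compact(events):
    """Return the single half-open range [first, last + 1) covering `events`."""
    return (min(events), max(events) + 1)


def overlaps(a, b):
    """True if half-open ranges a and b share at least one event number."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical: one subrun whose surviving events are split across two files.
file_a_events = [1, 5, 9]
file_b_events = [3, 7]

range_a = compact(file_a_events)  # (1, 10)
range_b = compact(file_b_events)  # (3, 8)

# The event lists themselves are disjoint...
assert not set(file_a_events) & set(file_b_events)
# ...but the compacted ranges are not.
print(range_a, range_b, overlaps(range_a, range_b))  # (1, 10) (3, 8) True
```

When every subrun lives in exactly one file, as in the Mu2e workflow described above, this situation does not arise.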

The savings that can be achieved depend on the event-list fragmentation of the input file. For the Mu2e file I was given (the result of concatenating many input files), the memory savings are on the order of 100 MB, and the footprint of the compacted range sets is less than 100 KB.

I am in the process of verifying those savings on a normal Mu2e job, at which point I will mark this issue as resolved.

#3 Updated by Kyle Knoepfel almost 3 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 90 to 100

I have run the following job using mu2e version 6.3.2:

art -c /mu2e/app/users/whyaqm/STM_study/step02/fcl_test/tmp/cnf.whyaqm.Final-test.v6_2_4.001002_00151903_103277.fcl

which was reported as one of the problematic jobs.

With the adjustments to art, I achieve the following results:

compactEventRanges        VmPeak    VmHWM (Max. RSS)   Size of RootFileDB
false (current behavior)  3635 MB   2876 MB            387 MB
true                      2587 MB   1799 MB            16 MB

where RootFileDB is the database persisted inside the art/ROOT file. To summarize, for this particular job, setting compactEventRanges to true for the RootInput source saves more than 1 GB of memory.
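
The summary can be checked directly against the numbers reported above. The short sketch below only restates those measurements; the variable names are illustrative.

```python
# (before, after) in MB, i.e. compactEventRanges false vs. true,
# copied from the results reported above.
measurements_mb = {
    "VmPeak": (3635, 2587),
    "VmHWM": (2876, 1799),
    "RootFileDB": (387, 16),
}

savings_mb = {
    name: before - after
    for name, (before, after) in measurements_mb.items()
}

for name, saved in savings_mb.items():
    before = measurements_mb[name][0]
    print(f"{name}: {saved} MB saved ({100 * saved / before:.0f}% of {before} MB)")

# savings_mb == {"VmPeak": 1048, "VmHWM": 1077, "RootFileDB": 371}
```

Both memory figures drop by more than 1000 MB, while the persisted RootFileDB shrinks by 371 MB.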

#4 Updated by Kyle Knoepfel almost 3 years ago

  • Status changed from Resolved to Closed

