Request a more compact event range representation
Mu2e is running a workflow that produces art event-data files that are very sparse skims.
The workflow runs many jobs in stage 1 and filters the output. Stage 2 aggregates
many files from stage 1 and filters further, and so on. After a few stages, one file
contains events from O(25,000) stage 1 files.
The first few stages were run with a very old art (v1 series).
When reading the very sparse skims with art v2_07_03, the memory consumption of the job explodes.
From conversations with Kyle, I understand that the memory is consumed keeping track of many
small event ranges in the input file. We would be happy with just the first and last events in
each subrun, not a detailed list of event ranges for each subrun.
We always process our data so that all events from a given subrun are in one file.
Please improve the record keeping scheme so that the large memory consumption is
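To illustrate the request, the following is a minimal Python sketch of collapsing many small event ranges into one first-to-last range per subrun. It is not art's implementation: the tuple layout `(run, subrun, begin, end)`, the inclusive range convention, the function name `compact_event_ranges`, and the run and subrun numbers are all assumptions made for this example.

```python
def compact_event_ranges(ranges):
    """Collapse per-subrun event ranges to a single (first, last) range.

    `ranges` is an iterable of (run, subrun, begin, end) tuples, where
    begin and end are inclusive event numbers. This layout is a
    hypothetical stand-in, not art's actual bookkeeping.
    """
    bounds = {}
    for run, subrun, begin, end in ranges:
        key = (run, subrun)
        if key in bounds:
            lo, hi = bounds[key]
            bounds[key] = (min(lo, begin), max(hi, end))
        else:
            bounds[key] = (begin, end)
    return [(run, subrun, lo, hi)
            for (run, subrun), (lo, hi) in sorted(bounds.items())]


# A sparse skim: every surviving event is its own one-event range.
sparse = [(1, 0, e, e) for e in range(5, 100000, 400)]
compact = compact_event_ranges(sparse)
print(len(sparse), "ranges ->", len(compact), "range:", compact[0])
```

The bookkeeping cost of the detailed list grows with the number of surviving fragments, while the compacted form stays at one entry per subrun regardless of how sparse the skim is.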
#1 Updated by Kyle Knoepfel almost 3 years ago
- Category set to I/O
- Status changed from New to Assigned
- Assignee set to Kyle Knoepfel
- Target version set to 2.09.00
- Estimated time set to 24.00 h
- SSI Package art added
We will provide a non-default option that can be specified to enable the behavior you propose.
#2 Updated by Kyle Knoepfel almost 3 years ago
- % Done changed from 0 to 90
This feature has been implemented with commits:
The following configuration option has been added to the
## If users can guarantee that SubRuns do not span multiple input
## files, the 'compactEventRanges' parameter can be set to 'true'
## to ensure the most compact representation of event-ranges associated
## with all Runs and SubRuns stored in the input file.
##
## WARNING: Enabling compact event ranges creates a history that can
##          cause file concatenation problems if a given SubRun spans
##          multiple input files. Use with care.

compactEventRanges: false # default
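The warning can be made concrete with a small Python sketch. It assumes, purely for illustration, that a compacted range is an inclusive `(first, last)` pair; the helper names and event numbers are hypothetical, and art's real range-set logic may differ.

```python
def compact_range(events):
    """Return the inclusive (first, last) range covering the event numbers."""
    return (min(events), max(events))


def overlaps(a, b):
    """True if two inclusive (first, last) ranges share any event number."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical case: one subrun whose events are split across two files.
file_a_events = [3, 40, 97]
file_b_events = [12, 55, 150]

range_a = compact_range(file_a_events)
range_b = compact_range(file_b_events)

# The exact event lists are disjoint, but the compacted ranges are not:
# both files now appear to cover events 12 through 97 of the same subrun.
print(set(file_a_events).isdisjoint(file_b_events))
print(overlaps(range_a, range_b))
```

When each subrun lives entirely in one file, as the option requires, no two files produce ranges for the same subrun and this ambiguity cannot arise.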
The savings that can be achieved depend on the event-list fragmentation of the input file. For the Mu2e file I was given (which was the result of concatenating many input files), the memory savings are on the order of 100 MB, and the footprint of the compacted range sets is less than 100 KB.
I am in the process of verifying that savings on a normal
mu2e job, at which point I will mark this issue as resolved.
#3 Updated by Kyle Knoepfel almost 3 years ago
- Status changed from Assigned to Resolved
- % Done changed from 90 to 100
I have run the following job using mu2e version 6.3.2:
art -c /mu2e/app/users/whyaqm/STM_study/step02/fcl_test/tmp/cnf.whyaqm.Final-test.v6_2_4.001002_00151903_103277.fcl
which was reported as one of the problematic jobs.
With the adjustments to
art, I achieve the following results
||3635 MB||2876 MB||387 MB|
||2587 MB||1799 MB||16 MB|
The RootFileDB object is the persisted database inside of the
art/ROOT file. To summarize, for this particular job, an in-memory savings of better than 1 GB is achieved when setting
true for the