Bug #5844

Memory usage of do-nothing NOvA job

Added by Christopher Backhouse over 5 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 04/07/2014
Due date:
% Done: 0%
Estimated time:
Duration:

Description

I first noticed this issue way back in https://cdcvs.fnal.gov/redmine/issues/2246
There's some discussion there that helped me understand the problem better, but I'm not sure we really improved matters.

Back then, the memory usage of my do-nothing job was 519MB. Now it's 1268MB. That's a significant regression, and it eats into the headroom we have to do anything actually useful when running on the grid.

I removed $SRT_PUBLIC_CONTEXT/lib from my $LD_LIBRARY_PATH and the result went down to 403MB.

So it sounds like one of our modules (or a combination of several) is wasting ~800MB just by being loaded. If I understand the issue right, this has to be the dictionaries, as the modules proper won't be loaded until required. The contribution from the rest of the job (art+externals) hasn't changed much in the intervening years.

I suppose we could try and figure out which dictionaries, if any, are particularly to blame by copying them into a new directory on LD_LIBRARY_PATH and removing them one by one. It would certainly be great to figure out how to reclaim ~800MB of space per job.
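The one-by-one removal could be scripted roughly as follows. This is only a sketch under assumptions: the function names are hypothetical, symlinks stand in for copies, and peak resident set size (as reported on Linux) is used as the memory figure, which may not be the measure quoted above.

```python
import os
import subprocess
import tempfile
from typing import Callable, Dict, Iterable, List, Optional


def leave_one_out(libs: Iterable[str],
                  measure: Callable[[List[str]], float]) -> Dict[str, Optional[float]]:
    """For each library, report how much the measurement drops when it alone is removed.

    `measure` takes the list of libraries left in place and returns memory in MB.
    It should raise RuntimeError if the job cannot run with that set.
    """
    libs = list(libs)
    baseline = measure(libs)
    savings: Dict[str, Optional[float]] = {}
    for lib in libs:
        remaining = [other for other in libs if other != lib]
        try:
            savings[lib] = baseline - measure(remaining)
        except RuntimeError:
            # The job does not run without this library, so no figure is available.
            savings[lib] = None
    return savings


def peak_rss_mb(command: List[str], lib_files: List[str],
                other_ld_paths: Iterable[str] = ()) -> float:
    """Run `command` with only `lib_files` visible in a scratch directory.

    Hypothetical helper: the scratch directory is placed first on LD_LIBRARY_PATH,
    followed by `other_ld_paths`. Assumes Linux, where ru_maxrss is in kilobytes.
    """
    with tempfile.TemporaryDirectory() as scratch:
        for path in lib_files:
            os.symlink(os.path.abspath(path),
                       os.path.join(scratch, os.path.basename(path)))
        env = dict(os.environ)
        env["LD_LIBRARY_PATH"] = os.pathsep.join([scratch, *other_ld_paths])
        proc = subprocess.Popen(command, env=env)
        # wait4 returns the resource usage of this particular child only.
        _, status, usage = os.wait4(proc.pid, 0)
        proc.wait()  # already reaped; this only marks the Popen object as finished
        if status != 0:
            raise RuntimeError("job failed with wait status %d" % status)
        return usage.ru_maxrss / 1024.0
```

To use it, `measure` would wrap `peak_rss_mb` with whatever command runs the do-nothing job and with the remaining LD_LIBRARY_PATH entries (minus the original library directory). Removing one library at a time will not separate libraries that only matter in combination, so the results are a first pass rather than a full attribution.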

I also tried removing other libraries. Removing artdaq brings the usage down to 376MB, so it is apparently to blame for about 27MB of wastage. Nothing else had any real effect. I was left with a list of libraries that I could not remove without my test ceasing to function: nutools, tbb, sqlite, clhep, root, gcc, boost, cetlib, fhiclcpp, messagefacility, art. The remaining usage must come from some combination of these, or from memory actually allocated by the art process.

History

#1 Updated by Christopher Backhouse over 5 years ago

  • Status changed from New to Closed

Various work happened on this, sadly documented in emails rather than here. Baseline usage is somewhat lower now.


