Support #12952
Help NOvA investigate memory usage issues
Description
One of the NOvA analyses is seeing a blow up of memory usage when looping over data. Information that should take up about 400 bytes per object on disk seems to blow up to 2 MB per object when read into memory. I suspect part of the issue is the use of std::shared_ptrs. I already know where in the code the blow up occurs, but could use some critical eyes on the code to help figure out where we can make improvements.
Would the artists be able to help us out?
History
#1 Updated by Kyle Knoepfel over 4 years ago
- Status changed from New to Feedback
We propose meeting to understand what steps have been taken thus far and what environment-setup we would need to reproduce the issue.
#2 Updated by Brian Rebel over 4 years ago
Sounds like a good plan. Can we meet Monday afternoon (today?), say 3 pm?
#3 Updated by Marc Paterno over 4 years ago
Brian, please come by WH 9SW Monday afternoon at 3PM.
#4 Updated by Brian Rebel over 4 years ago
See you then.
#5 Updated by Brian Rebel over 4 years ago
We have already run massif on a job exhibiting this behavior. You can find the massif output file at
/nova/app/users/brebel/nova_testing/fnex_test/massif.out.fnex_flat_map
which is accessible from any of the novagpvm nodes.
#6 Updated by Brian Rebel over 4 years ago
At the meeting we got several suggestions for how to improve the memory usage. In order of least impact on the code to most, they were:
1. Change the flat_map<std::string, fnex::Var> to flat_map<unsigned char, fnex::Var> in FNEX/utilities/VarVals.h
2. Remove the virtual functions from FNEX/utilities/Var.h as the vtable is very large compared to the data being stored
3. Change the use of shared_ptr to unique_ptr to avoid carrying around the information about the number of instances
I have implemented the first already and saw about a 20% reduction in memory, going from 1.66 GB used on my MacBook Pro to 1.36 GB.
We need to include Jeremy Wolcott in the conversation for the next two items. He is available Tuesday or Wednesday for a conversation by phone and is then gone for a couple of weeks on vacation and conference travel. I suggest Wednesday @ 10:30 am.
#7 Updated by Brian Rebel over 4 years ago
- File massif.out.fnex_unsigned_char massif.out.fnex_unsigned_char added
- File massif.out.fnex_flat_map massif.out.fnex_flat_map added
I have run massif with the latest version of the code (changing the flat_map key from std::string to unsigned char) and the output file is attached to this issue, massif.out.fnex_unsigned_char. The total usage drops from the 1.96 GB seen when using flat_maps keyed by std::string (massif.out.fnex_flat_map).
#8 Updated by Kyle Knoepfel over 4 years ago
Thank you for the massif outputs. Indeed, the boost::container::flat_map no longer appears in the output, which is a clear improvement. To assess the remaining memory usage, consider the element type of the boost::flat_map, which for this particular case is std::pair<unsigned char, std::shared_ptr<IndepVar>>. The breakdown of the memory occupied by each element is shown in this diagram:

    std::pair<unsigned char, std::shared_ptr<Var>>
    ├──── unsigned char: 8 [from 1]
    └──── std::shared_ptr<Var>
          ├──── pointer to Var: 8
          └──── reference counter (and associated pointer): 16
    IndepVar on heap: 16 [from 12 = 8 (virtual table pointer) + 4 (float)]
In tabular form:

- 8 bytes: unsigned char (should be only 1 byte, but 7 bytes of padding are introduced for alignment reasons)
- 24 bytes: std::shared_ptr<Var>, corresponding to the reference counting and the pointer to Var
- 16 bytes: the IndepVar object allocated on the heap. A virtual table pointer introduces 8 bytes, the datum is 4 bytes, and an extra 4 bytes of padding are introduced, again, for alignment.
- 48 bytes total

Succinctly put, that is 48 bytes, or 44 bytes of overhead, for accessing a float, which is 4 bytes. Since there are 20 such elements per event, that results in 960 bytes per event, of which only 80 bytes contain data.
- Moving from shared_ptr to unique_ptr: A shared_ptr introduces 16 bytes of overhead for reference counting. Changing to a unique_ptr, if possible, will bring the element size to 32 bytes instead of 48. If it is deemed not possible to make this switch, make sure that std::make_shared is used instead of constructing a std::shared_ptr directly; using the free function performs one allocation instead of two.
- Removing run-time polymorphism of IndepVar: Since virtual functions are present/overridden in the inheritance chain of IndepVar, a pointer to a virtual table is stored for each instance of IndepVar. Removing the polymorphism (e.g. through use of templates) will reduce the memory footprint of the std::pair to 40 bytes. Doing so would then possibly eliminate the need for using a std::shared_ptr at all, allowing just the IndepVar type to be stored in the pair. If retaining the interface through run-time polymorphism is required, consider containing a vector<float> member in a class template that inherits from an interface base class:

      class VarCollection {
      public:
        virtual std::vector<float> const& vars() const = 0;
      };

      template <typename T>
      class SpecificCollection : public VarCollection {
      public:
        ...
      private:
        std::vector<float> const& vars() const override {...}
        std::vector<float> vars_;
      };

  This would remove most of the overhead but still retain some degree of polymorphism.
- Using a 20-member struct: The most compact representation of storing twenty floats with different names is to just store twenty separate floats in a struct. While maybe not easily usable in other circumstances, such a struct would not result in padding of any kind (due to a well-aligned structure), and the use of pointers (8 bytes each) would be unnecessary. Instead of an object that takes 960 bytes of memory, it would take the minimum 80 bytes of memory necessary to encode the same information.
We are happy to have further discussion with you if desired.
#9 Updated by Kyle Knoepfel over 4 years ago
Can you give us an indication of where things stand with this issue? If you do not need further interaction with us at this time, we would like to close this issue. Thanks.
#10 Updated by Brian Rebel over 4 years ago
Yes, please go ahead and close the ticket. We will open a new one should we need more assistance.
Thanks for all your help.
#11 Updated by Kyle Knoepfel over 4 years ago
- Status changed from Feedback to Closed
- % Done changed from 0 to 100
Thanks, Brian.