[SERVER-68125] Index build on multi-key fields can consume more memory than limit Created: 19/Jul/22  Updated: 24/Jan/24  Resolved: 27/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.9, 6.0.0-rc5
Fix Version/s: 6.0.4, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2022-08-12 at 12.05.55.png    
Issue Links:
Backports
Problem/Incident
causes SERVER-83145 Shared buffer fragment incorrectly tr... Closed
Related
is related to SERVER-68982 Key generation on single document can... Backlog
is related to SERVER-82037 Memory used by sorter spills can grow... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0
Sprint: Execution Team 2022-08-08, Execution Team 2022-08-22, Execution Team 2022-09-05, Execution Team 2022-09-19, Execution Team 2022-10-03, Execution Team 2022-10-17, Execution Team 2022-10-31
Participants:

 Description   

When an index build processes documents that generate multiple keys with many duplicate values, the duplicate keys are not counted towards the memory the build is tracking. As a result, the build can use significantly more memory than its limit.



 Comments   
Comment by Githook User [ 25/Nov/22 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-68125 Correctly track all memory used during index builds

This allows the Sorter to manage a memory pool that can be used to allocate memory for index builds. Previously, we relied on each key to report its individual memory usage, but in some cases this failed to represent the actual memory used by all keys because the memory was backed by a shared buffer. The new memory pool holds references to all of the buffers it allocates and does not free them until the caller requests it, in this case when the sorter spills to disk.

This strategy keeps performance the same for index builds without many duplicate keys (due to repeated array values). In the degenerate case where we're building an index with very large duplicate keys, we will end up with more spilling than before; that is the cost of correctly tracking our memory usage.
Branch: v6.0
https://github.com/mongodb/mongo/commit/7090cf539845798a84ee4ac6488faf66783826c2
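
The pooling strategy described in this commit message can be illustrated with a toy model. This is a minimal Python sketch, not the server's C++ implementation: `ToyMemoryPool`, `ToySorter`, the 100,000-byte limit, and the key contents are all hypothetical and chosen only for illustration. The point it tries to show is that the pool counts every buffer it hands out, so the spill decision reflects all allocated memory rather than only what the surviving unique keys report.

```python
class ToyMemoryPool:
    """Hypothetical pool: owns every buffer it allocates until freed in bulk."""

    def __init__(self):
        self._buffers = []      # holding references keeps the buffers alive
        self.bytes_in_use = 0   # counts every allocation, duplicate or not

    def allocate(self, size):
        buf = bytearray(size)
        self._buffers.append(buf)
        self.bytes_in_use += size
        return buf

    def free_all(self):
        self._buffers.clear()
        self.bytes_in_use = 0


class ToySorter:
    """Hypothetical sorter that spills once the pool reaches a byte limit."""

    def __init__(self, limit_bytes):
        self.pool = ToyMemoryPool()
        self.limit_bytes = limit_bytes
        self.keys = []
        self.spills = 0

    def add_document(self, generated_keys):
        # Every generated key is allocated from the pool, including duplicates.
        for key in generated_keys:
            buf = self.pool.allocate(len(key))
            buf[:] = key
        # Only the unique keys are kept for sorting.
        self.keys.extend(set(generated_keys))
        if self.pool.bytes_in_use >= self.limit_bytes:
            self.spill()

    def spill(self):
        # A real sorter would write its sorted keys to disk here.
        self.keys.clear()
        self.pool.free_all()
        self.spills += 1


sorter = ToySorter(limit_bytes=100_000)
for _ in range(25):
    sorter.add_document([b"0123456789"] * 1000)  # 1000 duplicate 10-byte keys
print(sorter.spills, sorter.pool.bytes_in_use)
```

In this toy run each document allocates 10,000 bytes, so the sorter spills after every tenth document. If only the one unique key per document were counted, the same 25 documents would report a few hundred bytes in total and never trigger a spill.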

Comment by Githook User [ 26/Oct/22 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-68125 Correctly track all memory used during index builds

This allows the Sorter to manage a memory pool that can be used to allocate memory for index builds. Previously, we relied on each key to report its individual memory usage, but in some cases this failed to represent the actual memory used by all keys because the memory was backed by a shared buffer. The new memory pool holds references to all of the buffers it allocates and does not free them until the caller requests it, in this case when the sorter spills to disk.

This strategy keeps performance the same for index builds without many duplicate keys (due to repeated array values). In the degenerate case where we're building an index with very large duplicate keys, we will end up with more spilling than before; that is the cost of correctly tracking our memory usage.
Branch: master
https://github.com/mongodb/mongo/commit/ad1192b4f6fb77d0074227c375e8e83441654f7a

Comment by Yujin Kang Park [ 29/Jul/22 ]

From the above reproducer, we have traced it down to btree_key_generator.cpp.

Since this is a multikey index:

  • For each document, the generator allocates 1000 KeyStrings (~10 bytes each).
  • All KeyStrings are equal (the same array value), so the KeyStringSet reduces to 1 unique element, which is added to the sorter.
  • The maximum SharedBufferFragmentBuilder size is 2MB by default, so each KeyString in the sorter is most probably pinning a 10000-byte allocation (allocated in the same fragment, but actually unused) while only accounting for ~40 bytes (10 bytes for the KeyString itself and 32 for KeyString::Value).
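
The accounting gap in the list above can be made concrete with a small calculation. This is a sketch using the figures from this comment (1000 KeyStrings of about 10 bytes per document, 32 bytes for KeyString::Value). The function name is hypothetical, and the assumption that the single surviving key pins its document's entire 10000-byte allocation is the simplification described above, not a measured result.

```python
def per_document_accounting(keys_per_doc=1000, key_bytes=10, value_overhead=32):
    """Return (pinned, tracked) bytes for one document under the assumptions above."""
    pinned = keys_per_doc * key_bytes      # bytes allocated in the shared fragment
    tracked = key_bytes + value_overhead   # bytes reported for the 1 unique key
    return pinned, tracked


pinned, tracked = per_document_accounting()
print(pinned, tracked, round(pinned / tracked))
```

Under these assumptions the memory actually held per document is roughly two orders of magnitude larger than the amount the sorter accounts for.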

The reproducer inserts 250000 documents, each with a field with 1000 elements.
250000 Documents * 1000 KeyStrings * 10 bytes = 2384.19 MiB
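
The total above can be checked, and compared with what would be tracked, in a few lines. This is only an arithmetic sketch: the variable names are illustrative, and the tracked figure assumes one unique key per document accounted at 10 + 32 bytes, as estimated earlier in this comment.

```python
documents = 250_000
keys_per_doc = 1000
key_bytes = 10
tracked_per_doc = 10 + 32  # KeyString plus KeyString::Value, per the estimate above

allocated_mib = documents * keys_per_doc * key_bytes / 2**20
tracked_mib = documents * tracked_per_doc / 2**20

print(f"allocated: {allocated_mib:.2f} MiB")  # allocated: 2384.19 MiB
print(f"tracked:   {tracked_mib:.2f} MiB")    # tracked:   10.01 MiB
```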
