[SERVER-68125] Index build on multi-key fields can consume more memory than limit
Created: 19/Jul/22 Updated: 24/Jan/24 Resolved: 27/Oct/22
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.0.9, 6.0.0-rc5 |
| Fix Version/s: | 6.0.4, 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0, v5.0 |
| Sprint: | Execution Team 2022-08-08, Execution Team 2022-08-22, Execution Team 2022-09-05, Execution Team 2022-09-19, Execution Team 2022-10-03, Execution Team 2022-10-17, Execution Team 2022-10-31 |
| Description |
This bug describes a problem that occurs when indexing documents that generate multiple index keys with many duplicate values. These duplicate keys are not counted toward the memory the index build is using, which can cause the build to use significantly more memory than its limit.
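To make the accounting gap concrete, here is a toy Python model (not the server's C++ code). It assumes that every key generated for one document, duplicates included, is written into a single shared buffer, that duplicates are then dropped, and that each surviving key reports only its own length. The names `SharedBuffer`, `Key`, `build_keys_for_document`, `reported_usage`, and `actual_usage` are hypothetical.

```python
# Toy model of the accounting gap (hypothetical names, not MongoDB code).
# Assumption: every key generated for one document, duplicates included, is
# written into a single shared buffer; duplicates are then dropped, and each
# surviving key reports only its own length.
from dataclasses import dataclass


@dataclass
class SharedBuffer:
    size: int  # bytes actually allocated, including dropped duplicates


@dataclass
class Key:
    buf: SharedBuffer  # backing allocation shared with sibling keys
    length: int        # bytes this key reports for itself


def build_keys_for_document(values):
    """Generate one key per array element, then keep only distinct keys."""
    buf = SharedBuffer(size=sum(len(v) for v in values))
    distinct = {}
    for v in values:
        if v not in distinct:
            distinct[v] = Key(buf=buf, length=len(v))
    return list(distinct.values())


def reported_usage(keys):
    """What per-key accounting sees: the sum of each key's own length."""
    return sum(k.length for k in keys)


def actual_usage(keys):
    """Memory actually pinned: each distinct backing buffer counted once."""
    buffers = {id(k.buf): k.buf for k in keys}
    return sum(b.size for b in buffers.values())


# One document whose array holds 1000 copies of the same 1 KiB value.
keys = build_keys_for_document([b"x" * 1024] * 1000)
print(reported_usage(keys))  # 1024
print(actual_usage(keys))    # 1024000
```

In this model a single surviving key keeps the whole 1,000 KiB buffer alive while reporting only 1 KiB, which is the shape of under-counting the description refers to.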
| Comments |
| Comment by Githook User [ 25/Nov/22 ] |
Author: Louis Williams (louis.williams@mongodb.com, username: louiswilliams)
Message: This allows the Sorter to manage a memory pool that can be used to allocate memory for index builds. Previously, we relied on each key to report its individual memory usage, but in some cases this failed to represent the actual memory used by all keys, because the memory was backed by a shared buffer. The new memory pool holds references to all of the buffers it allocates and does not free them until the caller requests it, which in this case happens when the sorter spills to disk. This strategy keeps index build performance the same when there are not many duplicate keys (which arise from repeated array values). In the degenerate case of building an index with very large duplicate keys, we will spill more often than before, in exchange for tracking our memory usage correctly.
| Comment by Githook User [ 26/Oct/22 ] |
Author: Louis Williams (louis.williams@mongodb.com, username: louiswilliams)
Message: This allows the Sorter to manage a memory pool that can be used to allocate memory for index builds. Previously, we relied on each key to report its individual memory usage, but in some cases this failed to represent the actual memory used by all keys, because the memory was backed by a shared buffer. The new memory pool holds references to all of the buffers it allocates and does not free them until the caller requests it, which in this case happens when the sorter spills to disk. This strategy keeps index build performance the same when there are not many duplicate keys (which arise from repeated array values). In the degenerate case of building an index with very large duplicate keys, we will spill more often than before, in exchange for tracking our memory usage correctly.
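The pool-based approach in the commit message can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the actual Sorter: `MemoryPool`, `Sorter`, `add_document_keys`, and `max_memory_bytes` are hypothetical names, and the assumption is that the pool charges the full size of every buffer it hands out and releases all of them only when the sorter spills.

```python
# Toy model of pool-based memory accounting (hypothetical names, not the
# server's Sorter). Assumption: the pool charges the full size of every buffer
# it allocates and frees them only when the caller asks, here on spill.


class MemoryPool:
    def __init__(self):
        self._buffers = []  # references to every buffer allocated so far

    def allocate(self, size):
        buf = bytearray(size)
        self._buffers.append(buf)  # keep the reference until free_all()
        return buf

    def memory_usage(self):
        return sum(len(b) for b in self._buffers)

    def free_all(self):
        self._buffers.clear()


class Sorter:
    def __init__(self, max_memory_bytes):
        self.max_memory_bytes = max_memory_bytes
        self.pool = MemoryPool()
        self.in_memory = []  # distinct keys currently held in memory
        self.spills = 0      # number of times the sorter spilled

    def add_document_keys(self, values):
        """Write every generated key into one pool buffer, keep distinct keys."""
        buf = self.pool.allocate(sum(len(v) for v in values))
        offset = 0
        distinct = {}  # key bytes -> (offset, length) inside buf
        for v in values:
            buf[offset:offset + len(v)] = v  # duplicates are written too
            if v not in distinct:
                distinct[v] = (offset, len(v))
            offset += len(v)
        view = memoryview(buf)
        self.in_memory.extend(view[o:o + n] for o, n in distinct.values())
        # The limit is checked against the pool, not per-key lengths.
        if self.pool.memory_usage() > self.max_memory_bytes:
            self._spill()

    def _spill(self):
        # A real sorter would write this sorted run to disk.
        self.in_memory.sort(key=bytes)
        self.in_memory.clear()
        self.pool.free_all()  # only now is the pooled memory released
        self.spills += 1


sorter = Sorter(max_memory_bytes=1_500_000)
for _ in range(4):
    sorter.add_document_keys([b"x" * 1024] * 1000)
print(sorter.spills)  # 2
```

Each document here pins 1,024,000 bytes of pooled memory while keeping a single 1,024-byte key, so charging the pool rather than per-key lengths makes the model spill every second document. This mirrors the trade-off the commit message describes: more spilling in the degenerate duplicate-key case, with memory usage counted correctly.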
| Comment by Yujin Kang Park [ 29/Jul/22 ] |
From the above reproducer, we have traced the problem to btree_key_generator.cpp. Since this is a multikey index:
The reproducer inserts 250,000 documents, each with an array field of 1,000 elements.
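For a sense of the reproducer's scale, the arithmetic is short. The constant names are illustrative, and the count is an upper bound, since a multikey index generates one key per array element before duplicate keys within the same document are dropped.

```python
# Upper bound on index keys generated by the reproducer (illustrative names).
NUM_DOCUMENTS = 250_000
ELEMENTS_PER_DOCUMENT = 1_000

# One key per array element, before per-document deduplication.
generated_keys = NUM_DOCUMENTS * ELEMENTS_PER_DOCUMENT
print(f"{generated_keys:,}")  # 250,000,000
```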