[SERVER-48067] Reduce memory consumption for unique index builds with large numbers of non-unique keys

| Created: | 08/May/20 |
| Updated: | 29/Oct/23 |
| Resolved: | 20/Jul/20 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.0 |
| Fix Version/s: | 4.2.9, 4.7.0, 4.4.2 |
| Type: | Improvement |
| Priority: | Major - P3 |
| Reporter: | Louis Williams |
| Assignee: | Gregory Noma |
| Resolution: | Fixed |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Execution Team 2020-07-27 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
We use a vector to track all duplicate keys inserted on an index. Specifically, this happens during the phase in which we dump keys from the external sorter into the WT bulk inserter. Due to the nature of hybrid index builds, we must track these duplicates until we temporarily stop writes and can see all writes to the table. If a collection has a large number of duplicate key violations, this vector can grow without bound. We can improve this behavior by batching writes to reduce the memory impact.

We should also consider using this ticket to address the conversion of KeyString back to BSONObj when recording duplicates, as well as the memory amplification caused by creating new vectors to copy key data.
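A minimal sketch of the batching idea follows. All names here (`SideTable`, `DuplicateKeyTracker`, `kMaxBufferedBytes`) are hypothetical stand-ins, not the server's actual classes: duplicate keys are buffered in their compact KeyString form and spilled to a temporary side table whenever the buffer crosses a size threshold, so memory use is bounded by the threshold rather than by the total number of duplicates.

```cpp
// Hedged sketch of batched duplicate-key tracking. SideTable,
// DuplicateKeyTracker, and kMaxBufferedBytes are illustrative names,
// not the actual server implementation.
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a temporary on-disk table that absorbs spilled batches.
struct SideTable {
    std::size_t spilledKeys = 0;
    void insertBatch(const std::vector<std::string>& keys) {
        spilledKeys += keys.size();  // a real table would persist the keys
    }
};

class DuplicateKeyTracker {
public:
    explicit DuplicateKeyTracker(SideTable* table) : _table(table) {}

    // Record one duplicate in its compact KeyString form; converting back
    // to BSONObj is deferred until a violation actually has to be reported.
    void recordDuplicate(std::string keyString) {
        _bufferedBytes += keyString.size();
        _buffer.push_back(std::move(keyString));
        if (_bufferedBytes >= kMaxBufferedBytes)
            _flush();
    }

    // Called once the bulk load completes so no buffered duplicates are lost.
    void finalize() {
        if (!_buffer.empty())
            _flush();
    }

private:
    void _flush() {
        _table->insertBatch(_buffer);
        _buffer.clear();
        _buffer.shrink_to_fit();  // release the vector's capacity too
        _bufferedBytes = 0;
    }

    static constexpr std::size_t kMaxBufferedBytes = 1024 * 1024;  // 1 MB cap
    SideTable* _table;
    std::vector<std::string> _buffer;
    std::size_t _bufferedBytes = 0;
};

int main() {
    SideTable table;
    DuplicateKeyTracker tracker(&table);
    // Even with many duplicates, resident memory stays near the 1 MB cap.
    for (int i = 0; i < 1000000; ++i)
        tracker.recordDuplicate("example-keystring-bytes");
    tracker.finalize();
}
```

The point of the sketch is that peak memory scales with the flush threshold rather than with the number of duplicate key violations, and keeping keys as KeyStrings until reporting time sidesteps both the per-key BSONObj conversion and the extra vector copies mentioned above.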
| Comments |
| Comment by Githook User [ 08/Sep/20 ] |
Author: Gregory Noma (gregory.noma@gmail.com, username: gregorynoma)
Message: (cherry picked from commit 3c2951938d65667d675c48511d9d1046655809a5)
| Comment by Githook User [ 21/Jul/20 ] |
Author: Gregory Noma (gregory.noma@gmail.com, username: gregorynoma)
Message: (cherry picked from commit 3c2951938d65667d675c48511d9d1046655809a5)
| Comment by Githook User [ 20/Jul/20 ] |
Author: Gregory Noma (gregory.noma@gmail.com, username: gregorynoma)
Message:
| Comment by Asya Kamsky [ 09/Jul/20 ] |
If there are duplicates in a unique index, wouldn't it be more efficient to fail the moment we detect the first duplicate? |