[SERVER-48067] Reduce memory consumption for unique index builds with large numbers of non-unique keys Created: 08/May/20  Updated: 29/Oct/23  Resolved: 20/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.0
Fix Version/s: 4.2.9, 4.7.0, 4.4.2

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Gregory Noma
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4, v4.2
Sprint: Execution Team 2020-07-27
Participants:
Case:
Linked BF Score: 0

 Description   

We use a vector to track all duplicate keys inserted into a unique index. Specifically, this happens in the phase where we dump keys from the external sorter into the WiredTiger (WT) bulk inserter.

Due to the nature of hybrid index builds, we must track these duplicates until we temporarily stop writes and can see all writes to the table.

If a collection has a large number of duplicate key violations, this vector can grow without bound. We can improve this behavior by batching writes to reduce the memory impact, roughly as sketched below.
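As an illustrative sketch only (not the actual change in the linked commits): DuplicateKeyTracker, writeBatchToSideTable, and kMaxBufferedDuplicates are hypothetical names invented for this example and do not correspond to MongoDB internals. The idea is to flush buffered duplicate keys to durable storage once a threshold is reached, instead of holding every duplicate in memory until the index build can see all writes.

// Hypothetical batching sketch; names do not match MongoDB internals.
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

class DuplicateKeyTracker {
public:
    // Hypothetical threshold; a real implementation would tune or make this configurable.
    static constexpr std::size_t kMaxBufferedDuplicates = 1000;

    void addDuplicate(std::string keyString) {
        _buffered.push_back(std::move(keyString));  // move, don't copy, the key data
        if (_buffered.size() >= kMaxBufferedDuplicates) {
            flush();
        }
    }

    // Called once writes are drained and all duplicates can be rechecked.
    void finalize() {
        flush();
    }

private:
    void flush() {
        if (_buffered.empty())
            return;
        writeBatchToSideTable(_buffered);  // persist the batch (placeholder)
        _buffered.clear();
        _buffered.shrink_to_fit();  // release the buffered memory
    }

    static void writeBatchToSideTable(const std::vector<std::string>& batch) {
        std::cout << "persisting " << batch.size() << " duplicate keys\n";
    }

    std::vector<std::string> _buffered;
};

int main() {
    DuplicateKeyTracker tracker;
    for (int i = 0; i < 2500; ++i) {
        tracker.addDuplicate("dup-key");  // in-memory buffer stays bounded by the batch size
    }
    tracker.finalize();
    return 0;
}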

We should consider using this ticket to also address the conversion of KeyString back to BSONObj when recording duplicates, as well as the memory amplification from creating new vectors to copy key data.
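A similarly hedged sketch of those two points, deferring the KeyString-to-BSON conversion until a duplicate is actually reported and moving key buffers instead of copying them; KeyStringEntry, decodeToBson, and reportDuplicates are hypothetical stand-ins, not MongoDB APIs:

// Hypothetical sketch; names do not match MongoDB internals.
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct KeyStringEntry {
    std::string buffer;  // compact, binary-comparable key encoding (assumed)
};

// Hypothetical decode step, deferred until a duplicate actually needs to be surfaced.
std::string decodeToBson(const KeyStringEntry& entry) {
    return "{ key: \"" + entry.buffer + "\" }";
}

void reportDuplicates(std::vector<KeyStringEntry> duplicates) {
    for (const auto& dup : duplicates) {
        // Decode lazily, one entry at a time, rather than materializing every
        // duplicate as a decoded object up front.
        std::cout << "duplicate key: " << decodeToBson(dup) << '\n';
    }
}

int main() {
    std::vector<KeyStringEntry> duplicates;
    duplicates.push_back({"a"});
    duplicates.push_back({"a"});

    // Move the accumulated keys instead of copying them into a new vector,
    // avoiding the memory amplification mentioned above.
    reportDuplicates(std::move(duplicates));
    return 0;
}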



 Comments   
Comment by Githook User [ 08/Sep/20 ]

Author: Gregory Noma <gregory.noma@gmail.com> (gregorynoma)

Message: SERVER-48067 Reduce memory consumption for unique index builds with large numbers of non-unique keys

(cherry picked from commit 3c2951938d65667d675c48511d9d1046655809a5)
Branch: v4.4
https://github.com/mongodb/mongo/commit/f2a4792e6bea5d403dcf3474698312c7178ece86

Comment by Githook User [ 21/Jul/20 ]

Author: Gregory Noma <gregory.noma@gmail.com> (gregorynoma)

Message: SERVER-48067 Reduce memory consumption for unique index builds with large numbers of non-unique keys

(cherry picked from commit 3c2951938d65667d675c48511d9d1046655809a5)
Branch: v4.2
https://github.com/mongodb/mongo/commit/98a7db0a562b818426c19203e8d16cc93980b279

Comment by Githook User [ 20/Jul/20 ]

Author: Gregory Noma <gregory.noma@gmail.com> (gregorynoma)

Message: SERVER-48067 Reduce memory consumption for unique index builds with large numbers of non-unique keys
Branch: master
https://github.com/mongodb/mongo/commit/3c2951938d65667d675c48511d9d1046655809a5

Comment by Asya Kamsky [ 09/Jul/20 ]

If there are duplicates in a unique index, wouldn't it be more efficient to fail the moment we detect the first duplicate?
