[SERVER-58211] Deduplicate entries returned in a single invocation to `MigrationChunkClonerSourceLegacy::nextModsBatch` Created: 02/Jul/21  Updated: 29/Oct/23  Resolved: 26/Jul/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Allison Easton
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done before SERVER-58241 Overlap fetching of `_xferMods` from ... Closed
has to be done after SERVER-58242 Increase the batch size for `_xferMods` Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding EMEA 2021-07-12, Sharding EMEA 2021-07-26
Participants:

 Description   

The `MigrationChunkClonerSourceLegacy::nextModsBatch` call is at the core of transferring the diffs during the catch-up phase of chunk migration. It is currently implemented as follows:

  1. During writes, we append the {{_id}} of each written document to a linked list
  2. During the transfer of the diffs, we walk that list, fetch each full document and append it to the output buffer
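As a rough illustration of the two steps above, here is a minimal C++ sketch. It is not the server code. The `_id` is modelled as its serialized bytes in a `std::string`, the collection as a `std::map` from `_id` bytes to the full document, and `recordWrite`, `nextModsBatchNaive` and the `maxBytes` limit are hypothetical names. Documents that no longer exist are simply skipped here.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: an _id is its serialized bytes, and the collection
// maps _id bytes to the full serialized document.
using IdBytes = std::string;
using Collection = std::map<IdBytes, std::string>;

// Step 1: every write appends the _id of the written document to a linked list.
void recordWrite(std::list<IdBytes>& modIds, const IdBytes& id) {
    modIds.push_back(id);
}

// Step 2: draining the list fetches each full document and appends it to the
// output batch. Nothing stops the same _id from being fetched and sent twice.
std::vector<std::string> nextModsBatchNaive(std::list<IdBytes>& modIds,
                                            const Collection& coll,
                                            std::size_t maxBytes) {
    std::vector<std::string> batch;
    std::size_t batchBytes = 0;
    while (!modIds.empty()) {
        auto doc = coll.find(modIds.front());
        if (doc != coll.end()) {
            // Always accept the first document so the call makes progress;
            // otherwise stop once the (hypothetical) byte limit would be exceeded.
            if (!batch.empty() && batchBytes + doc->second.size() > maxBytes)
                break;  // remaining ids are left for the next call
            batch.push_back(doc->second);
            batchBytes += doc->second.size();
        }
        modIds.pop_front();
    }
    return batch;
}
```

With this model, writing the same document three times puts three full copies of it into the batch.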

Because of this, if the same document is written two or more times, its {{_id}} appears in the linked list multiple times, and as a result the whole document is transferred multiple times. For small updates to the same document (such as $inc), this is wasteful: each small change causes the full document to be sent again.

To optimise this, within a single nextModsBatch invocation we should at least keep track of the {{_id}}s that we have already added to the buffer and make sure we do not add the same one twice. To make this deduplication as efficient as possible, we should not use BSONObj comparisons, but instead treat the {{_id}} as a bag of bytes.
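A minimal sketch of the proposed per-invocation deduplication, using the same hypothetical model as a plain C++ illustration (not the committed change): the `_id` bytes live in a `std::string`, which can hold arbitrary bytes including NULs, and a `std::unordered_set` hashes and compares them bytewise, so no BSON-aware comparison is involved. `nextModsBatchDeduped`, `IdBytes`, `Collection` and `maxBytes` are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

using IdBytes = std::string;  // hypothetical: raw bytes of the _id element
using Collection = std::map<IdBytes, std::string>;

std::vector<std::string> nextModsBatchDeduped(std::list<IdBytes>& modIds,
                                              const Collection& coll,
                                              std::size_t maxBytes) {
    std::vector<std::string> batch;
    std::size_t batchBytes = 0;
    // _ids already placed in this batch. The set lives only for this call,
    // matching the "within a single invocation" scope. Hashing and equality
    // operate on the raw bytes.
    std::unordered_set<IdBytes> seen;
    while (!modIds.empty()) {
        const IdBytes& id = modIds.front();
        if (seen.count(id) == 0) {
            auto doc = coll.find(id);
            if (doc != coll.end()) {
                if (!batch.empty() && batchBytes + doc->second.size() > maxBytes)
                    break;  // remaining ids are left for the next call
                batch.push_back(doc->second);
                batchBytes += doc->second.size();
                seen.insert(id);
            }
        }
        // A repeated _id is consumed here without its document being re-sent.
        modIds.pop_front();
    }
    return batch;
}
```

Skipping repeats in this sketch assumes every fetch within one invocation observes the same document state, so a second copy would be identical to the first; a write that lands afterwards appends its `_id` again and is picked up by a later call. Byte comparison is also conservative: byte-identical `_id`s are always equal, so a distinct document is never dropped. At worst, two `_id`s that are equal under BSON comparison but encoded differently would both be sent.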



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fix version since branching activities occurred yesterday. This ticket will be in rc0 when it has been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 26/Jul/21 ]

Author:

Allison Easton <allison.easton@mongodb.com>

Message: SERVER-58211 Deduplicate entries returned in a single invocation to `MigrationChunkClonerSourceLegacy::nextModsBatch`
Branch: master
https://github.com/mongodb/mongo/commit/03de046174e7f3ced4fc099ccc4e1a568c414654

Generated at Thu Feb 08 05:43:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.