MigrationChunkClonerSourceLegacy::nextModsBatch is at the core of transferring the diffs during the catch-up phase of chunk migration. As currently implemented:
- During writes, we place the {{_id}}s of written documents in a linked list
- When transferring the diffs, we walk that list, fetch each entire document, and append it to the output buffer
Because of the above, if the same document is written two or more times, its {{_id}} appears multiple times in the linked list and we transfer the full document once per occurrence. With many small updates ($inc, etc.) to the same document, this wastes transfer bandwidth on repeated copies of that document.
To optimise this, within a single nextModsBatch invocation we should at least keep track of the {{_id}}s that we add to the buffer and make sure we don't add the same one twice. To make this deduplication as efficient as possible, we should not do BSONObj comparisons, but instead treat the {{_id}} as a bag of bytes.
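The per-batch deduplication described above could look roughly like the following. This is a minimal standalone sketch, not the server code: {{RawId}} and {{dedupBatch}} are hypothetical names, and a {{std::string}} holding the serialized {{_id}} bytes stands in for whatever the real linked list stores.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a queued _id: the raw serialized bytes of the
// _id, which may contain embedded NUL bytes.
using RawId = std::string;

// Hypothetical helper: returns the positions in `queued` whose documents
// should be fetched and appended to the batch, keeping only the first
// occurrence of each _id. The set lives for one call only, mirroring
// "within a single nextModsBatch invocation".
std::vector<std::size_t> dedupBatch(const std::vector<RawId>& queued) {
    // string_view hashes and compares the bytes only; no BSON-aware
    // comparison is involved. The views point into `queued`, which
    // outlives the set.
    std::unordered_set<std::string_view> seen;
    seen.reserve(queued.size());

    std::vector<std::size_t> toFetch;
    for (std::size_t i = 0; i < queued.size(); ++i) {
        // insert() reports whether these bytes were new to the set.
        if (seen.insert(std::string_view(queued[i])).second) {
            toFetch.push_back(i);
        }
    }
    return toFetch;
}
```

One consequence of comparing bytes rather than BSON values: two {{_id}}s that compare equal as BSON but serialize differently (for example the same number stored with different numeric types) would not be deduplicated here. That only costs a redundant transfer, which is the current behaviour anyway.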
- has to be done after SERVER-58242 Increase the batch size for `_xferMods` (Closed)
- has to be done before SERVER-58241 Overlap fetching of `_xferMods` from the donor with applying them on the recipient (Closed)