Deduplicate entries returned in a single invocation to `MigrationChunkClonerSourceLegacy::nextModsBatch`

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.1.0-rc0
    • Affects Version/s: None
    • Component/s: Sharding
    • Fully Compatible
    • Sharding EMEA 2021-07-12, Sharding EMEA 2021-07-26
    • 3

      The MigrationChunkClonerSourceLegacy::nextModsBatch call is at the core of transferring the diffs during chunk migration catch-up. As implemented currently:

      1. During writes, we place the {{_id}}s of the written documents in a linked list
      2. While transferring the diffs, we walk that list, fetch each full document and append it to the out buffer

      Because of the above, a document that was written two or more times is present multiple times in the linked list, and as a result we transfer it multiple times. In the case of small updates ($inc, etc.) to the same document, this can result in the whole document being transferred once per write, which is wasteful.
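
      To illustrate the duplication described above, here is a minimal, self-contained sketch. It is not the server code: {{IdBytes}}, {{Document}}, {{recordWrite}} and {{nextModsBatchNaive}} are hypothetical stand-ins, with an {{_id}} modelled as a string of raw bytes and the collection as an in-memory map.

```cpp
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only (not MongoDB types).
using IdBytes = std::string;   // raw bytes of an _id
using Document = std::string;  // a full document

// Step 1: every write appends the written document's _id to a linked list.
void recordWrite(std::list<IdBytes>& modifiedIds, const IdBytes& id) {
    modifiedIds.push_back(id);
}

// Step 2: drain the list, fetch each full document and append it to the
// out buffer. There is no duplicate check, so an _id recorded N times
// causes the same document to be appended N times.
std::vector<Document> nextModsBatchNaive(std::list<IdBytes>& modifiedIds,
                                         const std::map<IdBytes, Document>& collection) {
    std::vector<Document> outBuffer;
    while (!modifiedIds.empty()) {
        IdBytes id = std::move(modifiedIds.front());
        modifiedIds.pop_front();
        auto it = collection.find(id);
        if (it != collection.end()) {
            outBuffer.push_back(it->second);
        }
    }
    return outBuffer;
}
```

      In this sketch, three writes to one document followed by a single batch call append that document to the out buffer three times.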

      To optimise this, within a single nextModsBatch invocation we should at least keep track of the {{_id}}s that we add to the buffer and make sure we don't add the same one twice. To make this deduplication as efficient as possible, we should not do BSONObj comparisons, but instead treat the {{_id}} as an opaque bag of bytes.
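
      A rough sketch of the proposed per-invocation deduplication follows. It is an illustration under assumptions, not the actual fix: {{IdBytes}}, {{Document}} and {{nextModsBatchDeduped}} are hypothetical names, the {{_id}} is modelled as a {{std::string}} holding its raw bytes, and a {{std::unordered_set}} stands in for whatever byte-keyed set the real implementation would use.

```cpp
#include <list>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only (not MongoDB types).
using IdBytes = std::string;   // raw bytes of an _id
using Document = std::string;  // a full document

std::vector<Document> nextModsBatchDeduped(std::list<IdBytes>& modifiedIds,
                                           const std::map<IdBytes, Document>& collection) {
    std::vector<Document> outBuffer;

    // Lives only for this invocation. Hashing and equality on std::string
    // operate on the raw bytes, so no semantic (BSONObj-style) comparison
    // is performed.
    std::unordered_set<IdBytes> seenIds;

    while (!modifiedIds.empty()) {
        IdBytes id = std::move(modifiedIds.front());
        modifiedIds.pop_front();

        // insert().second is false when these exact bytes were already seen.
        if (!seenIds.insert(id).second) {
            continue;
        }

        auto it = collection.find(id);
        if (it != collection.end()) {
            outBuffer.push_back(it->second);
        }
    }
    return outBuffer;
}
```

      Byte equality implies the values are equal, so in this sketch the worst case is a missed duplicate (two different encodings of an equal value), never a dropped document. Because the set is local to the call, a document modified again after the batch is built is still picked up by a later invocation.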

            Assignee:
            Allison Easton
            Reporter:
            Kaloian Manassiev
            Votes:
            0
            Watchers:
            3
