Core Server / SERVER-58211

Deduplicate entries returned in a single invocation to `MigrationChunkClonerSourceLegacy::nextModsBatch`

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.1.0-rc0
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Sprint: Sharding EMEA 2021-07-12, Sharding EMEA 2021-07-26

      The {{MigrationChunkClonerSourceLegacy::nextModsBatch}} call is at the core of transferring the diffs during chunk migration catch-up. As currently implemented:

      1. During writes, we place the {{_id}}s of written documents in a linked list
      2. While transferring the diffs, we go through that list, fetch the entire documents, and append them to the out buffer

      Because of the above, if the same document was written two or more times, its {{_id}} will be present multiple times in the linked list and, as a result, we will transfer the document multiple times. In the case of frequent small updates ({{$inc}}, etc.) to the same document, this re-sends the entire document once per write, which is wasteful.
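
      To make the problem concrete, here is a minimal, self-contained sketch of the two steps above. It is not the server code: {{Id}}, {{Document}}, {{recordWrite}} and {{buildBatchNoDedup}} are hypothetical names, and plain strings and a {{std::map}} stand in for BSON values and the collection.

```cpp
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for this sketch: ids and documents are plain strings.
using Id = std::string;
using Document = std::string;

// Step 1: every write appends the written document's _id to a linked list.
void recordWrite(std::list<Id>& modifiedIds, const Id& id) {
    modifiedIds.push_back(id);
}

// Step 2: walk the list, fetch each full document, and append it to the batch.
// Nothing checks whether the same _id has already been appended.
std::vector<Document> buildBatchNoDedup(const std::list<Id>& modifiedIds,
                                        const std::map<Id, Document>& collection) {
    std::vector<Document> batch;
    for (const Id& id : modifiedIds) {
        auto it = collection.find(id);
        if (it == collection.end()) {
            continue;  // assumption for the sketch: skip ids with no document
        }
        batch.push_back(it->second);  // the entire document, once per list entry
    }
    return batch;
}
```

      With this sketch, three writes to the same document followed by one write to another produce a batch of four documents, three of which are identical copies.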

      In order to optimise this, within a single {{nextModsBatch}} invocation we should at least keep track of the {{_id}}s that we add to the buffer and make sure we don't add the same one twice. To make this deduplication as efficient as possible, we should not do BSONObj comparisons, but treat the {{_id}} as a bag of bytes.
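
      A rough sketch of the proposed per-invocation deduplication, under stated assumptions: {{IdBytes}} and {{dedupForOneBatch}} are hypothetical names, and a {{std::string}} is used purely as a container for the serialized bytes of the {{_id}}. The actual change may use different types and containers.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in: the raw serialized bytes of a queued _id.
// std::string is used only as a byte container and may hold embedded NULs.
using IdBytes = std::string;

// Returns the ids to transfer for one batch, in first-seen order, with
// repeats dropped. Hashing and equality are byte-wise (std::hash and
// operator== on std::string), not a type-aware BSON comparison.
std::vector<IdBytes> dedupForOneBatch(const std::vector<IdBytes>& queuedIds) {
    std::unordered_set<IdBytes> seen;  // lives only for this invocation
    std::vector<IdBytes> unique;
    for (const IdBytes& id : queuedIds) {
        // insert() reports whether the id was newly added to the set.
        if (seen.insert(id).second) {
            unique.push_back(id);
        }
    }
    return unique;
}
```

      The trade-off in this sketch is that byte-wise equality can only miss duplicates, never merge distinct ids: two {{_id}}s with identical bytes are necessarily the same value, whereas values that a type-aware comparison would consider equal but that serialize differently would simply be sent twice, as today.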

            Assignee:
            allison.easton@mongodb.com Allison Easton
            Reporter:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Votes:
            0
            Watchers:
            3
