Core Server / SERVER-34893

A source shard may successfully complete a migration without changing its shard version

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-rc1, 4.1.1
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.0
    • Sprint: Sharding 2018-06-04
    • Linked BF Score: 30

      Description

      Currently, the config server invalidates its cached metadata for a collection after it receives the moveChunk response from the source shard in a migration. However, the recipient shard in that migration is free to begin a new migration as soon as it completes _recvChunkCommit, which is called earlier, during the critical section. If a request to move a chunk away from the recipient shard reaches the config server before it invalidates its cache, the config server will send a stale shard version to the recipient. The recipient may then complete the forced refresh at the beginning of the new migration without seeing the persisted metadata changes, so it will not know that it has received a new chunk, yet it will begin to drive a migration anyway.

      This is only a problem when the shard believes it owns exactly one chunk. In that case it will not send a control chunk when it goes to commit the migration, so the version of the chunk it does not know it still owns will not be bumped, and its shard version will not change after the migration commits. The shard will then be able to accept reads from a stale mongos looking for the chunk that was just moved, even after refreshing at the end of moveChunk, and return wrong results.
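
      The effect on the shard version can be illustrated with a small model. This is a sketch under stated assumptions, not the server's implementation: chunk versions are reduced to (major, minor) pairs, a shard's version is taken to be the highest version among the chunks it owns, and a commit is assumed to give the migrated chunk (next major, 0) and the control chunk, if one is sent, (next major, 1). The chunk names X, W, Y, the shard names A, B, C, and every function name are hypothetical.

```python
# Toy model of chunk and shard versions. All names are illustrative only.

def collection_version(chunks):
    """Highest chunk version in the whole collection."""
    return max(c["version"] for c in chunks.values())


def shard_version(chunks, shard):
    """Highest version among the chunks owned by `shard` (assumed definition)."""
    owned = [c["version"] for c in chunks.values() if c["shard"] == shard]
    return max(owned) if owned else (0, 0)


def commit_migration(chunks, migrated, to_shard, control=None):
    """Move `migrated` to `to_shard`; bump `control` only if the donor sent one."""
    next_major = collection_version(chunks)[0] + 1
    chunks[migrated] = {"shard": to_shard, "version": (next_major, 0)}
    if control is not None:
        chunks[control]["version"] = (next_major, 1)


def run_scenario(donor_knows_new_chunk):
    """Return shard B's version before and after it donates chunk Y."""
    chunks = {
        "X": {"shard": "A", "version": (1, 0)},
        "W": {"shard": "A", "version": (1, 1)},
        "Y": {"shard": "B", "version": (1, 2)},
    }
    # First migration: A moves X to B and sends W as the control chunk.
    commit_migration(chunks, "X", "B", control="W")
    before = shard_version(chunks, "B")

    # Second migration: B moves Y to C. If B never learned that it owns X,
    # it believes Y is its only chunk and sends no control chunk.
    control = "X" if donor_knows_new_chunk else None
    commit_migration(chunks, "Y", "C", control=control)
    after = shard_version(chunks, "B")
    return before, after


print(run_scenario(donor_knows_new_chunk=False))  # ((2, 0), (2, 0))
print(run_scenario(donor_knows_new_chunk=True))   # ((2, 0), (3, 1))
```

      In the first call, B's version is (2, 0) both before and after it gives Y away, so in this model a router that still holds (2, 0) for B would not be told that it is stale.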

      I think a partial solution would be to have the config server invalidate its cached metadata during _configsvrCommitChunkMigration, which would decrease the likelihood of this happening. A full solution may require preventing the recipient shard in a migration from driving a new one until the migration it was part of has completed.
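
      The timing difference between the current behavior and the proposed partial solution can be sketched as follows. This is a deliberately simplified model: it only tracks one cached and one persisted version on a hypothetical ToyConfigServer class, and it does not model the shard's own refresh or any other window in which stale metadata could be used. None of the names below are real server APIs.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]  # (major, minor), as a simplification


class ToyConfigServer:
    """Hypothetical stand-in for the config server's metadata cache."""

    def __init__(self, invalidate_at_commit: bool) -> None:
        self.persisted: Version = (1, 0)
        self.cached: Optional[Version] = (1, 0)
        self.invalidate_at_commit = invalidate_at_commit

    def commit_chunk_migration(self) -> None:
        # The commit persists new metadata with a higher major version.
        self.persisted = (self.persisted[0] + 1, 0)
        if self.invalidate_at_commit:
            # Proposed partial solution: drop the cache at commit time.
            self.cached = None

    def on_move_chunk_response(self) -> None:
        # Behavior described in the report: invalidate after the donor replies.
        self.cached = None

    def version_to_send(self) -> Version:
        # Reload from persisted metadata only if the cache was invalidated.
        if self.cached is None:
            self.cached = self.persisted
        return self.cached


def version_sent_in_race_window(invalidate_at_commit: bool):
    """A new move request arrives after commit but before the donor's reply."""
    config = ToyConfigServer(invalidate_at_commit)
    config.commit_chunk_migration()
    sent = config.version_to_send()   # request handled inside the race window
    config.on_move_chunk_response()   # the donor's reply arrives afterwards
    return sent, config.persisted


print(version_sent_in_race_window(False))  # ((1, 0), (2, 0)): stale version sent
print(version_sent_in_race_window(True))   # ((2, 0), (2, 0)): up to date
```

      The sketch covers only the cache-timing part of the proposal; it says nothing about whether the recipient should also be blocked from driving a new migration.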

              People

              Assignee:
              renctan Randolph Tan
              Reporter:
              jack.mulrow Jack Mulrow
