Core Server / SERVER-19934

Ill-timed crash at end of chunk migration can lead to lost writes when using replica sets as config servers

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.2.0-rc0
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 8 (08/28/15), Sharding 9 (09/18/15), Sharding A (10/09/15)

      This bug affects only clusters whose config servers are deployed as a replica set (CSRS), and cannot occur in 3.0, 2.6, or 2.4 series clusters.

      Consider a sharded cluster with CSRS config servers that is migrating some chunk C from a donor shard to a recipient shard.

      If the donor shard's replica set primary (or standalone node) crashes during the migration's critical section, after the chunk metadata changes have been written to the config servers,

      and some mongos that is not yet aware of the metadata change routes a write for the donated chunk to the donor shard,

      and the new donor replica set primary (or the restarted standalone node) reads chunk metadata from a lagged CSRS secondary that still has stale chunk information,

      then the new donor node will accept the write even though it no longer owns the chunk. The document lands on a shard that no longer owns that range, so it is never visible through mongos and the write is effectively lost.

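      The root cause is that the donor replica set does not remember, across failovers and restarts, that it was in the middle of finishing a chunk migration. Nor does it durably record the minimum config server optime corresponding to its most recently completed metadata operation, so after a crash it has no way to reject chunk metadata read from a config server node that has not yet caught up to that optime.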
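      The root cause suggests what the donor would need to persist. The following is one minimal sketch of such a guard, under stated assumptions: `DonorDurableState`, `migration_in_progress`, `min_config_optime`, and `accept_write` are hypothetical names, optimes are integers, and "durable" is modeled simply as state the restarted donor still has. It illustrates the idea only and is not the change that resolved this ticket.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigNode:
    """Hypothetical CSRS member: an integer optime plus a chunk -> owner map."""
    optime: int
    chunk_owner: dict = field(default_factory=dict)


@dataclass
class DonorDurableState:
    """Hypothetical donor state persisted to disk, so it survives failover/restart."""
    migration_in_progress: Optional[str] = None  # chunk being donated, if any
    min_config_optime: int = 0  # config optime of the last completed metadata op


class StaleConfigError(Exception):
    """Raised when a config node has not yet caught up to min_config_optime."""


def refresh_owner(state: DonorDurableState, chunk: str, node: ConfigNode) -> Optional[str]:
    # Refuse metadata that may predate this shard's own latest metadata write.
    if node.optime < state.min_config_optime:
        raise StaleConfigError(
            f"config optime {node.optime} < required {state.min_config_optime}"
        )
    return node.chunk_owner.get(chunk)


def accept_write(shard: str, state: DonorDurableState, chunk: str, node: ConfigNode) -> bool:
    if state.migration_in_progress == chunk:
        # Still finishing a migration of this chunk: serve no writes for it
        # until the outcome is resolved from an up-to-date config node.
        return False
    try:
        return refresh_owner(state, chunk, node) == shard
    except StaleConfigError:
        # Reject rather than trust stale metadata; a caller would retry elsewhere.
        return False


lagged = ConfigNode(optime=10, chunk_owner={"C": "donor", "D": "donor"})
caught_up = ConfigNode(optime=11, chunk_owner={"C": "recipient", "D": "donor"})

# Crash before the migration was recorded as finished: the durable flag blocks C.
interrupted = DonorDurableState(migration_in_progress="C")
print(accept_write("donor", interrupted, "C", lagged))  # False

# Migration finished and optime 11 recorded durably: the lagged node is refused.
finished = DonorDurableState(min_config_optime=11)
print(accept_write("donor", finished, "C", lagged))     # False
print(accept_write("donor", finished, "C", caught_up))  # False (recipient owns C)
print(accept_write("donor", finished, "D", caught_up))  # True
```

      The sketch fails closed: when the donor cannot show that its metadata is at least as new as its own last metadata write, it rejects the write instead of guessing.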

      Attachments:
        1. test.diff (3 kB)
        2. test.js (4 kB)

      Assignee: Kaloian Manassiev (kaloian.manassiev@mongodb.com)
      Reporter: Andy Schwerin (schwerin@mongodb.com)