SERVER-38472 (Core Server)

A config server can return early for a shardCollection command even if the shard hasn't finished its own shardCollection command

    • Fully Compatible
    • ALL
    • v4.0
    • Sharding 2018-12-31, Sharding 2019-01-14
    • 71

      1. A config server calls shardCollection
      2. The shard begins shardCollection
      3. The shard writes chunks and the metadata entry for the collection.
      4. The config server steps down, cancelling its shardCollection command.
      5. A new config server steps up.
      6. The new config server retries the shardCollection command.
      7. The new config server sees that the metadata entry for the collection has been written and erroneously assumes that the existence of the entry means the shard has finished its shardCollection command. It returns early and releases the distributed lock, so chunk migrations and splits can proceed.
      8. A subsequent moveChunk operation can then acquire the collection distributed lock and attempt to enter the critical section while the shard's shardCollection command is still in progress, which currently crashes the server.
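
      The sequence above can be sketched as a small toy model. This is not server code: the Cluster class, its fields, and the buggy_retry and move_chunk functions are hypothetical names chosen for illustration, and the "conflict" outcome merely stands in for the crash described in step 8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Toy model of the shared state in the steps above (all names hypothetical)."""
    metadata_entry_written: bool = False     # step 3: the collection entry exists
    shard_finished: bool = False             # the shard's shardCollection has returned
    dist_lock_holder: Optional[str] = None   # current holder of the collection dist lock


def buggy_retry(cluster: Cluster) -> bool:
    """Step 7: the new config server treats 'entry exists' as 'shard is done'."""
    if cluster.metadata_entry_written:
        cluster.dist_lock_holder = None      # lock released too early
        return True                          # early return
    return False


def move_chunk(cluster: Cluster) -> str:
    """Step 8: a migration takes the dist lock, then wants the critical section."""
    if cluster.dist_lock_holder is not None:
        return "blocked"                     # lock still held, migration cannot start
    cluster.dist_lock_holder = "moveChunk"
    if not cluster.shard_finished:
        return "conflict"                    # stands in for the crash in step 8
    return "ok"


# Steps 1-3: shardCollection holds the lock and the shard writes the entry.
cluster = Cluster(dist_lock_holder="shardCollection")
cluster.metadata_entry_written = True
# Steps 4-7: step-down, step-up, retry with the early return.
returned_early = buggy_retry(cluster)
# Step 8: moveChunk gets in while the shard is still working.
outcome = move_chunk(cluster)
print(returned_early, outcome)
```

      In this model the migration is only reachable because the retry released the lock; with the lock still held, move_chunk reports "blocked" instead.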

      A config server should not be able to return early if the shard's shardCollection command has not completed.
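
      One way to picture the expected behaviour is a retry check that needs more than the metadata entry before returning early. The sketch below is only an assumption about the shape of such a check, not the change made in the server: CollectionState, the shard_command_complete marker, and config_retry_action are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CollectionState:
    metadata_entry_written: bool   # the shard has written the collection entry
    shard_command_complete: bool   # hypothetical marker: the shard's command has returned


def config_retry_action(state: CollectionState) -> str:
    """Decide what a retried shardCollection on the config server should do."""
    # The entry alone is not proof of completion; require both signals.
    if state.metadata_entry_written and state.shard_command_complete:
        return "return_early"
    # Otherwise keep holding the dist lock until the shard finishes.
    return "keep_lock_and_wait"


# The state from step 7: entry written, shard still running.
action = config_retry_action(CollectionState(True, False))
print(action)
```

      Under this check, the state reached in step 7 no longer qualifies for an early return, so the distributed lock stays held until the shard is done.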

            Assignee:
            janna.golden@mongodb.com Janna Golden
            Reporter:
            blake.oler@mongodb.com Blake Oler
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: