Core Server / SERVER-25053

removeShard checks are inherently racy

    • Fully Compatible
    • ALL
    • v4.4
    • Sharding 2020-03-23
    • 26

      removeShard runs a series of checks before marking a shard as "draining" (i.e., to be removed) on the config server, including:

      • only one shard should be "draining" at a time
      • can't remove the last shard
      • the shard to be removed should not already be "draining"
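
A minimal sketch of the three checks listed above, written against a hypothetical in-memory model rather than the real catalog client. The names `ShardCatalog` and `check_can_start_draining`, and the reason strings, are assumptions made for illustration; the actual logic lives in the C++ code linked in this ticket.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCatalog:
    """Hypothetical in-memory stand-in for the config server's shard list."""
    # Maps shard name -> True if the shard is currently marked "draining".
    shards: dict = field(default_factory=dict)


def check_can_start_draining(catalog, shard_name):
    """Return None if every check passes, else a short reason string.

    Assumes shard_name is already known to the catalog.
    """
    if catalog.shards[shard_name]:
        return "shard is already draining"
    if any(catalog.shards.values()):
        return "another shard is already draining"
    if len(catalog.shards) <= 1:
        return "cannot remove the last shard"
    return None


catalog = ShardCatalog({"shard0": False, "shard1": True})
print(check_can_start_draining(catalog, "shard0"))  # another shard is already draining
print(check_can_start_draining(catalog, "shard1"))  # shard is already draining
```

Each check only reads state; nothing in the function itself prevents another caller from changing that state between the check and the later write.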

      Relevant code: https://github.com/mongodb/mongo/blob/907ed32a3a8bd19f883836013530f645522a75bc/src/mongo/s/catalog/replset/sharding_catalog_client_impl.cpp#L500-L544

      However, these checks are not guarded by a distributed lock (or even by an in-process lock within a single mongos), so two removeShard requests, whether sent to two different mongoses or to the same mongos, can pass all checks concurrently and begin removing two shards at once.
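
The interleaving described above can be reproduced deterministically in a toy model. This is only an illustration of the check-then-act pattern: the `shards` dict and the helpers `checks_pass` and `mark_draining` are hypothetical and do not correspond to real mongos or config server functions.

```python
def checks_pass(shards, name):
    """Pre-checks modeled as a pure read of the shared state."""
    already_draining = shards[name]
    any_draining = any(shards.values())
    is_last_shard = len(shards) <= 1
    return not (already_draining or any_draining or is_last_shard)


def mark_draining(shards, name):
    """The write that follows the checks."""
    shards[name] = True


# Shared state: shard name -> draining flag.
shards = {"shard0": False, "shard1": False, "shard2": False}

# Request A and request B both finish their checks before either one writes.
a_ok = checks_pass(shards, "shard0")
b_ok = checks_pass(shards, "shard1")

if a_ok:
    mark_draining(shards, "shard0")
if b_ok:
    mark_draining(shards, "shard1")

draining_now = sorted(name for name, flag in shards.items() if flag)
print(draining_now)  # ['shard0', 'shard1']
```

Both requests observed a state with no draining shard, so both proceeded, which violates the "only one shard draining at a time" rule.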

      This can be fixed by the new locking mechanism being added for the zone sharding project.
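
As a rough sketch of the general idea (not of the zone sharding locking mechanism itself, whose design is not described in this ticket), the checks and the write can be placed in one critical section. Here `threading.Lock` is only an in-process stand-in; a real fix would need a lock that also covers requests arriving at different mongoses. `ShardRegistry` and `try_start_draining` are hypothetical names.

```python
import threading


class ShardRegistry:
    """Hypothetical model: checks and the write share one critical section."""

    def __init__(self, names):
        self._draining = {name: False for name in names}
        self._lock = threading.Lock()  # in-process stand-in, not a distributed lock

    def try_start_draining(self, name):
        with self._lock:
            if self._draining[name]:
                return False  # this shard is already draining
            if any(self._draining.values()):
                return False  # another shard is already draining
            if len(self._draining) <= 1:
                return False  # cannot remove the last shard
            self._draining[name] = True
            return True

    def draining(self):
        with self._lock:
            return sorted(n for n, flag in self._draining.items() if flag)


registry = ShardRegistry(["shard0", "shard1", "shard2", "shard3"])
results = {}


def request(name):
    results[name] = registry.try_start_draining(name)


threads = [threading.Thread(target=request, args=(n,))
           for n in ["shard0", "shard1", "shard2"]]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results.values()))     # 1
print(len(registry.draining()))  # 1
```

Which request wins depends on thread scheduling, but because no request can run its checks while another is between its checks and its write, exactly one succeeds.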

            Assignee:
            alex.taskov@mongodb.com Alexander Taskov (Inactive)
            Reporter:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Votes:
            0
            Watchers:
            4
