Core Server / SERVER-74705

removeShard should not be allowed for config shard

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Sharding NYC
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding NYC 2023-04-03, Sharding NYC 2023-04-17

      removeShard on a config shard succeeds despite the existence of the config database on that shard. The config database is still accessible afterward. getShardMap still shows the removed shard, but its entry has been removed from the config.shards collection, which also remains accessible.

      Update: The behavior seen here may be correct, but it is definitely confusing and misleading for users. We should ban users from running removeShard on a config shard and point them to the transitionToDedicatedConfigServer command, which is meant for this purpose.
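
      The proposed ban could be sketched as a simple guard on the shard name. This is only an illustration of the idea in the update above, not the server's actual fix: the function name checkRemoveShardAllowed, the error message text, and the assumption that the config shard always has the _id "config" (as in the cluster below) are all hypothetical.

```javascript
// Hypothetical guard, not the server's actual implementation.
// Assumption: the config shard is registered with _id "config",
// as it is in the cluster shown in this report.
const CONFIG_SHARD_ID = "config";

// Returns a command-style reply: { ok: 1 } if removeShard may proceed,
// or { ok: 0, errmsg: ... } pointing the user at the dedicated command.
function checkRemoveShardAllowed(shardName) {
  if (shardName === CONFIG_SHARD_ID) {
    return {
      ok: 0,
      errmsg:
        "removeShard is not allowed for the config shard; " +
        "use transitionToDedicatedConfigServer instead",
    };
  }
  return { ok: 1 };
}
```

      With such a guard, checkRemoveShardAllowed("config") would be rejected with a pointer to transitionToDedicatedConfigServer, while checkRemoveShardAllowed("jamesRepl") would be allowed through to the normal draining logic.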

       


      Cluster configuration (note that the config database is on the "config" shard):

      mongos> sh.status()
      --- Sharding Status --- 
        sharding version: { "_id" : 1, "clusterId" : ObjectId("6408d34d1b1386f4db260a16") }
        shards:
              {  "_id" : "config",  "host" : "configRepl/localhost:27020",  "state" : 1,  "topologyTime" : Timestamp(1678299982, 3),  "draining" : true }
              {  "_id" : "jamesRepl",  "host" : "jamesRepl/localhost:27030",  "state" : 1,  "topologyTime" : Timestamp(1678299983, 2) }
        active mongoses:
              "7.0.0-alpha-538-g7cec1b7" : 1
        autosplit:
              Currently enabled: yes
        automerge:
              Currently enabled: yes
        balancer:
              Currently enabled: yes
              Currently running: yes
        databases:
              {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                      config.system.sessions
                              shard key: { "_id" : 1 }
                              unique: false
                              balancing: true
                              chunks:
                                      config	696
                                      jamesRepl	328
                              too many chunks to print, use verbose if you want to force print 

      movePrimary of the config database to jamesRepl is disallowed:

      mongos> db.adminCommand({movePrimary: "config", to: "jamesRepl"})
      {
      	"ok" : 0,
      	"errmsg" : "Can't move primary for config database",
      	"code" : 72,
      	"codeName" : "InvalidOptions",
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1678300947, 29),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	},
      	"operationTime" : Timestamp(1678300947, 29)
      } 

      Although movePrimary of the config database is disallowed, running removeShard on "config" appears to succeed:

      mongos> db.adminCommand({removeShard : "config"})
      {
      	"msg" : "draining started successfully",
      	"state" : "started",
      	"shard" : "config",
      	"note" : "you need to drop or movePrimary these databases",
      	"dbsToMove" : [ ],
      	"ok" : 1,
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1678302328, 3),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	},
      	"operationTime" : Timestamp(1678302328, 3)
      }
      mongos> db.adminCommand({removeShard : "config"})
      {
      	"msg" : "removeshard completed successfully",
      	"state" : "completed",
      	"shard" : "config",
      	"ok" : 1,
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1678302331, 5),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	},
      	"operationTime" : Timestamp(1678302331, 5)
      }
      mongos> db.adminCommand({removeShard : "config"})
      {
      	"ok" : 0,
      	"errmsg" : "Shard config does not exist",
      	"code" : 70,
      	"codeName" : "ShardNotFound",
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1678302405, 2),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	},
      	"operationTime" : Timestamp(1678302405, 2)
      }

      After removal, mongos is left in an inconsistent state: getShardMap shows the shard as existing, but the entry for that shard has been removed from the config.shards collection (which is still accessible even though the shard holding the config database has been removed).

      mongos> db.adminCommand({getShardMap: 1})
      {
      	"map" : {
      		"jamesRepl" : "jamesRepl/localhost:27030",
      		"config" : "configRepl/localhost:27020"
      	},
      	"hosts" : {
      		"localhost:27030" : "jamesRepl",
      		"localhost:27020" : "config"
      	},
      	"connStrings" : {
      		"configRepl/localhost:27020" : "config",
      		"jamesRepl/localhost:27030" : "jamesRepl"
      	},
      	"ok" : 1,
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1678302405, 2),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	},
      	"operationTime" : Timestamp(1678302405, 2)
      } 
      mongos> use config
      switched to db config
      mongos> db.shards.find()
      { "_id" : "jamesRepl", "host" : "jamesRepl/localhost:27030", "state" : 1, "topologyTime" : Timestamp(1678302331, 2) }
      mongos>
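
      The mismatch shown above can be detected mechanically by comparing the two outputs. The following is a minimal sketch, assuming the reply shapes seen in this report; the helper name findStaleShardMapEntries is hypothetical, and the sample data is copied from the output above with fields not needed for the comparison omitted.

```javascript
// Returns the shard ids that appear in a getShardMap reply but have no
// matching document in config.shards.
function findStaleShardMapEntries(shardMapReply, shardDocs) {
  const known = new Set(shardDocs.map((doc) => doc._id));
  return Object.keys(shardMapReply.map).filter((id) => !known.has(id));
}

// Sample data copied from the getShardMap and db.shards.find() output above.
const shardMapReply = {
  map: {
    jamesRepl: "jamesRepl/localhost:27030",
    config: "configRepl/localhost:27020",
  },
};
const shardDocs = [
  { _id: "jamesRepl", host: "jamesRepl/localhost:27030", state: 1 },
];

const stale = findStaleShardMapEntries(shardMapReply, shardDocs);
console.log(stale); // [ 'config' ]
```

      In a live shell the same helper could presumably be fed db.adminCommand({getShardMap: 1}) and db.getSiblingDB("config").shards.find().toArray(). A non-empty result is only a hint, not proof of a bug, since the update above notes that the observed behavior may be correct.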

            Assignee: Wenqin Ye (wenqin.ye@mongodb.com)
            Reporter: James Wahlin (james.wahlin@mongodb.com)