- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Sharding NYC
- Fully Compatible
- ALL
- Sharding NYC 2023-04-03, Sharding NYC 2023-04-17
Running removeShard on a config shard succeeds even though the config database still lives on that shard. Afterwards the config database is still accessible, and getShardMap still lists the removed shard even though its entry has been removed from the (still accessible) config.shards collection.
Update: The behavior seen here may be correct, but it is confusing and misleading for users. We should ban users from running removeShard on a config shard and point them to the transitionToDedicatedConfigServer command, which is meant for this purpose.
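Until the server rejects removeShard for a config shard, an operator-side guard along these lines could route the request to the intended command. This is only a sketch: buildRemovalCommand and removeShardSafely are hypothetical helpers, not part of the server or the shell, and the sketch assumes the config shard's _id is "config", as in the sh.status() output in this ticket.

```javascript
// Hypothetical helpers for illustration; neither exists in the server or shell.
// Assumption: the config shard is registered with _id "config".
const CONFIG_SHARD_ID = "config";

// Pick the admin command that should be sent for a given shard id.
function buildRemovalCommand(shardId) {
  if (shardId === CONFIG_SHARD_ID) {
    // Config shards are drained via the dedicated transition command.
    return { transitionToDedicatedConfigServer: 1 };
  }
  return { removeShard: shardId };
}

// runAdminCommand is any function that sends a command document to the
// admin database, e.g. (cmd) => db.adminCommand(cmd) in the mongo shell.
function removeShardSafely(runAdminCommand, shardId) {
  return runAdminCommand(buildRemovalCommand(shardId));
}
```

In the shell this might be called as removeShardSafely((cmd) => db.adminCommand(cmd), "config"). Like removeShard, the transition command is expected to be re-run until draining completes.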
Cluster configuration (note that config db is on the "config" shard)
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "clusterId" : ObjectId("6408d34d1b1386f4db260a16")
  }
  shards:
        { "_id" : "config", "host" : "configRepl/localhost:27020", "state" : 1, "topologyTime" : Timestamp(1678299982, 3), "draining" : true }
        { "_id" : "jamesRepl", "host" : "jamesRepl/localhost:27030", "state" : 1, "topologyTime" : Timestamp(1678299983, 2) }
  active mongoses:
        "7.0.0-alpha-538-g7cec1b7" : 1
  autosplit:
        Currently enabled: yes
  automerge:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: yes
  databases:
        { "_id" : "config", "primary" : "config", "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                config 696
                                jamesRepl 328
                        too many chunks to print, use verbose if you want to force print
movePrimary of the config database to jamesRepl is disallowed
mongos> db.adminCommand({movePrimary: "config", to: "jamesRepl"})
{
        "ok" : 0,
        "errmsg" : "Can't move primary for config database",
        "code" : 72,
        "codeName" : "InvalidOptions",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1678300947, 29),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1678300947, 29)
}
Despite movePrimary of the config db being disallowed, running removeShard on "config" appears to succeed
mongos> db.adminCommand({removeShard : "config"})
{
        "msg" : "draining started successfully",
        "state" : "started",
        "shard" : "config",
        "note" : "you need to drop or movePrimary these databases",
        "dbsToMove" : [ ],
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1678302328, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1678302328, 3)
}
mongos> db.adminCommand({removeShard : "config"})
{
        "msg" : "removeshard completed successfully",
        "state" : "completed",
        "shard" : "config",
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1678302331, 5),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1678302331, 5)
}
mongos> db.adminCommand({removeShard : "config"})
{
        "ok" : 0,
        "errmsg" : "Shard config does not exist",
        "code" : 70,
        "codeName" : "ShardNotFound",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1678302405, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1678302405, 2)
}
After the removal, mongos is left in an inconsistent state: getShardMap still lists the shard, but its entry has been removed from the config.shards collection (which remains accessible even though the shard holding the config database has been removed).
mongos> db.adminCommand({getShardMap: 1})
{
        "map" : {
                "jamesRepl" : "jamesRepl/localhost:27030",
                "config" : "configRepl/localhost:27020"
        },
        "hosts" : {
                "localhost:27030" : "jamesRepl",
                "localhost:27020" : "config"
        },
        "connStrings" : {
                "configRepl/localhost:27020" : "config",
                "jamesRepl/localhost:27030" : "jamesRepl"
        },
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1678302405, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1678302405, 2)
}
switched to db config
mongos> db.shards.find()
{ "_id" : "jamesRepl", "host" : "jamesRepl/localhost:27030", "state" : 1, "topologyTime" : Timestamp(1678302331, 2) }
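The mismatch above can be checked mechanically by comparing the "map" field of the getShardMap reply with the _id values in config.shards. The following is a minimal sketch; findStaleShardMapEntries is a hypothetical helper, and the sample values are copied from the outputs in this ticket. An entry that appears only in the map is not necessarily a bug, since the update above notes the behavior may be correct.

```javascript
// Hypothetical helper: returns shard ids present in the getShardMap reply's
// "map" field but missing from the documents read from config.shards.
function findStaleShardMapEntries(shardMapReply, shardDocs) {
  const known = new Set(shardDocs.map((doc) => doc._id));
  return Object.keys(shardMapReply.map).filter((id) => !known.has(id));
}

// Values taken from the getShardMap and db.shards.find() outputs above.
const shardMapReply = {
  map: {
    jamesRepl: "jamesRepl/localhost:27030",
    config: "configRepl/localhost:27020",
  },
};
const shardDocs = [
  { _id: "jamesRepl", host: "jamesRepl/localhost:27030", state: 1 },
];

const stale = findStaleShardMapEntries(shardMapReply, shardDocs);
console.log(stale); // [ 'config' ]
```

Against a live cluster, the inputs could come from db.adminCommand({getShardMap: 1}) and db.getSiblingDB("config").shards.find().toArray().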
- is related to
  SERVER-74738 Can't run transitionToDedicatedConfigServer with auth on, but can run removeShard (Closed)