- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
Original description
The movePrimary command does not refresh the metadata caches of the other mongos instances, so the same query can be routed to different shards depending on which mongos receives it.
Steps to Reproduce
- Deployed a 2-shard cluster, with each shard running a 3-node replica set, and started 2 mongos instances (mongos1 and mongos2).
- Created a database mydb through mongos1. This is what sh.status returned:
    --- Sharding Status ---
      sharding version: {
        "_id": 1,
        "minCompatibleVersion": 5,
        "currentVersion": 6,
        "clusterId": ObjectId("56a7671754ad7a545f803203")
      }
      shards:
        { "_id": "shard01", "host": "shard01/ankit:27018,ankit:27019,ankit:27020" }
        { "_id": "shard02", "host": "shard02/ankit:27021,ankit:27022,ankit:27023" }
      balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration Results for the last 24 hours:
          No recent migrations
      databases:
        { "_id": "admin", "partitioned": false, "primary": "config" }
        { "_id": "test", "partitioned": false, "primary": "shard01" }
        { "_id": "mydb", "partitioned": false, "primary": "shard01" }
- Inserted a sample document:
    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find()
    { "_id": ObjectId("56a767ef88eb8c1294029482") }
- Explain output from mongos1 (Notice the request going to shard1)
    ankit(mongos-3.0.8)[mongos] mydb1> db.mycoll1.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
- Explain output from mongos2 (The request correctly going to shard1)
    ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
- Used the movePrimary command to move mydb to shard2.
- sh.status output from mongos1
    databases:
      { "_id": "admin", "partitioned": false, "primary": "config" }
      { "_id": "test", "partitioned": false, "primary": "shard01" }
      { "_id": "mydb", "partitioned": false, "primary": "shard02" }
- sh.status output from mongos2
    databases:
      { "_id": "admin", "partitioned": false, "primary": "config" }
      { "_id": "test", "partitioned": false, "primary": "shard01" }
      { "_id": "mydb", "partitioned": false, "primary": "shard02" }
- Explain output from mongos1 (Request correctly going to shard2 now)
    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard02",
              "connectionString": "shard02/ankit:27021,ankit:27022,ankit:27023",
              "serverInfo": {
                "host": "ankit",
                "port": 27021,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
- Explain output from mongos2 (Request STILL going to shard1)
    ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
  Also, note that the find() on mongos2 returned no results.
- Further, I manually created a mydb database (and inserted a sample document) directly on the mongod of shard1. This emulated the case where two databases with the same name sit on different shards, i.e. show dbs returns mydb on both shards.
    ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb1> use mydb
    switched to db mydb
    ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb> db.mycoll.insert({})
    Inserted 1 record(s) in 139ms
- Running the find query on mongos2 again still went to shard1, but this time it returned the document created in the step above.
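The mismatch in the steps above can be checked mechanically by comparing the primary shard recorded for the database (the "primary" field in sh.status) with the shardName each router reports in explain(). The following is a minimal sketch, assuming explain documents shaped like the ones pasted above. The helper names targetedShards and findStaleRouters are hypothetical and are not part of the mongo shell.

```javascript
// Hypothetical helpers for illustration; not part of the mongo shell.

// Return the shard names a router targeted, read from an explain() document.
function targetedShards(explainDoc) {
  var planner = explainDoc.queryPlanner || {};
  var plan = planner.winningPlan || {};
  return (plan.shards || []).map(function (s) { return s.shardName; });
}

// Return the routers whose explain() output does not target the expected
// primary shard of the database.
function findStaleRouters(expectedPrimary, explainByRouter) {
  return Object.keys(explainByRouter).filter(function (router) {
    var shards = targetedShards(explainByRouter[router]);
    return shards.indexOf(expectedPrimary) === -1;
  });
}

// Trimmed explain() documents from the two routers after movePrimary.
var explainByRouter = {
  mongos1: { queryPlanner: { winningPlan: { stage: "SINGLE_SHARD", shards: [{ shardName: "shard02" }] } } },
  mongos2: { queryPlanner: { winningPlan: { stage: "SINGLE_SHARD", shards: [{ shardName: "shard01" }] } } }
};

// sh.status reports "primary": "shard02" for mydb after the move.
var stale = findStaleRouters("shard02", explainByRouter);
```

With the data from this report, stale contains only mongos2, which matches the behaviour observed above.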
I was able to fix this by clearing the metadata cache on mongos2 with the flushRouterConfig command:
db.adminCommand("flushRouterConfig")
Restarting the mongos also fixes the issue.
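Since flushRouterConfig only clears the cache of the mongos that receives it, the workaround has to be repeated on every router that may hold stale metadata. The following is a rough sketch of doing that from one shell session. The function name flushAllRouters and its connect parameter are assumptions made for illustration; the default relies on the mongo shell's built-in Mongo(host) connection constructor.

```javascript
// Sketch only: flushAllRouters and the connect parameter are hypothetical.
// Runs flushRouterConfig against each mongos host and collects the replies.
function flushAllRouters(hosts, connect) {
  // Default to the mongo shell's connection constructor when none is given.
  connect = connect || function (host) { return new Mongo(host); };
  var results = {};
  hosts.forEach(function (host) {
    var admin = connect(host).getDB("admin");
    // The command affects only the router it is sent to.
    results[host] = admin.runCommand({ flushRouterConfig: 1 });
  });
  return results;
}
```

In this deployment the call would look like flushAllRouters(["ankit:27025"]) for mongos2, with the address of any other mongos added to the list. The connect parameter exists only so the function can be exercised without a live cluster.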