Details
Improvement
Resolution: Done
Major - P3
Sharding
Description
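The movePrimary command does not refresh the metadata caches of other running mongos instances, so the same query can be routed to different shards and return inconsistent results depending on which mongos handles it.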
Steps to Reproduce
- Deployed a 2-shard cluster, with each shard running a 3-node replica set, and started 2 mongos instances.
- Created a database mydb on mongos1. This is what sh.status returned:
    --- Sharding Status ---
      sharding version: {
        "_id": 1,
        "minCompatibleVersion": 5,
        "currentVersion": 6,
        "clusterId": ObjectId("56a7671754ad7a545f803203")
      }
      shards:
        { "_id": "shard01", "host": "shard01/ankit:27018,ankit:27019,ankit:27020" }
        { "_id": "shard02", "host": "shard02/ankit:27021,ankit:27022,ankit:27023" }
      balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration Results for the last 24 hours:
          No recent migrations
      databases:
        { "_id": "admin", "partitioned": false, "primary": "config" }
        { "_id": "test", "partitioned": false, "primary": "shard01" }
        { "_id": "mydb", "partitioned": false, "primary": "shard01" }
- Inserted a sample document:
    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find()
    {"_id": ObjectId("56a767ef88eb8c1294029482")}
- Explain output from mongos1 (note that the request goes to shard1):
    ankit(mongos-3.0.8)[mongos] mydb1> db.mycoll1.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
- Explain output from mongos2 (the request correctly goes to shard1):
    ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
- Used the movePrimary command to move mydb to shard2.
- sh.status output from mongos1
    databases:
      { "_id": "admin", "partitioned": false, "primary": "config" }
      { "_id": "test", "partitioned": false, "primary": "shard01" }
      { "_id": "mydb", "partitioned": false, "primary": "shard02" }
- sh.status output from mongos2
    databases:
      { "_id": "admin", "partitioned": false, "primary": "config" }
      { "_id": "test", "partitioned": false, "primary": "shard01" }
      { "_id": "mydb", "partitioned": false, "primary": "shard02" }
- Explain output from mongos1 (the request now correctly goes to shard2):
    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard02",
              "connectionString": "shard02/ankit:27021,ankit:27022,ankit:27023",
              "serverInfo": {
                "host": "ankit",
                "port": 27021,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
- Explain output from mongos2 (Request STILL going to shard1)
    ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
  Also note that find() on mongos2 returned no results.
- Next, I manually created a mydb database (and inserted a sample document) directly on the shard1 mongod. This emulates the case where two databases with the same name exist on different shards, i.e. show dbs returns mydb on both shards.
    ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb1> use mydb
    switched to db mydb
    ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb> db.mycoll.insert({})
    Inserted 1 record(s) in 139ms
- Running the find query on mongos2 again went to shard1, but this time returned the document created in the step above.
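The steps above can be mimicked with a small toy model. This is only a sketch of the observed behaviour, not MongoDB code: the ConfigServer and Router classes and their method names are hypothetical, and the model assumes each router caches a database's primary shard on first use and is never notified when another router changes it.

```javascript
// Toy model of the stale routing cache described above; not MongoDB code.
// ConfigServer holds the authoritative primary shard for each database.
class ConfigServer {
  constructor() { this.primaries = new Map(); }
  setPrimary(db, shard) { this.primaries.set(db, shard); }
  getPrimary(db) { return this.primaries.get(db); }
}

// Router is a stand-in for a mongos: it caches the primary on first use.
class Router {
  constructor(config) { this.config = config; this.cache = new Map(); }

  // Return the shard this router would send a query for `db` to.
  route(db) {
    if (!this.cache.has(db)) this.cache.set(db, this.config.getPrimary(db));
    return this.cache.get(db);
  }

  // Stand-in for flushRouterConfig: drop all cached routing data.
  flush() { this.cache.clear(); }

  // Stand-in for movePrimary issued through this router: the config server
  // and this router's own cache are updated, but no other router is told.
  movePrimary(db, shard) {
    this.config.setPrimary(db, shard);
    this.cache.set(db, shard);
  }
}

const config = new ConfigServer();
config.setPrimary("mydb", "shard01");
const mongos1 = new Router(config);
const mongos2 = new Router(config);

// Both routers agree before the move.
const before = [mongos1.route("mydb"), mongos2.route("mydb")];

// After the move, mongos2 still routes to the old primary.
mongos1.movePrimary("mydb", "shard02");
const afterMove = [mongos1.route("mydb"), mongos2.route("mydb")];

// Flushing the cache on mongos2 makes it re-read the config server.
mongos2.flush();
const afterFlush = [mongos1.route("mydb"), mongos2.route("mydb")];

console.log(before, afterMove, afterFlush);
```

In this model the config server always holds the new primary, which matches sh.status reporting shard02 on both mongos instances while mongos2 keeps routing to shard01.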
I was able to fix this by clearing the metadata cache on mongos2 with the flushRouterConfig command:
db.adminCommand("flushRouterConfig")
Alternatively, restarting the mongos also fixes the issue.
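Because every mongos keeps its own cache, the workaround may need to be applied to each one. A minimal sketch of that loop follows; the flushAllRouters helper and its connect parameter are hypothetical names introduced here for illustration. In the legacy mongo shell, connect could be host => new Mongo(host).

```javascript
// Sketch: run flushRouterConfig on every mongos in a list.
// `connect` is a caller-supplied factory (hypothetical) that returns a
// connection object exposing getDB(), e.g. host => new Mongo(host) in the
// legacy mongo shell.
function flushAllRouters(hosts, connect) {
  const results = {};
  for (const host of hosts) {
    // flushRouterConfig is an admin command, so run it on the admin database.
    const admin = connect(host).getDB("admin");
    results[host] = admin.runCommand({ flushRouterConfig: 1 });
  }
  return results;
}
```

The function returns each host's command reply keyed by host, so a caller can check that every mongos acknowledged the flush.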