Original description
The movePrimary command does not refresh the metadata caches of other running mongos instances, so queries routed through them go to the old primary shard and return inconsistent results.
Steps to Reproduce
- Deployed a 2-shard cluster, with each shard running a 3-node replica set. I also started 2 mongos instances.
- Created a database mydb on mongos1 and this is what sh.status returned:
--- Sharding Status ---
  sharding version: {
    "_id": 1,
    "minCompatibleVersion": 5,
    "currentVersion": 6,
    "clusterId": ObjectId("56a7671754ad7a545f803203")
  }
  shards:
    { "_id": "shard01", "host": "shard01/ankit:27018,ankit:27019,ankit:27020" }
    { "_id": "shard02", "host": "shard02/ankit:27021,ankit:27022,ankit:27023" }
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      No recent migrations
  databases:
    { "_id": "admin", "partitioned": false, "primary": "config" }
    { "_id": "test", "partitioned": false, "primary": "shard01" }
    { "_id": "mydb", "partitioned": false, "primary": "shard01" }
- Inserted a sample document:
ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find()
{
  "_id": ObjectId("56a767ef88eb8c1294029482")
}
- Explain output from mongos1 (Notice the request going to shard1)
ankit(mongos-3.0.8)[mongos] mydb1> db.mycoll1.find().explain()
{
  "queryPlanner": {
    "mongosPlannerVersion": 1,
    "winningPlan": {
      "stage": "SINGLE_SHARD",
      "shards": [
        {
          "shardName": "shard01",
          "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
          "serverInfo": {
            "host": "ankit",
            "port": 27018,
            "version": "3.0.8",
            "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
          },
          "plannerVersion": 1,
          "namespace": "mydb.mycoll",
- Explain output from mongos2 (The request correctly going to shard1)
ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
{
  "queryPlanner": {
    "mongosPlannerVersion": 1,
    "winningPlan": {
      "stage": "SINGLE_SHARD",
      "shards": [
        {
          "shardName": "shard01",
          "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
          "serverInfo": {
            "host": "ankit",
            "port": 27018,
            "version": "3.0.8",
            "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
          },
          "plannerVersion": 1,
          "namespace": "mydb.mycoll",
- Used the movePrimary command to move mydb to shard2.
- sh.status output from mongos1
databases:
  { "_id": "admin", "partitioned": false, "primary": "config" }
  { "_id": "test", "partitioned": false, "primary": "shard01" }
  { "_id": "mydb", "partitioned": false, "primary": "shard02" }
- sh.status output from mongos2
databases:
  { "_id": "admin", "partitioned": false, "primary": "config" }
  { "_id": "test", "partitioned": false, "primary": "shard01" }
  { "_id": "mydb", "partitioned": false, "primary": "shard02" }
- Explain output from mongos1 (Request correctly going to shard2 now)
ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
{
  "queryPlanner": {
    "mongosPlannerVersion": 1,
    "winningPlan": {
      "stage": "SINGLE_SHARD",
      "shards": [
        {
          "shardName": "shard02",
          "connectionString": "shard02/ankit:27021,ankit:27022,ankit:27023",
          "serverInfo": {
            "host": "ankit",
            "port": 27021,
            "version": "3.0.8",
            "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
          },
          "plannerVersion": 1,
          "namespace": "mydb.mycoll",
- Explain output from mongos2 (Request STILL going to shard1)
ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
{
  "queryPlanner": {
    "mongosPlannerVersion": 1,
    "winningPlan": {
      "stage": "SINGLE_SHARD",
      "shards": [
        {
          "shardName": "shard01",
          "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
          "serverInfo": {
            "host": "ankit",
            "port": 27018,
            "version": "3.0.8",
            "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
          },
          "plannerVersion": 1,
          "namespace": "mydb.mycoll",
Also note that find() on mongos2 returned no results, because the query was routed to shard1, which no longer holds the data.
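The behaviour above can be pictured with a small toy model. This is only an illustrative sketch, not MongoDB's implementation: configServer, shards, and makeRouter are hypothetical names, and the model simply assumes that each router keeps its own copy of the database-to-primary mapping and that movePrimary updates only the router it was run through.

```javascript
// Toy model only: these objects are illustrative, not MongoDB internals.
var configServer = { primaryOf: { mydb: "shard01" } };
var shards = {
  shard01: { mydb: [{ _id: 1 }] },
  shard02: {}
};

function makeRouter() {
  var cache = {}; // this router's private copy of database -> primary shard
  return {
    primaryFor: function (db) {
      // Read from the config server only on a cache miss.
      if (!(db in cache)) cache[db] = configServer.primaryOf[db];
      return cache[db];
    },
    find: function (db) {
      return shards[this.primaryFor(db)][db] || [];
    },
    movePrimary: function (db, to) {
      var from = this.primaryFor(db);
      shards[to][db] = shards[from][db];
      delete shards[from][db];
      configServer.primaryOf[db] = to;
      cache[db] = to; // only the router that ran the command is updated
    }
  };
}

var mongos1 = makeRouter();
var mongos2 = makeRouter();
mongos1.find("mydb");
mongos2.find("mydb"); // both routers now cache shard01 as the primary

mongos1.movePrimary("mydb", "shard02");

var viaMongos1 = mongos1.find("mydb"); // routed to shard02, finds the document
var viaMongos2 = mongos2.find("mydb"); // still routed to shard01, finds nothing
```

In this model the config data is correct for everyone, which matches sh.status agreeing on both mongos, while the second router keeps using its stale cached entry.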
- Next, I manually created a mydb database (and inserted a sample document) directly on the mongod of shard1. This emulates the case where two databases with the same name sit on different shards, i.e. show dbs returns mydb on both shards.
ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb1> use mydb
switched to db mydb
ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb> db.mycoll.insert({})
Inserted 1 record(s) in 139ms
- Running the find query on mongos2 again still went to shard1, but this time it returned the document created in the step above.
I was able to fix this by clearing the metadata cache on mongos2 with the flushRouterConfig command:
db.adminCommand("flushRouterConfig")
Alternatively, restarting mongos2 also fixes the issue.
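The workaround can be written down as an ordered list of admin commands: movePrimary through one mongos, then flushRouterConfig on every other mongos. The following is a minimal sketch under stated assumptions: movePrimaryPlan is a hypothetical helper (not part of the shell), and the host names are placeholders, since the report only gives the address of mongos2.

```javascript
// Hypothetical helper: lists the admin commands to run, in order.
// The first host runs movePrimary; every other mongos gets a cache flush.
function movePrimaryPlan(dbName, toShard, mongosHosts) {
  var plan = [
    { host: mongosHosts[0], command: { movePrimary: dbName, to: toShard } }
  ];
  for (var i = 1; i < mongosHosts.length; i++) {
    plan.push({ host: mongosHosts[i], command: { flushRouterConfig: 1 } });
  }
  return plan;
}

// Placeholder host names, for illustration only.
var plan = movePrimaryPlan("mydb", "shard02", [
  "mongos1.example:27017",
  "mongos2.example:27017"
]);

// In the mongo shell, each step could then be applied with something like:
//   new Mongo(step.host).getDB("admin").runCommand(step.command)
```

The helper only builds the command documents, so it can be inspected before anything is sent to the cluster.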