
SERVER-42658: Secondary does not refresh its routing table when transitioning to primary

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.4.22, 3.6.12, 4.0.10
    • Component/s: Sharding
    • Labels: Sharding

      A sharded cluster with two replica-set shards (shard A and shard B) and two mongos routers (a and b).
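      For reference, a minimal sketch of how such a topology might be assembled from either mongos; the replica-set names and host:port values below are illustrative, not taken from this report:

      mongos> sh.addShard("shardA/shardA-host1:27018,shardA-host2:27018")
      mongos> sh.addShard("shardB/shardB-host1:27018,shardB-host2:27018")
      mongos> sh.status()   // confirm both shards are registered before continuing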
      1. In mongos a:
      mongos> sh.enableSharding("test")

      { "ok" : 1 }

      mongos> sh.shardCollection("test.table1",{_id:"hashed"})

      { "collectionsharded" : "test.table1", "ok" : 1 }

       
      2. In mongos b:
      mongos> db.adminCommand({getShardVersion:"test.table1"})

      { "version" : Timestamp(2, 5), "versionEpoch" : ObjectId("5d4a40b60a833e0eef3082f5"), "ok" : 1 }

       
      3. In mongos a:
      mongos> sh.shardCollection("test.table2",{_id:"hashed"})
      4. In mongos b:
      mongos> db.adminCommand({getShardVersion:"test.table2"})

      { "code" : 118, "ok" : 0, "errmsg" : "Collection test.table2 is not sharded." }

       
      5. Step down the primary of the 'test' database's primary shard, and check the new primary's shardingState:
       
      pmongo186:PRIMARY> db.adminCommand({shardingState:1})
      {
          "enabled" : true,
          "configServer" : "cfg/xxxxxxxx",
          "shardName" : "pmongo186",
          "clusterId" : ObjectId("5c5166d655a6f24da8dd7418"),
          "versions" : {
              "test.system.indexes" : Timestamp(0, 0),
              "test.table1" : Timestamp(0, 0),
              "test.table2" : Timestamp(0, 0),
              "local.replset.minvalid" : Timestamp(0, 0),
              "local.replset.election" : Timestamp(0, 0),
              "local.me" : Timestamp(0, 0),
              "local.startup_log" : Timestamp(0, 0),
              "admin.system.version" : Timestamp(0, 0),
              "local.oplog.rs" : Timestamp(0, 0),
              "admin.system.roles" : Timestamp(0, 0),
              "admin.system.users" : Timestamp(0, 0),
              "local.system.replset" : Timestamp(0, 0)
          },
          "ok" : 1
      }
      pmongo186:PRIMARY>
      pmongo186:PRIMARY> db.adminCommand({getShardVersion:"test.table2"})

      { "configServer" : "cfg/xxxxxxxx", "inShardedMode" : false, "mine" : Timestamp(0, 0), "global" : Timestamp(0, 0), "ok" : 1 }

       
      From now on, all inserts into test.table2 through mongos b go to the primary shard. Even after running flushRouterConfig on mongos b, we cannot see all of the data that was just inserted.
       
      6. In mongos b:
      mongos> db.table2.insert({_id:1})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:2})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:3})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:4})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:5})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:6})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:7})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:8})
      WriteResult({ "nInserted" : 1 })
      mongos>
      mongos> db.adminCommand({flushRouterConfig:1})

      { "flushed" : true, "ok" : 1 }

      mongos> db.table2.find()

      { "_id" : 3 } { "_id" : 6 } { "_id" : 8 }

       
      7. In mongos a, we cannot see the documents that were just inserted either:
      mongos> use test
      switched to db test
      mongos> db.table2.find()

      { "_id" : 3 } { "_id" : 6 } { "_id" : 8 }

      Recently, we ran into a strange phenomenon where all of the data of a sharded collection ended up on the primary shard. After some investigation, I believe this is a bug.
       
      There are two key problems here:
      1. In mongos's CatalogCache, if the databaseInfoEntry for a database already exists, the cache does not refresh the metadata for a newly sharded collection, and CatalogCache::getCollectionRoutingInfo routes the operation to the database's primary shard. Normally that is not a problem, because the mongod checks the operation's shardVersion. But in the scenario reproduced above it becomes one.
      2. In mongod, a secondary does not refresh its routing table when it transitions to primary, so in the new primary's shardingState every collection appears unsharded. As a result, the operations in the scenario above are accepted without triggering a stale shard version error, and some data ends up on the wrong shard of the cluster (a rough verification/workaround sketch follows below).
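      If this analysis is correct, forcing the new primary to refresh its routing table should restore correct shardVersion checking. A rough sketch of how one might verify the stale state and nudge a refresh on the new primary; whether flushRouterConfig is accepted on shard mongod members depends on the server version, so treat that line as an assumption for the affected releases:

      pmongo186:PRIMARY> db.adminCommand({ shardingState : 1 })        // every namespace shows Timestamp(0, 0) right after step-up
      pmongo186:PRIMARY> db.adminCommand({ flushRouterConfig : 1 })    // assumption: marks the cached routing table stale so the next versioned operation refreshes it
      pmongo186:PRIMARY> db.adminCommand({ getShardVersion : "test.table2" })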

            Assignee: Backlog - Sharding Team
            Reporter: lipengchong (lpc)
            Votes: 1
            Watchers: 10
