Core Server / SERVER-42658

Secondary doesn't refresh its routing table when transitioning to primary.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.4.22, 3.6.12, 4.0.10
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels: Sharding
    • Operating System: ALL

      A sharded cluster with 2 replica-set shards (shard A and shard B) and 2 mongos routers (a and b).
      1. In mongos a:
      mongos> sh.enableSharding("test")

      { "ok" : 1 }

      mongos> sh.shardCollection("test.table1",{_id:"hashed"})

      { "collectionsharded" : "test.table1", "ok" : 1 }

       
      2. In mongos b:
      mongos> db.adminCommand({getShardVersion:"test.table1"})

      { "version" : Timestamp(2, 5), "versionEpoch" : ObjectId("5d4a40b60a833e0eef3082f5"), "ok" : 1 }

       
      3. In mongos a:
      sh.shardCollection("test.table2",{_id:"hashed"})
      4. In mongos b:
      mongos> db.adminCommand({getShardVersion:"test.table2"})

      { "code" : 118, "ok" : 0, "errmsg" : "Collection test.table2 is not sharded." }

       
      5. Step down the primary node of the primary shard for the 'test' database, and check the new primary's shardingState.
       
      pmongo186:PRIMARY> db.adminCommand({shardingState:1})
      {
      "enabled" : true,
      "configServer" : "cfg/xxxxxxxx",
      "shardName" : "pmongo186",
      "clusterId" : ObjectId("5c5166d655a6f24da8dd7418"),
      "versions" : {
          "test.system.indexes" : Timestamp(0, 0),
          "test.table1" : Timestamp(0, 0),
          "test.table2" : Timestamp(0, 0),
          "local.replset.minvalid" : Timestamp(0, 0),
          "local.replset.election" : Timestamp(0, 0),
          "local.me" : Timestamp(0, 0),
          "local.startup_log" : Timestamp(0, 0),
          "admin.system.version" : Timestamp(0, 0),
          "local.oplog.rs" : Timestamp(0, 0),
          "admin.system.roles" : Timestamp(0, 0),
          "admin.system.users" : Timestamp(0, 0),
          "local.system.replset" : Timestamp(0, 0)
      },
      "ok" : 1
      }
      pmongo186:PRIMARY>
      pmongo186:PRIMARY> db.adminCommand({getShardVersion:"test.table2"})

      { "configServer" : "cfg/xxxxxxxx", "inShardedMode" : false, "mine" : Timestamp(0, 0), "global" : Timestamp(0, 0), "ok" : 1 }

       
      From this point on, every insert into test.table2 issued through mongos b goes to the primary shard. Even after running flushRouterConfig on mongos b, we cannot see all of the data that was just inserted.
       
      6. In mongos b:
      mongos> db.table2.insert({_id:1})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:2})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:3})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:4})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:5})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:6})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:7})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:8})
      WriteResult({ "nInserted" : 1 })
      mongos>
      mongos> db.adminCommand({flushRouterConfig:1})

      { "flushed" : true, "ok" : 1 }

      mongos> db.table2.find()

      { "_id" : 3 }
      { "_id" : 6 }
      { "_id" : 8 }

       
      7. In mongos a, we also cannot see the data just inserted:
      mongos> use test
      switched to db test
      mongos> db.table2.find()

      { "_id" : 3 }
      { "_id" : 6 }
      { "_id" : 8 }
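One way to see why only some of the eight documents remain visible: each shard returns only documents whose shard-key hash falls in a chunk that shard owns, so documents misrouted to the primary shard are filtered out like orphans. A toy Python model of that filtering — the hash function and the chunk split are made up for illustration, not MongoDB's real hashed-index function:

```python
# Toy model: documents misrouted to the primary shard become invisible
# once queries filter by chunk ownership. Hash and chunk split are
# illustrative only, not MongoDB's real hashed shard key.

def toy_hash(key):
    # stand-in for MongoDB's 64-bit hashed shard-key value
    return (key * 11) % 16

def owner(doc_id):
    # chunk ownership per the (refreshed) routing table:
    # hash values 0..7 belong to shardA (the primary shard), 8..15 to shardB
    return "shardA" if toy_hash(doc_id) < 8 else "shardB"

# the misrouted inserts: every document physically landed on shardA
shard_storage = {"shardA": [{"_id": i} for i in range(1, 9)], "shardB": []}

def find_all():
    # each shard returns only the documents it owns per the chunk map,
    # so misplaced documents are silently dropped from the result
    visible = []
    for shard, docs in shard_storage.items():
        visible += [d for d in docs if owner(d["_id"]) == shard]
    return sorted(d["_id"] for d in visible)

print(find_all())  # -> [2, 3, 5, 6]: only ids whose hash maps to shardA
```

With a different hash the visible subset changes, but the mechanism is the same: the misrouted documents exist on disk yet no routed query can return them.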

    Description

      Recently we hit a strange phenomenon: all data of a sharded collection ended up on the primary shard. After a period of research, I believe this is a bug.
       
      There are two key problems here:
      1. In mongos's CatalogCache, if the databaseInfoEntry already exists, the cache does not refresh metadata for a collection it has never seen, and CatalogCache::getCollectionRoutingInfo routes such a collection to the primary shard. Normally that is harmless, because the mongod checks the shardVersion of the operation — but in the scenario above it is not.
      2. On mongod, a secondary does not refresh its routing table when it transitions to primary, so every collection appears unsharded in the new primary's shardingState. The misrouted operations from problem 1 are therefore accepted, and in the end some documents land on the wrong shard of the sharded cluster.
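The interaction of the two problems can be sketched as a small Python model of the mongos-side cache and the shard's sharding state. The class and method names are illustrative, not the real C++ classes:

```python
# Simplified model (hypothetical names) of the two bugs described above.

class ShardingState:
    """Per-shard routing table. It is empty on a secondary, and the bug is
    that it is not refreshed when the node steps up to primary."""
    def __init__(self):
        self.versions = {}  # ns -> shard version; missing == Timestamp(0, 0)

    def check_shard_version(self, ns, expected):
        # Timestamp(0, 0) means "unsharded". A stale, empty routing table
        # therefore accepts writes that should have been rejected with a
        # stale-shard-version error.
        return self.versions.get(ns, (0, 0)) == expected

class CatalogCache:
    """mongos-side cache. Problem 1: if the database entry already exists,
    a collection absent from the cache is assumed unsharded and routed to
    the database's primary shard, with no refresh attempted."""
    def __init__(self, primary_shard):
        self.primary_shard = primary_shard
        self.db_entry_exists = True
        self.sharded_collections = {}  # ns -> routing table

    def get_collection_routing_info(self, ns):
        if self.db_entry_exists and ns not in self.sharded_collections:
            return ("primary", self.primary_shard)  # no refresh attempted
        return ("sharded", self.sharded_collections[ns])

# test.table2 was sharded via another mongos, so this cache never saw it:
cache = CatalogCache(primary_shard="shardA")
kind, target = cache.get_collection_routing_info("test.table2")

# Problem 2: the freshly stepped-up primary has an empty routing table, so
# it treats test.table2 as unsharded and accepts the misrouted insert.
new_primary = ShardingState()
accepted = (kind == "primary" and
            new_primary.check_shard_version("test.table2", (0, 0)))
print(target, accepted)  # -> shardA True
```

Either safeguard alone would stop the write: a refreshed CatalogCache would attach the real shard version, and a refreshed shardingState would reject the stale one. Both being stale at once is what lets the insert through.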

            People

              Assignee: Backlog - Sharding Team (backlog-server-sharding)
              Reporter: lipengchong (lpc)
              Votes: 1
              Watchers: 10
