Core Server / SERVER-42658

Secondary doesn't refresh its routing table when transitioning to primary.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.4.22, 3.6.12, 4.0.10
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels: Sharding
    • Operating System: ALL

      A sharded cluster with 2 replica-set shards (shard A and shard B) and 2 mongos routers (a and b).
      1. In mongos a:
      mongos> sh.enableSharding("test")

      { "ok" : 1 }

      mongos> sh.shardCollection("test.table1",{_id:"hashed"})

      { "collectionsharded" : "test.table1", "ok" : 1 }

       
      2. In mongos b:
      mongos> db.adminCommand({getShardVersion:"test.table1"})

      { "version" : Timestamp(2, 5), "versionEpoch" : ObjectId("5d4a40b60a833e0eef3082f5"), "ok" : 1 }

       
      3. In mongos a:
      sh.shardCollection("test.table2",{_id:"hashed"})
      4. In mongos b:
      mongos> db.adminCommand({getShardVersion:"test.table2"})

      { "code" : 118, "ok" : 0, "errmsg" : "Collection test.table2 is not sharded." }

       
      5. Step down the primary node of the primary shard for the 'test' database, and check the new primary's shardingState.
       
      pmongo186:PRIMARY> db.adminCommand({shardingState:1})
      {
      "enabled" : true,
      "configServer" : "cfg/xxxxxxxx",
      "shardName" : "pmongo186",
      "clusterId" : ObjectId("5c5166d655a6f24da8dd7418"),
      "versions" : {
          "test.system.indexes" : Timestamp(0, 0),
          "test.table1" : Timestamp(0, 0),
          "test.table2" : Timestamp(0, 0),
          "local.replset.minvalid" : Timestamp(0, 0),
          "local.replset.election" : Timestamp(0, 0),
          "local.me" : Timestamp(0, 0),
          "local.startup_log" : Timestamp(0, 0),
          "admin.system.version" : Timestamp(0, 0),
          "local.oplog.rs" : Timestamp(0, 0),
          "admin.system.roles" : Timestamp(0, 0),
          "admin.system.users" : Timestamp(0, 0),
          "local.system.replset" : Timestamp(0, 0)
      },
      "ok" : 1
      }
      pmongo186:PRIMARY>
      pmongo186:PRIMARY> db.adminCommand({getShardVersion:"test.table2"})

      { "configServer" : "cfg/xxxxxxxx", "inShardedMode" : false, "mine" : Timestamp(0, 0), "global" : Timestamp(0, 0), "ok" : 1 }

       
      From this point on, every insert into test.table2 issued through mongos b goes to the primary shard. Even after running flushRouterConfig on mongos b, we cannot see all of the data that was just inserted.
       
      6. In mongos b:
      mongos> db.table2.insert({_id:1})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:2})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:3})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:4})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:5})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:6})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:7})
      WriteResult({ "nInserted" : 1 })
      mongos> db.table2.insert({_id:8})
      WriteResult({ "nInserted" : 1 })
      mongos>
      mongos> db.adminCommand({flushRouterConfig:1})

      { "flushed" : true, "ok" : 1 }

      mongos> db.table2.find()

      { "_id" : 3 }
      { "_id" : 6 }
      { "_id" : 8 }

       
      7. In mongos a, we also cannot see the data just inserted:
      mongos> use test
      switched to db test
      mongos> db.table2.find()

      { "_id" : 3 }
      { "_id" : 6 }
      { "_id" : 8 }
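One way to see why only some of the eight documents remain visible: each shard returns only documents whose shard-key hash falls in a chunk that shard owns, so documents misrouted to the primary shard are filtered out like orphans. A toy Python model of that filtering — the hash function and the chunk split are made up for illustration, not MongoDB's real hashed-index function:

```python
# Toy model: documents misrouted to the primary shard become invisible
# once queries filter by chunk ownership. Hash and chunk split are
# illustrative only, not MongoDB's real hashed shard key.

def toy_hash(key):
    # stand-in for MongoDB's 64-bit hashed shard-key value
    return (key * 11) % 16

def owner(doc_id):
    # chunk ownership per the (refreshed) routing table:
    # hash values 0..7 belong to shardA (the primary shard), 8..15 to shardB
    return "shardA" if toy_hash(doc_id) < 8 else "shardB"

# the misrouted inserts: every document physically landed on shardA
shard_storage = {"shardA": [{"_id": i} for i in range(1, 9)], "shardB": []}

def find_all():
    # each shard returns only the documents it owns per the chunk map,
    # so misplaced documents are silently dropped from the result
    visible = []
    for shard, docs in shard_storage.items():
        visible += [d for d in docs if owner(d["_id"]) == shard]
    return sorted(d["_id"] for d in visible)

print(find_all())  # -> [2, 3, 5, 6]: only ids whose hash maps to shardA
```

With a different hash the visible subset changes, but the mechanism is the same: the misrouted documents exist on disk yet no routed query can return them.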

    Description

      Recently we hit a strange phenomenon: all data of a sharded collection ended up on the primary shard. After a period of research, I believe this is a bug.
       
      There are two key problems here:
      1. In mongos's CatalogCache, if the databaseInfoEntry already exists, the cache does not refresh metadata for a collection it has never seen, and CatalogCache::getCollectionRoutingInfo routes such a collection to the primary shard. Normally that is harmless, because the mongod checks the shardVersion of the operation — but in the scenario above it is not.
      2. On mongod, a secondary does not refresh its routing table when it transitions to primary, so every collection appears unsharded in the new primary's shardingState. The misrouted operations from problem 1 are therefore accepted, and in the end some documents land on the wrong shard of the sharded cluster.
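The interaction of the two problems can be sketched as a small Python model of the mongos-side cache and the shard's sharding state. The class and method names are illustrative, not the real C++ classes:

```python
# Simplified model (hypothetical names) of the two bugs described above.

class ShardingState:
    """Per-shard routing table. It is empty on a secondary, and the bug is
    that it is not refreshed when the node steps up to primary."""
    def __init__(self):
        self.versions = {}  # ns -> shard version; missing == Timestamp(0, 0)

    def check_shard_version(self, ns, expected):
        # Timestamp(0, 0) means "unsharded". A stale, empty routing table
        # therefore accepts writes that should have been rejected with a
        # stale-shard-version error.
        return self.versions.get(ns, (0, 0)) == expected

class CatalogCache:
    """mongos-side cache. Problem 1: if the database entry already exists,
    a collection absent from the cache is assumed unsharded and routed to
    the database's primary shard, with no refresh attempted."""
    def __init__(self, primary_shard):
        self.primary_shard = primary_shard
        self.db_entry_exists = True
        self.sharded_collections = {}  # ns -> routing table

    def get_collection_routing_info(self, ns):
        if self.db_entry_exists and ns not in self.sharded_collections:
            return ("primary", self.primary_shard)  # no refresh attempted
        return ("sharded", self.sharded_collections[ns])

# test.table2 was sharded via another mongos, so this cache never saw it:
cache = CatalogCache(primary_shard="shardA")
kind, target = cache.get_collection_routing_info("test.table2")

# Problem 2: the freshly stepped-up primary has an empty routing table, so
# it treats test.table2 as unsharded and accepts the misrouted insert.
new_primary = ShardingState()
accepted = (kind == "primary" and
            new_primary.check_shard_version("test.table2", (0, 0)))
print(target, accepted)  # -> shardA True
```

Either safeguard alone would stop the write: a refreshed CatalogCache would attach the real shard version, and a refreshed shardingState would reject the stale one. Both being stale at once is what lets the insert through.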

            People

              Assignee: Backlog - Sharding Team (backlog-server-sharding)
              Reporter: lipengchong (lpc)
              Votes: 1
              Watchers: 10
