[SERVER-54625] Maybe the root cause for reading from secondary return empty even readConcern=local/majority Created: 19/Feb/21  Updated: 26/Apr/23  Resolved: 01/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Xin Wang Assignee: Edwin Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2021-02-07-23-46-36-583.png    
Issue Links:
Duplicate
duplicates SERVER-53474 Cannot read from another mongos with ... Closed
Related
is related to SERVER-76541 documents can't be find with secondar... Closed
Operating System: ALL
Participants:
Case:

 Description   

I have seen several reports that reads from a secondary return empty results even with readConcern=local/majority. I think the root cause may be that the secondary does not load the routing table into its CatalogCache.

 

I posted the same comment under https://jira.mongodb.org/browse/SERVER-54373 but got no reply, so I would like to know whether I am wrong.

 

Here are the reproduction steps as JS code:

// Two shards (each a replica set) and two mongos routers.
var st = new ShardingTest({
    shards: 2,
    mongos: 2,
    other: {
        rs: true,
    }
});

var adminDB = st.s0.getDB('admin');
assert.commandWorked(adminDB.runCommand({enableSharding: "testdb", primaryShard: st.rs0.name}));
jsTest.log("=============== first round");
runTest("testcoll");
jsTest.log("=============== second round");
runTest("testcoll1");

function runTest(testCollection) {
    // Shard the collection and insert through the first mongos (st.s0).
    assert.commandWorked(adminDB.runCommand({shardCollection: "testdb." + testCollection, key: {_id: "hashed"}}));
    var testDB = st.s0.getDB('testdb');
    for (var i = 0; i < 10; i++) {
        testDB.getCollection(testCollection).insert({name: i});
    }

    // Show which documents landed on each shard.
    printjson(st.rs0.getPrimary().getDB('testdb').getCollection(testCollection).find().toArray());
    jsTest.log("Above are the documents in " + st.rs0.name + " (primaryShard)");
    printjson(st.rs1.getPrimary().getDB('testdb').getCollection(testCollection).find().toArray());
    jsTest.log("Above are the documents in " + st.rs1.name);

    jsTest.log("=============== database info");
    var dbDoc = st.config.databases.findOne({_id: "testdb"});
    printjson(dbDoc);

    // Pick a document that lives on st.rs1, not on the primary shard.
    jsTest.log("=============== choose one document not in primary shard");
    var one_doc = st.rs1.getPrimary().getDB('testdb').getCollection(testCollection).findOne();
    var one_doc_id = one_doc["_id"];
    printjson(one_doc);
    jsTest.log(one_doc_id);

    // Read it back through the second mongos (st.s1) from a secondary.
    jsTest.log("=============== find doc with secondary pref");
    jsTest.log("result: " + st.s1.getDB("testdb").getCollection(testCollection).find({_id: one_doc_id}).readPref("secondary").toArray().length);
    jsTest.log("=============== find doc with secondary pref done");

    jsTest.log("=============== find doc with secondary pref and local concern");
    jsTest.log("result: " + st.s1.getDB("testdb").getCollection(testCollection).find({_id: one_doc_id}).readPref("secondary").readConcern("local").toArray().length);
    jsTest.log("=============== find doc with secondary pref and local concern done");
}

st.stop();

 

After the first round, everything works as expected. When we run `.find({_id: one_doc_id}).readPref("secondary")` through st.s1, we see the refresh log:

s21230| 2021-02-07T23:11:35.291+0800 I SH_REFR [ConfigServerCatalogCacheLoader-0] Refresh for collection testdb.testcoll to version 1|3||602003276ef388a5caf68456 took 1 ms

The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 1.
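
To compare the version in the refresh log with versions seen elsewhere, it can help to split the string into its parts. This is a minimal sketch, assuming the logged value follows a major|minor||epoch layout; parseShardVersion is a hypothetical helper written for this ticket, not a server or shell API.

```javascript
// Hypothetical helper: splits a logged shard version such as
// "1|3||602003276ef388a5caf68456" into its parts.
// Assumption: the layout is major|minor||epoch, with a 24-hex-digit epoch.
function parseShardVersion(text) {
  const match = /^(\d+)\|(\d+)\|\|([0-9a-fA-F]{24})$/.exec(text);
  if (match === null) {
    throw new Error("unrecognized shard version: " + text);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const epoch = match[3].toLowerCase();
  return {
    major: major,
    minor: minor,
    epoch: epoch,
    // An all-zero version is what this ticket reports for a collection
    // that a node treats as unsharded.
    looksUnsharded: major === 0 && minor === 0 && /^0{24}$/.test(epoch),
  };
}

console.log(parseShardVersion("1|3||602003276ef388a5caf68456"));
```

For the log line above this yields major 1 and minor 3 with a non-zero epoch, so looksUnsharded is false.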

 

But the second round (with another collection) behaves unexpectedly.

The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 0, where 1 is expected, and there is no refresh log.

 

I added more logging to determine the root cause. In the second round I see:

find doc with secondary pref: there is no shardVersion comparison in CollectionShardingState::_getMetadataWithVersionCheckAt(). That is expected, because a read from a secondary uses readConcern=available, which does not consult routing info.
find doc with secondary pref and local concern: in CollectionShardingState::_getMetadataWithVersionCheckAt(), receivedShardVersion is 0|0||000000000000000000000000 and wantedShardVersion is also 0|0||000000000000000000000000, which means the secondary thinks the collection (testcoll1) is unsharded. Moreover, _metadata is empty in MetadataManager::getActiveMetadata().
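
The second observation can be illustrated with a toy model of the version check. This is a sketch under assumptions, not the server's code: checkShardVersion and StaleConfigError are made-up names standing in for the comparison done in CollectionShardingState::_getMetadataWithVersionCheckAt(), and the real comparison is more involved than string equality. It only shows why two matching all-zero versions pass the check silently.

```javascript
// Toy model only; names and logic are illustrative, not MongoDB internals.
const UNSHARDED = "0|0||000000000000000000000000";

class StaleConfigError extends Error {
  constructor(received, wanted) {
    super("stale config: received " + received + ", wanted " + wanted);
    this.name = "StaleConfigError";
    this.received = received;
    this.wanted = wanted;
  }
}

// received: version the router attached to the request.
// wanted:   version the shard member believes is current.
function checkShardVersion(received, wanted) {
  if (received !== wanted) {
    // In this model, a mismatch is the only event that signals "someone is stale".
    throw new StaleConfigError(received, wanted);
  }
  return wanted === UNSHARDED ? "treated-as-unsharded" : "treated-as-sharded";
}

// Both sides agree on the all-zero version, so nothing is raised.
console.log(checkShardVersion(UNSHARDED, UNSHARDED)); // treated-as-unsharded

// A router with a newer version would expose the disagreement.
try {
  checkShardVersion("1|3||602003276ef388a5caf68456", UNSHARDED);
} catch (err) {
  console.log(err.name); // StaleConfigError
}
```

In this model the query proceeds as if testcoll1 were unsharded, which is consistent with the empty result reported above.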

 

I think the root cause is that a secondary only loads routing info into its CatalogCache when the collection is marked as needRefresh, and in this scenario it is never marked, because the cluster splits into two logical partitions:

One is st.s0 & st.rs1.secondary, which have the correct version info. A request with the shard key one_doc_id (from the JS code) is always routed to st.rs1 and gets the right response.
The other is st.s1 & st.rs0.secondary, with shardVersion = unsharded. A request with one_doc_id returns nothing because the routing info is wrong.
Of course, there is no problem if many requests with different shard keys are randomly distributed across the mongos instances. But the scenario described above does exist.
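
The partition argument can be played through in a small simulation. This is a toy model under stated assumptions, not MongoDB code: makeCluster, makeRouter, findOnSecondary, the SHARDED version string, and the _id values 7 and 42 are all invented for illustration, and a refresh is modeled as happening only when the router's version and the secondary's version disagree.

```javascript
// Toy model of the two "logical partitions"; every name and value is illustrative.
const UNSHARDED = "0|0||000000000000000000000000";
const SHARDED = "1|0||aaaaaaaaaaaaaaaaaaaaaaaa"; // made-up version for the sharded state
const PRIMARY_SHARD = "rs0";

function makeCluster() {
  return {
    data: { rs0: [7], rs1: [42] },          // _id values stored on each shard
    owner: { 7: "rs0", 42: "rs1" },         // true owner of each _id
    secondaryVersion: { rs0: UNSHARDED, rs1: SHARDED }, // what each secondary believes
    refreshes: 0,
  };
}

function makeRouter(version) {
  return { version: version };
}

function findOnSecondary(cluster, router, id) {
  // A router that believes the collection is unsharded targets the primary shard.
  const target = router.version === UNSHARDED ? PRIMARY_SHARD : cluster.owner[id];
  if (router.version !== cluster.secondaryVersion[target]) {
    // Assumption: only a mismatch makes both sides reload routing info.
    cluster.refreshes += 1;
    router.version = SHARDED;
    cluster.secondaryVersion[target] = SHARDED;
    return findOnSecondary(cluster, router, id);
  }
  return cluster.data[target].filter((storedId) => storedId === id);
}

const cluster = makeCluster();
const s0 = makeRouter(SHARDED);   // plays the role of st.s0
const s1 = makeRouter(UNSHARDED); // plays the role of st.s1

console.log(findOnSecondary(cluster, s0, 42).length); // 1
console.log(findOnSecondary(cluster, s1, 42).length); // 0
console.log(cluster.refreshes);                       // 0
```

In the model, the stale pair heals only once traffic crosses the partitions: a later findOnSecondary(cluster, s0, 7) hits the rs0 secondary with a different version and forces a refresh, after which s1 also finds document 42.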

 

Is my guess correct?

Is there a good solution, or will this scenario be fixed in a future release?



 Comments   
Comment by Xin Wang [ 02/Mar/21 ]

Ok, Thanks for your reply! 

 

I'll follow SERVER-53474

 

Comment by Edwin Zhou [ 01/Mar/21 ]

Hi wangxin201492@gmail.com

I apologize for missing your comment in SERVER-54373. In that ticket, we identified that the behavior described is a duplicate of SERVER-53474, which addresses receiving stale data when reading from a different mongos with readPreference=secondary. The discussion is being tracked on SERVER-53474 and I encourage you to follow along.

It's plausible that the behavior stems from the routing table cache, but we can't say for sure that it is the only reason for that behavior. That being said, we currently have a successful reproduction on SERVER-53474, and don't need any more information at this time.

Nevertheless, thank you for investigating this behavior, and we invite this level of depth in any future issues you discover.

Gratefully,
Edwin
