[SERVER-54373] read from mongos with readPref=secondary and readConcern=local/majority return empty data Created: 07/Feb/21  Updated: 18/Jun/21  Resolved: 09/Feb/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: vinllen chen Assignee: Edwin Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2021-02-07-23-46-36-583.png    
Issue Links:
Duplicate
duplicates SERVER-53474 Cannot read from another mongos with ... Closed
Operating System: ALL
Participants:
Case:

 Description   

I can't read data through one mongos that was just inserted through another mongos.

The test is simple. I have two mongos instances:

  1. Connect to mongos1.
  2. Shard a collection on a hashed shard key.
  3. Insert data.
  4. Query through mongos2 with readPref=secondary and readConcern=local/majority: no data is returned.
  5. If I read from the primary to trigger a routing cache refresh, the data can then be read.

 

This only happens in MongoDB 4.2 when the collection is sharded with shardCollection on a hashed key. If the shard key is ranged, everything works. If I shard the collection on a hashed key after the data has been inserted, everything also works.

mongos> db.y.find({"_id" : ObjectId("601faa86dc371c32b4f58f6b")}).readPref("secondaryPreferred")
mongos> db.y.find({"_id" : ObjectId("601faa86dc371c32b4f58f6b")}).readPref("secondaryPreferred").readConcern("local")
mongos> db.y.find({"_id" : ObjectId("601faa86dc371c32b4f58f6b")}).readPref("secondaryPreferred").readConcern("majority")
mongos>
mongos> db.y.find({"_id" : ObjectId("601faa86dc371c32b4f58f6b")})
{ "_id" : ObjectId("601faa86dc371c32b4f58f6b"), "y" : 1 }
mongos> db.y.find({"_id" : ObjectId("601faa86dc371c32b4f58f6b")}).readPref("secondaryPreferred").readConcern("majority")
{ "_id" : ObjectId("601faa86dc371c32b4f58f6b"), "y" : 1 }
mongos>
mongos>
mongos> db.z.find({"_id" : ObjectId("601faae5dc371c32b4f58f6e")}).readPref("secondaryPreferred")
mongos> db.z.find({"_id" : ObjectId("601faae5dc371c32b4f58f6e")}).readPref("secondaryPreferred").readConcern("majority")
mongos>
mongos>

 



 Comments   
Comment by Edwin Zhou [ 09/Feb/21 ]

Hi cvinllen@gmail.com,

Thank you for your ticket. This appears to be a duplicate of SERVER-53474, which describes inconsistent data being returned when reading from a second mongos with readPreference=secondary. The data is updated once we read from a primary. I'll close this ticket as such, and you can track the investigation in that ticket.

Best,
Edwin

Comment by Xin Wang [ 07/Feb/21 ]

I have the same question as @vinllen chen.

Here are the reproduction steps as JS code:

var st = new ShardingTest({
    shards: 2, mongos: 2, other: {
        rs: true,
    }
});
 
var adminDB = st.s0.getDB('admin');
assert.commandWorked(adminDB.runCommand({enableSharding: "testdb", primaryShard: st.rs0.name}));
jsTest.log("=============== first round");
runTest("testcoll");
jsTest.log("=============== second round");
runTest("testcoll1");
 
function runTest(testCollection){
    assert.commandWorked(adminDB.runCommand({shardCollection: "testdb."+testCollection, key: {_id: "hashed"}}));
    var testDB = st.s0.getDB('testdb');
    for (var i = 0 ; i < 10 ; i ++ ){
        testDB.getCollection(testCollection).insert({name:i});
    }
    printjson(st.rs0.getPrimary().getDB('testdb').getCollection(testCollection).find().toArray());
    jsTest.log("Above are the documents in " + st.rs0.name + " (primaryShard)");
    printjson(st.rs1.getPrimary().getDB('testdb').getCollection(testCollection).find().toArray());
    jsTest.log("Above are the documents in " + st.rs1.name);
 
    jsTest.log("=============== database info");
    var dbDoc = st.config.databases.findOne({_id: "testdb"});
    printjson(dbDoc);
 
    jsTest.log("=============== choose one document not in primary shard");
    var one_doc = st.rs1.getPrimary().getDB('testdb').getCollection(testCollection).findOne();
    var one_doc_id = one_doc["_id"];
    printjson(one_doc);
    jsTest.log(one_doc_id);
 
    jsTest.log("=============== find doc with secondary pref");
    jsTest.log("result: " + st.s1.getDB("testdb").getCollection(testCollection).find({_id: one_doc_id}).readPref("secondary").toArray().length);
    jsTest.log("=============== find doc with secondary pref done");
 
    jsTest.log("=============== find doc with secondary pref and local concern");
    jsTest.log("result: " + st.s1.getDB("testdb").getCollection(testCollection).find({_id: one_doc_id}).readPref("secondary").readConcern("local").toArray().length);
    jsTest.log("=============== find doc with secondary pref and local concern done");
 
}
 
st.stop();

 

In the first round, everything works. When we run `.find({_id: one_doc_id}).readPref("secondary")` through st.s1, we see the refresh log:

s21230| 2021-02-07T23:11:35.291+0800 I SH_REFR [ConfigServerCatalogCacheLoader-0] Refresh for collection testdb.testcoll to version 1|3||602003276ef388a5caf68456 took 1 ms

The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 1.

 

But the second round (on another collection) behaves unexpectedly.

The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 0, but 1 is expected. And there is no refresh log.

 

I added more logging to find the root cause. In the second round I see:

  • find doc with secondary pref: there is no shardVersion comparison in CollectionShardingState::_getMetadataWithVersionCheckAt(). That seems fine, because a read from a secondary uses readConcern=available, which does not touch the routing info.
  • find doc with secondary pref and local concern: in CollectionShardingState::_getMetadataWithVersionCheckAt(), the receivedShardVersion is 0|0||000000000000000000000000 and the wantedShardVersion is also 0|0||000000000000000000000000. In other words, the secondary thinks the collection (testcoll1) is unsharded. Moreover, _metadata is empty in MetadataManager::getActiveMetadata().

 

I think the root cause is that a secondary only loads routing info into the CatalogCache when the collection is marked as needRefresh, but it is never marked in this scenario, because the cluster splits into two logical partitions:

  • One is st.s0 & st.rs1.secondary, which have the correct version info. A request with the shard key one_doc_id (from the JS code) is always routed to st.rs1 and gets the right response.
  • The other is st.s1 & st.rs0.secondary, with shardVersion = unsharded. A request with one_doc_id returns nothing because the routing info is wrong.

Of course, there is no problem if many requests with different shard keys are distributed randomly across the mongos instances. But the scenario described above does exist.
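
To make the suspected partition easier to follow, here is a toy model in plain JavaScript. It is only a sketch of my guess, not MongoDB source code: every name in it (UNSHARDED, SHARDED, makeNode, makeRouter, cluster) is hypothetical, versions are shortened to strings, and it assumes that the st.rs0 primary knows the real version, which is what the refresh after a primary read suggests.

```javascript
// Toy model of the suspected stale-routing partition; NOT MongoDB code.
// All names here are hypothetical and exist only for illustration.
const UNSHARDED = "0|0";
const SHARDED = "1|3";

// A shard node holds documents plus the collection version it believes in.
function makeNode(knownVersion, docs) {
  return {
    knownVersion,
    docs,
    // Versioned read: the request is rejected only when the versions differ.
    find(id, receivedVersion) {
      if (receivedVersion !== this.knownVersion) {
        return { stale: true, docs: [] };
      }
      return { stale: false, docs: this.docs.filter((d) => d._id === id) };
    },
  };
}

const doc = { _id: "a", name: 1 };
const cluster = {
  // Assumption: the rs0 primary knows the real version, its secondary does not.
  rs0: { primary: makeNode(SHARDED, []), secondary: makeNode(UNSHARDED, []) },
  rs1: { primary: makeNode(SHARDED, [doc]), secondary: makeNode(SHARDED, [doc]) },
};

// A router caches a version and picks its target shard based on that cache.
function makeRouter(cachedVersion) {
  return {
    cachedVersion,
    refreshes: 0,
    find(id, readPref) {
      // A router that believes "unsharded" targets the primary shard (rs0);
      // otherwise it targets the owning shard (rs1 in this toy setup).
      const shard = this.cachedVersion === UNSHARDED ? cluster.rs0 : cluster.rs1;
      const res = shard[readPref].find(id, this.cachedVersion);
      if (res.stale) {
        // Only a version mismatch triggers a refresh and a retry.
        this.cachedVersion = SHARDED;
        this.refreshes += 1;
        return this.find(id, readPref);
      }
      return res.docs;
    },
  };
}

const mongos2 = makeRouter(UNSHARDED);
const before = mongos2.find("a", "secondary").length; // versions match, no refresh
const refreshesBefore = mongos2.refreshes;
const viaPrimary = mongos2.find("a", "primary").length; // mismatch, refresh, retry
const after = mongos2.find("a", "secondary").length;
console.log(before, refreshesBefore, viaPrimary, after);
```

In this model the first secondary read returns nothing and triggers no refresh, because the router and the secondary agree on the wrong version. Only the read through the primary exposes the mismatch, after which the secondary read finds the document.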

 

Is my guess correct?

Is there a good workaround, or will this scenario be fixed in a future release?

 

 

 
 
