  1. Core Server
  2. SERVER-54625

Possible root cause for reads from a secondary returning empty results even with readConcern=local/majority



    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 4.2.1
    • Fix Version/s: None
    • Component/s: Sharding


      I have seen several issues reporting that reads from a secondary return empty results even with readConcern=local/majority. I think the root cause may be that the secondary does not load the routing table into its CatalogCache.


      I posted the same comment under https://jira.mongodb.org/browse/SERVER-54373 , but have not received a reply. I would like to know whether my analysis is wrong.


      Here are the reproduction steps as JS code:

      var st = new ShardingTest({
          shards: 2, mongos: 2, other: {
              rs: true,
              // (any remaining options are missing from the report as captured)
          }
      });
      var adminDB = st.s0.getDB('admin');
      assert.commandWorked(adminDB.runCommand({enableSharding: "testdb", primaryShard: st.rs0.name}));

      jsTest.log("=============== first round");
      // The call is missing from the report as captured; the collection name is inferred from the refresh log below.
      runTest("testcoll");
      jsTest.log("=============== second round");
      // The call is missing from the report as captured; the collection name is inferred from the analysis below.
      runTest("testcoll1");

      function runTest(testCollection) {
          assert.commandWorked(adminDB.runCommand({shardCollection: "testdb." + testCollection, key: {_id: "hashed"}}));
          var testDB = st.s0.getDB('testdb');
          for (var i = 0; i < 10; i++) {
              // (loop body missing from the report as captured)
          }
          // (the statements that print the documents on each shard are missing from the report as captured)
          jsTest.log("Above are the documents in " + st.rs0.name + " (primaryShard)");
          jsTest.log("Above are the documents in " + st.rs1.name);
          jsTest.log("=============== database info");
          var dbDoc = st.config.databases.findOne({_id: "testdb"});
          jsTest.log("=============== choose one document not in primary shard");
          var one_doc = st.rs1.getPrimary().getDB('testdb').getCollection(testCollection).findOne();
          var one_doc_id = one_doc["_id"];
          // Read the same document through the second mongos (st.s1) with secondary read preference.
          jsTest.log("=============== find doc with secondary pref");
          jsTest.log("result: " + st.s1.getDB("testdb").getCollection(testCollection).find({_id: one_doc_id}).readPref("secondary").toArray().length);
          jsTest.log("=============== find doc with secondary pref done");
          jsTest.log("=============== find doc with secondary pref and local concern");
          jsTest.log("result: " + st.s1.getDB("testdb").getCollection(testCollection).find({_id: one_doc_id}).readPref("secondary").readConcern("local").toArray().length);
          jsTest.log("=============== find doc with secondary pref and local concern done");
      }


      In the first round, everything works as expected. When we run `.find({_id: one_doc_id}).readPref("secondary")` through st.s1, we see the refresh log:

      s21230| 2021-02-07T23:11:35.291+0800 I SH_REFR [ConfigServerCatalogCacheLoader-0] Refresh for collection testdb.testcoll to version 1|3||602003276ef388a5caf68456 took 1 ms

      The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 1.


      In the second round (a different collection), however, the behavior is unexpected.

      The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 0, but 1 is expected. And there is no refresh log.


      I added more logging to find the root cause. In the second round I see the following:

      • find doc with secondary pref: there is no shardVersion comparison in CollectionShardingState::_getMetadataWithVersionCheckAt(). That is expected, because a read from a secondary defaults to readConcern=available, which does not consult the routing info.
      • find doc with secondary pref and local concern: in CollectionShardingState::_getMetadataWithVersionCheckAt(), the receivedShardVersion is 0|0||000000000000000000000000 and the wantedShardVersion is also 0|0||000000000000000000000000. In other words, the secondary thinks the collection (testcoll1) is unsharded. Moreover, _metadata is empty in MetadataManager::getActiveMetadata().
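The comparison described in the second item can be sketched with a toy model in plain JavaScript. This is not MongoDB server code: the helper names `parseShardVersion`, `isUnsharded`, and `checkShardVersion` are hypothetical, and the only thing taken from the report is the `<major>|<minor>||<epoch>` format printed in the logs. The sketch assumes that a mismatch between the two versions is what would normally cause the node to be treated as stale; when both sides report the all-zero version, the check passes and nothing prompts a refresh.

```javascript
// Toy model only: hypothetical helpers, not MongoDB server code.

// Parse a version string in the format seen in the logs: "<major>|<minor>||<epoch>".
function parseShardVersion(text) {
    const [counters, epoch] = text.split("||");
    const [major, minor] = counters.split("|").map(Number);
    return {major, minor, epoch};
}

// The all-zero version is how the report's logs show an unsharded collection.
function isUnsharded(version) {
    return version.major === 0 && version.minor === 0 && /^0+$/.test(version.epoch);
}

// Assumption: equal versions pass the check, anything else counts as stale.
function checkShardVersion(receivedText, wantedText) {
    const received = parseShardVersion(receivedText);
    const wanted = parseShardVersion(wantedText);
    const same = received.major === wanted.major &&
                 received.minor === wanted.minor &&
                 received.epoch === wanted.epoch;
    return same ? "ok" : "stale";
}

const unshardedVersion = "0|0||000000000000000000000000";
const shardedVersion = "1|3||602003276ef388a5caf68456";

// Both sides believe the collection is unsharded, so the check passes silently.
console.log(checkShardVersion(unshardedVersion, unshardedVersion));
// A stale router talking to an up-to-date node would be detected.
console.log(checkShardVersion(unshardedVersion, shardedVersion));
```

In this model the problem case is the first call: the check cannot detect staleness when the router and the secondary are stale in the same way.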


      I think the root cause is that a secondary only loads routing info into its CatalogCache when the collection is marked as needRefresh, and in this scenario it is never marked. The cluster effectively splits into two logical partitions:

      • One is st.s0 and st.rs1.secondary, which have the correct version info. A request with the shard key one_doc_id (in the JS code) is always routed to st.rs1 and gets the right response.
      • The other is st.s1 and st.rs0.secondary, which both have shardVersion = unsharded. A request with one_doc_id returns nothing because the routing info is wrong.

      Of course, there is no problem if many requests with different shard keys are distributed randomly across the mongos instances. But the scenario described above does exist.
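The two partitions can be illustrated with a small simulation in plain JavaScript. This is a toy model under stated assumptions, not MongoDB server code: the `config`, `find`, and `scenario` names, the version strings, and the refresh-and-retry behavior on a mismatch are all hypothetical simplifications. The point is only to show that a stale router paired with an equally stale secondary never triggers a refresh, while a stale router paired with an up-to-date secondary recovers.

```javascript
// Toy model only: hypothetical names and simplified behavior, not MongoDB server code.
const UNSHARDED = "0|0";

// Authoritative routing info: the collection is sharded and the document lives on rs1.
const config = {version: "1|0", ownerOf: (id) => "rs1"};

// Route a find by _id through a router to a secondary, with one retry after a refresh.
function find(router, secondaries, id) {
    for (let attempt = 0; attempt < 2; attempt++) {
        // A router that thinks the collection is unsharded targets the primary shard (rs0).
        const target = router.cachedVersion === UNSHARDED ? "rs0" : config.ownerOf(id);
        const node = secondaries[target];
        if (node.knownVersion !== router.cachedVersion) {
            // Assumption: a mismatch makes both sides refresh from config, then retry.
            router.cachedVersion = config.version;
            node.knownVersion = config.version;
            continue;
        }
        // Versions agree, so the node answers from its local data.
        return node.docs.filter((doc) => doc._id === id).length;
    }
    throw new Error("too many retries");
}

// Build a fresh cluster where rs0's secondary starts with the given version.
function scenario(rs0Version) {
    const secondaries = {
        rs0: {docs: [], knownVersion: rs0Version},
        rs1: {docs: [{_id: 7}], knownVersion: config.version},
    };
    const s0 = {cachedVersion: config.version};  // router that has refreshed
    const s1 = {cachedVersion: UNSHARDED};       // router that has never refreshed
    return {
        viaS0: find(s0, secondaries, 7),
        viaS1: find(s1, secondaries, 7),
        s1VersionAfter: s1.cachedVersion,
    };
}

const stuck = scenario(UNSHARDED);        // rs0's secondary is as stale as s1
const healed = scenario(config.version);  // rs0's secondary is up to date
console.log(stuck);   // { viaS0: 1, viaS1: 0, s1VersionAfter: '0|0' }
console.log(healed);  // { viaS0: 1, viaS1: 1, s1VersionAfter: '1|0' }
```

In the `stuck` case the read through s1 returns 0 and s1 still holds the unsharded version afterwards, which mirrors the second-round result in the report.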


      Is my guess correct?

      Is there a good workaround, or will this scenario be fixed in a future release?


              edwin.zhou Edwin Zhou
              wangxin201492@gmail.com Xin Wang