Core Server / SERVER-88399

A normal secondary with hidden = false cannot receive queries with readPreference = secondaryPreferred

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 4.4.30, 6.0.15, 7.0.8, 5.0.27
    • Component/s: None
    • Labels: None
    • Assigned Teams: Service Arch
    • Operating System: ALL

    • Steps to Reproduce:

      A single-shard cluster with one primary and two secondaries, and a script that queries the secondaries.

      • Set secondary 1 hidden = true; a few seconds later, set secondary 1 hidden = false.
      • Restart node 1.
      • Set secondary 1 and secondary 2 both hidden = true; a few seconds later, revert both to hidden = false.
      • Restart node 1.
    • Sprint: Service Arch 2024-04-01, Service Arch 2024-04-15

      We discovered a strange phenomenon. After in-depth research, we found that it was a bug in the implementation of ScanningReplicaSetMonitor.
       
      First, prepare a single-shard cluster with one primary and two secondaries, and a script that queries the secondaries, like this:
      ```

      from pymongo import MongoClient

      # secondaryPreferred routes reads to a secondary whenever one is available
      c = MongoClient("mongodb://xxxxx/admin?readPreference=secondaryPreferred")
      while True:
          for _ in c.db.coll.find():
              pass

      ```
      Then, let’s look at a series of common operations and the phenomena behind them.
      • Set secondary 1 hidden = true; a few seconds later, set secondary 1 hidden = false.

      At this time, only node 2 receives query operations, and node 1 shows a huge replicaSetPingTimesMillis value in:

      mongos> db.adminCommand("getDiagnosticData").data.connPoolStats.replicaSetPingTimesMillis
      {
              "mongo109" : {
                      "x1:27017" : 2.459,
                      "x2:27017" : 2.289,
                      "x3:27017" : 9223372036854776
              },
      }
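      For reference, that huge number looks like the "RTT never measured" sentinel rather than a real ping time. Under the assumption that the RTT type (HelloRTT, referenced below) is a microsecond-resolution duration with a 64-bit signed representation, its max() value expressed in milliseconds rounds to the number shown above:
      ```

      #include <chrono>
      #include <iomanip>
      #include <iostream>

      int main() {
          // Assumption: HelloRTT behaves like std::chrono::microseconds, so max()
          // is INT64_MAX microseconds, meaning "no RTT has ever been measured".
          using HelloRTT = std::chrono::microseconds;

          // Converted to milliseconds (the unit of replicaSetPingTimesMillis) through
          // a double, the sentinel prints as 9223372036854776 -- the value reported
          // for x3:27017 above.
          double millis = std::chrono::duration<double, std::milli>(HelloRTT::max()).count();
          std::cout << std::fixed << std::setprecision(0) << millis << "\n";
          return 0;
      }

      ```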
      • Restart node 1

      Then everything returns to normal: queries are distributed normally and replicaSetPingTimesMillis is back to normal.

      • Set secondary 1 and secondary 2 both hidden = true; a few seconds later, revert both to hidden = false.

      At this time, queries are distributed normally, but the replicaSetPingTimesMillis of both secondaries shows the huge sentinel value.
      • After restarting node 1, only secondary 1 receives query operations.

       
       
      The key reasons behind the above phenomena are:
      • ServerPingMonitor::onTopologyDescriptionChangedEvent only removes monitors for servers that are missing from the new topology; it does not add monitors for servers that are new to (or have rejoined) the topology, so an unhidden secondary is never pinged again and its RTT stays at the sentinel (see the first sketch after this list).

      • In struct LatencyWindow, because of the following code, the window degenerates to (max(), max()) whenever the lower bound is HelloRTT::max() (see the second sketch below):

      upper = (lowerBound == HelloRTT::max()) ? lowerBound : lowerBound + windowWidth;
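      Below are two minimal sketches of these two behaviors. First, the monitor bookkeeping: an illustrative model of the pattern described in the first bullet, with simplified names and types (it is not the actual server source).
      ```

      // Simplified model of the per-server ping-monitor map reacting to a topology change.
      #include <map>
      #include <memory>
      #include <set>
      #include <string>

      struct SingleServerPingMonitor { /* periodically pings one host to measure its RTT */ };

      class ServerPingMonitorSketch {
      public:
          void onTopologyDescriptionChangedEvent(const std::set<std::string>& newHosts) {
              // Monitors whose hosts disappeared from the topology are dropped...
              for (auto it = _monitors.begin(); it != _monitors.end();) {
                  if (newHosts.count(it->first) == 0) {
                      it = _monitors.erase(it);
                  } else {
                      ++it;
                  }
              }
              // ...but, per the report, nothing here creates a monitor for a host that
              // (re)appears in the topology, e.g. a secondary flipped back from
              // hidden = true to hidden = false. That host is never pinged again, so
              // its RTT stays at the "never measured" sentinel.
          }

      private:
          std::map<std::string, std::shared_ptr<SingleServerPingMonitor>> _monitors;
      };

      int main() {
          ServerPingMonitorSketch monitor;
          // Topology change in which a previously hidden secondary reappears: stale
          // monitors are removed, but no monitor is created for the returning host.
          monitor.onTopologyDescriptionChangedEvent({"x1:27017", "x2:27017", "x3:27017"});
          return 0;
      }

      ```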

       
       
       
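      Second, the latency window: a sketch of why a node stuck at the max-RTT sentinel is filtered out of server selection when another candidate has a real RTT, yet still passes when every candidate is stuck at the sentinel. The struct layout, the 15 ms window width, and the rule that the lower bound is the smallest candidate RTT are assumptions; only the quoted ternary comes from the report.
      ```

      #include <chrono>
      #include <iostream>
      #include <string>
      #include <utility>
      #include <vector>

      using HelloRTT = std::chrono::microseconds;  // assumed RTT representation

      struct LatencyWindow {
          HelloRTT lower;
          HelloRTT upper;

          LatencyWindow(HelloRTT lowerBound, HelloRTT windowWidth) : lower(lowerBound) {
              // The line quoted above: when the lower bound is already max(), adding
              // the width would overflow, so the upper bound is clamped to max() too,
              // producing the degenerate window (max(), max()).
              upper = (lowerBound == HelloRTT::max()) ? lowerBound : lowerBound + windowWidth;
          }

          bool isWithinWindow(HelloRTT latency) const {
              return latency >= lower && latency <= upper;
          }
      };

      int main() {
          // Secondary 1 has a measured RTT; secondary 2 lost its ping monitor, so
          // its RTT is stuck at the max() sentinel.
          std::vector<std::pair<std::string, HelloRTT>> secondaries = {
              {"secondary1", std::chrono::milliseconds(2)},
              {"secondary2", HelloRTT::max()},
          };

          // The window is built from the smallest candidate RTT (2 ms here), so the
          // sentinel node falls outside it and never receives queries.
          LatencyWindow window(std::chrono::milliseconds(2), std::chrono::milliseconds(15));
          for (const auto& [name, rtt] : secondaries)
              std::cout << name << (window.isWithinWindow(rtt) ? ": selected\n" : ": filtered out\n");

          // If *both* secondaries are stuck at max(), the lower bound is max() and the
          // window is (max(), max()); both are "within" it, which is why queries still
          // looked evenly distributed in that case even though the RTTs were bogus.
          return 0;
      }

      ```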
      I think the ServerPingMonitor behavior is a bug, while the LatencyWindow behavior is a feature.

            Assignee:
            backlog-server-servicearch [DO NOT USE] Backlog - Service Architecture
            Reporter:
            lpc lipengchong
            Votes:
            0
            Watchers:
            7
