Core Server / SERVER-88399

A normal secondary with hidden = false cannot receive queries with readPreference = secondaryPreferred

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 4.4.30, 6.0.15, 7.0.8, 5.0.27
    • Component/s: None
    • Labels: None
    • Assigned Teams: Service Arch
    • Operating System: ALL

    • Steps to Reproduce:

      A single-shard cluster with one primary and two secondaries, and a script that queries the secondaries.

      • Set secondary 1 hidden = true; a few seconds later, set secondary 1 hidden = false.
      • Restart node 1.
      • Set secondary 1 and secondary 2 both hidden = true; a few seconds later, revert both to hidden = false.
      • Restart node 1.
    • Sprint: Service Arch 2024-04-01, Service Arch 2024-04-15

      We discovered a strange phenomenon. After in-depth research, we found that it was a bug in the implementation of ScanningReplicaSetMonitor.
       
      First, prepare a single-shard cluster with one primary and two secondaries, and a script that queries the secondaries, like this:
      ```

      from pymongo import MongoClient

      # secondaryPreferred routes reads to a secondary whenever one is available
      c = MongoClient("mongodb://xxxxx/admin?readPreference=secondaryPreferred")
      while True:
          for _ in c.db.coll.find():
              pass

      ```
      Then, let’s look at a series of common operations and the phenomena behind them.
      • Set secondary 1 hidden = true; a few seconds later, set secondary 1 hidden = false.

      At this time, only node 2 receives query operations, and node 1 shows a huge replicaSetPingTimesMillis value in:

      mongos> db.adminCommand("getDiagnosticData").data.connPoolStats.replicaSetPingTimesMillis
      {
              "mongo109" : {
                      "x1:27017" : 2.459,
                      "x2:27017" : 2.289,
                      "x3:27017" : 9223372036854776
              },
      }
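      For reference, that huge number looks like the "RTT never measured" sentinel rather than a real ping time. Under the assumption that the RTT type (HelloRTT, referenced below) is a microsecond-resolution duration with a 64-bit signed representation, its max() value expressed in milliseconds rounds to the number shown above:
      ```

      #include <chrono>
      #include <iomanip>
      #include <iostream>

      int main() {
          // Assumption: HelloRTT behaves like std::chrono::microseconds, so max()
          // is INT64_MAX microseconds, meaning "no RTT has ever been measured".
          using HelloRTT = std::chrono::microseconds;

          // Converted to milliseconds (the unit of replicaSetPingTimesMillis) through
          // a double, the sentinel prints as 9223372036854776 -- the value reported
          // for x3:27017 above.
          double millis = std::chrono::duration<double, std::milli>(HelloRTT::max()).count();
          std::cout << std::fixed << std::setprecision(0) << millis << "\n";
          return 0;
      }

      ```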
      • Restart node 1

      Then everything returns to normal: queries are distributed normally and replicaSetPingTimesMillis is back to normal.

      • Set secondary 1 and secondary 2 both hidden = true; a few seconds later, revert both to hidden = false.

      At this time, queries are distributed normally, but the replicaSetPingTimesMillis of both secondaries shows the huge sentinel value.
      • After restarting node 1, only secondary 1 receives query operations.

       
       
      The key reasons behind the above phenomena are:
      • ServerPingMonitor::onTopologyDescriptionChangedEvent only removes monitors for servers that are missing from the new topology; it does not add monitors for servers that are new to (or have rejoined) the topology, so an unhidden secondary is never pinged again and its RTT stays at the sentinel (see the first sketch after this list).

      • In struct LatencyWindow, because of the following code, the window degenerates to (max(), max()) whenever the lower bound is HelloRTT::max() (see the second sketch below):

      upper = (lowerBound == HelloRTT::max()) ? lowerBound : lowerBound + windowWidth;
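      Below are two minimal sketches of these two behaviors. First, the monitor bookkeeping: an illustrative model of the pattern described in the first bullet, with simplified names and types (it is not the actual server source).
      ```

      // Simplified model of the per-server ping-monitor map reacting to a topology change.
      #include <map>
      #include <memory>
      #include <set>
      #include <string>

      struct SingleServerPingMonitor { /* periodically pings one host to measure its RTT */ };

      class ServerPingMonitorSketch {
      public:
          void onTopologyDescriptionChangedEvent(const std::set<std::string>& newHosts) {
              // Monitors whose hosts disappeared from the topology are dropped...
              for (auto it = _monitors.begin(); it != _monitors.end();) {
                  if (newHosts.count(it->first) == 0) {
                      it = _monitors.erase(it);
                  } else {
                      ++it;
                  }
              }
              // ...but, per the report, nothing here creates a monitor for a host that
              // (re)appears in the topology, e.g. a secondary flipped back from
              // hidden = true to hidden = false. That host is never pinged again, so
              // its RTT stays at the "never measured" sentinel.
          }

      private:
          std::map<std::string, std::shared_ptr<SingleServerPingMonitor>> _monitors;
      };

      int main() {
          ServerPingMonitorSketch monitor;
          // Topology change in which a previously hidden secondary reappears: stale
          // monitors are removed, but no monitor is created for the returning host.
          monitor.onTopologyDescriptionChangedEvent({"x1:27017", "x2:27017", "x3:27017"});
          return 0;
      }

      ```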

       
       
       
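      Second, the latency window: a sketch of why a node stuck at the max-RTT sentinel is filtered out of server selection when another candidate has a real RTT, yet still passes when every candidate is stuck at the sentinel. The struct layout, the 15 ms window width, and the rule that the lower bound is the smallest candidate RTT are assumptions; only the quoted ternary comes from the report.
      ```

      #include <chrono>
      #include <iostream>
      #include <string>
      #include <utility>
      #include <vector>

      using HelloRTT = std::chrono::microseconds;  // assumed RTT representation

      struct LatencyWindow {
          HelloRTT lower;
          HelloRTT upper;

          LatencyWindow(HelloRTT lowerBound, HelloRTT windowWidth) : lower(lowerBound) {
              // The line quoted above: when the lower bound is already max(), adding
              // the width would overflow, so the upper bound is clamped to max() too,
              // producing the degenerate window (max(), max()).
              upper = (lowerBound == HelloRTT::max()) ? lowerBound : lowerBound + windowWidth;
          }

          bool isWithinWindow(HelloRTT latency) const {
              return latency >= lower && latency <= upper;
          }
      };

      int main() {
          // Secondary 1 has a measured RTT; secondary 2 lost its ping monitor, so
          // its RTT is stuck at the max() sentinel.
          std::vector<std::pair<std::string, HelloRTT>> secondaries = {
              {"secondary1", std::chrono::milliseconds(2)},
              {"secondary2", HelloRTT::max()},
          };

          // The window is built from the smallest candidate RTT (2 ms here), so the
          // sentinel node falls outside it and never receives queries.
          LatencyWindow window(std::chrono::milliseconds(2), std::chrono::milliseconds(15));
          for (const auto& [name, rtt] : secondaries)
              std::cout << name << (window.isWithinWindow(rtt) ? ": selected\n" : ": filtered out\n");

          // If *both* secondaries are stuck at max(), the lower bound is max() and the
          // window is (max(), max()); both are "within" it, which is why queries still
          // looked evenly distributed in that case even though the RTTs were bogus.
          return 0;
      }

      ```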
      I think the ServerPingMonitor behavior is a bug, while the LatencyWindow behavior is a feature.

            Assignee:
            backlog-server-servicearch [DO NOT USE] Backlog - Service Architecture
            Reporter:
            lpc lipengchong
            Votes:
            0
            Watchers:
            7
