[GODRIVER-2629] Connect to multiple mongos in an unbalanced way Created: 29/Oct/22  Updated: 27/Oct/23  Resolved: 30/Nov/22

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: 1.9.1
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Jay Chung Assignee: Benji Rewis (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 截圖 2022-10-29 上午4.56.15.png    

 Description   

Summary

Connect to multiple mongos in an unbalanced way.

Let's say I have 3 mongos: a:27017, b:27017, and c:27017. Among them, a:27017 always has the highest CPU utilization, since it's the first host in the connection string.

Driver version and topology:

  • go.mongodb.org/mongo-driver v1.9.1
  • sharded cluster

How to Reproduce

This is how I connect to MongoDB.

package main

import (
	"context"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readpref"
	"go.mongodb.org/mongo-driver/x/mongo/driver/connstring"
)

const url = "mongodb://user:passwd@a:27017,b:27017,c:27017/db?maxPoolSize=20"

func main() {
	// Parse the URI up front so individual options can be read back out.
	cs, err := connstring.ParseAndValidate(url)
	if err != nil {
		panic(err)
	}

	// The URI above sets no readPreference, so fall back to primary
	// instead of silently ignoring the ModeFromString error.
	mode := readpref.PrimaryMode
	if cs.ReadPreference != "" {
		mode, err = readpref.ModeFromString(cs.ReadPreference)
		if err != nil {
			panic(err)
		}
	}
	readPref, err := readpref.New(mode)
	if err != nil {
		panic(err)
	}

	option := options.Client().
		ApplyURI(url).
		SetAppName(cs.AppName).
		SetReadPreference(readPref).
		SetMaxPoolSize(cs.MaxPoolSize)

	client, err := mongo.Connect(context.Background(), option)
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(context.Background())
}

Additional Background

These mongos run as a StatefulSet in K8s with 6 identical pods.
 



 Comments   
Comment by PM Bot [ 30/Nov/22 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Benji Rewis (Inactive) [ 10/Nov/22 ]

ken8203@gmail.com, thank you for checking! Let me know if I can help at all.

Besides, is it a good idea to choose candidates by latency? In this situation, it becomes imbalanced.

Using latency for server selection is a behavior defined by our cross-driver specification; we do this because we assume that "all else being equal, faster responses to queries and writes are preferable".

To be clear, we do not always select the server with the lowest latency. We merely filter the candidates based on the latency window. If you're finding that some hosts are often being excluded from server selection, you may be able to increase your LocalThreshold on your Client to include servers with higher relative latency. See the documentation for SetLocalThreshold (I believe the default is 15ms).
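
A minimal sketch of widening that window, assuming the three hosts from the report; the 100ms threshold is an arbitrary value for illustration only (the equivalent URI option is localThresholdMS):

package main

import (
	"context"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Widen the latency window from the 15ms default so that mongos with a
	// higher relative latency stay in the server-selection candidate set.
	opts := options.Client().
		ApplyURI("mongodb://a:27017,b:27017,c:27017/db").
		SetLocalThreshold(100 * time.Millisecond)

	client, err := mongo.Connect(context.Background(), opts)
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(context.Background())
}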

Comment by Jay Chung [ 07/Nov/22 ]

Thanks for your effort. These mongos are hosted in K8s as a StatefulSet, so my assumption is that their latency to clients should be pretty close. Still, I will subscribe to ServerDescriptionChangedEvent and read AverageRTT to figure out the root cause.

 

Besides, is it a good idea to choose candidates by latency? In this situation, it becomes imbalanced.

Comment by Benji Rewis (Inactive) [ 07/Nov/22 ]

I've been unable to recreate this behavior, so I think it might have to do with the latencies of your hosts (a:27017 might be selected more often and therefore have higher CPU utilization as it has lower latency). For now, I'd recommend subscribing to ServerDescriptionChangedEvent, and using that to access AverageRTT. You can compare the AverageRTT values across hosts to see if that theory is correct.
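
A minimal sketch of that monitoring approach, assuming the hosts from the report; the log format and run duration are placeholders:

package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/event"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Print each host's moving-average RTT whenever the driver updates its
	// server description (i.e. on every heartbeat).
	monitor := &event.ServerMonitor{
		ServerDescriptionChanged: func(e *event.ServerDescriptionChangedEvent) {
			log.Printf("host %s: AverageRTT=%v", e.Address, e.NewDescription.AverageRTT)
		},
	}

	opts := options.Client().
		ApplyURI("mongodb://a:27017,b:27017,c:27017/db").
		SetServerMonitor(monitor)

	client, err := mongo.Connect(context.Background(), opts)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(context.Background())

	// Wait a few heartbeat intervals so several RTT samples get logged.
	time.Sleep(1 * time.Minute)
}

Comparing the logged AverageRTT values across the three hosts should confirm or rule out the latency theory.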

Comment by Benji Rewis (Inactive) [ 04/Nov/22 ]

I'm working on replicating this issue locally, but another thought:

It looks like you're not specifying a read preference in your connection string. We use a primary read preference by default and select servers based on their latency. It's possible that the a:27017 host simply has the lowest latency and is therefore chosen far more frequently than the other two hosts.
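
For reference, a sketch of stating a read preference explicitly in the connection string; secondaryPreferred is only an example value, not a recommendation:

package main

import (
	"context"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Set the read preference in the URI rather than relying on the
	// primary default.
	uri := "mongodb://a:27017,b:27017,c:27017/db?readPreference=secondaryPreferred"

	client, err := mongo.Connect(context.Background(), options.Client().ApplyURI(uri))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(context.Background())
}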

Comment by Benji Rewis (Inactive) [ 01/Nov/22 ]

Hello ken8203@gmail.com! Thanks for your bug report; this seems like unexpected behavior, and we're taking a look now. Is this a problem you started having when you updated to v1.9.1 from a pre-1.9.0 version?
