[SERVER-59345] Performance degradation after mongo sharded cluster upgrade from 4.2 to 4.4 Created: 14/Aug/21  Updated: 12/May/22  Resolved: 07/Sep/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Stephen Paul Adithela Assignee: Edwin Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi all, we upgraded our sharded cluster from 4.2 to 4.4 a few days ago. Since the upgrade, our queries take much longer and performance continues to deteriorate. Our sharded cluster configuration:
Instance type: m5.xlarge (data nodes), t3.medium (config servers)
Shards: 9
Replica set type: PSA (primary, secondary, arbiter)
Config servers: 3
mongos instances: 21

 

One thing we noticed is that performance is much better with fewer mongos instances; enabling more mongos instances makes performance deteriorate.

 

We did notice this issue has some similarity to https://jira.mongodb.org/browse/SERVER-51104

 

Thanks for your help in advance



 Comments   
Comment by Ilan M [ 12/May/22 ]

Thank you, Stephen, for the quick response. Luckily, we were able to identify the root cause quickly: the issue was caused by hedged reads, and we had to disable them.

 

https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-mongos-hedged-reads
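For anyone wanting to try the same change, here is a minimal sketch of how hedged reads might be turned off on a mongos. It assumes the readHedgingMode server parameter (available on mongos from MongoDB 4.4) and a PyMongo-style database handle exposing command(); the helper names build_hedging_command and disable_hedged_reads are made up for illustration and are not part of any driver.

```python
from typing import Any, Dict


def build_hedging_command(enabled: bool) -> Dict[str, Any]:
    """Build the setParameter document that toggles hedged reads on a mongos."""
    # readHedgingMode takes the string "on" or "off".
    # "setParameter" must stay the first key, since it is the command name.
    return {"setParameter": 1, "readHedgingMode": "on" if enabled else "off"}


def disable_hedged_reads(admin_db: Any) -> Any:
    """Send the command through an admin database handle.

    Assumption: admin_db exposes command(document), as PyMongo's
    Database objects do. Any object with that method works here.
    """
    return admin_db.command(build_hedging_command(enabled=False))
```

With PyMongo installed, this could be called as disable_hedged_reads(MongoClient("mongodb://<mongos-host>:27017").admin), where the host is a placeholder. As far as we understand, a runtime setParameter only affects the mongos it is sent to and does not survive a restart, so in a cluster with many mongos instances it would need to be applied to each one, or set in each mongos configuration file.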

Comment by Stephen Paul Adithela [ 12/May/22 ]

Hello, in our case the issue had nothing to do with the MongoDB upgrade, as we had initially thought. It was due to degradation on a particular shard, which can create a bottleneck and bring down the performance of the whole database. We restarted the problematic nodes on new hosts (temporarily increasing the WiredTiger storage engine cache and resources), and that stabilized the environment.
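For others checking whether a single shard is the bottleneck, a rough sketch of how WiredTiger cache pressure could be estimated from a serverStatus document follows. The three key names are the ones reported under wiredTiger.cache; the function names and the 80% / 5% thresholds are assumptions chosen for illustration (loosely modeled on WiredTiger's default eviction targets), not values taken from our investigation.

```python
from typing import Any, Dict, Mapping

# Keys as they appear under serverStatus().wiredTiger.cache
MAX_BYTES = "maximum bytes configured"
USED_BYTES = "bytes currently in the cache"
DIRTY_BYTES = "tracked dirty bytes in the cache"


def cache_pressure(server_status: Mapping[str, Any]) -> Dict[str, float]:
    """Return cache fill and dirty ratios relative to the configured maximum."""
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache[MAX_BYTES]
    return {
        "fill_ratio": cache[USED_BYTES] / max_bytes,
        "dirty_ratio": cache[DIRTY_BYTES] / max_bytes,
    }


def is_under_pressure(
    ratios: Mapping[str, float],
    fill_threshold: float = 0.80,   # assumed threshold, tune as needed
    dirty_threshold: float = 0.05,  # assumed threshold, tune as needed
) -> bool:
    """Flag a node whose cache is fuller or dirtier than the thresholds."""
    return (
        ratios["fill_ratio"] > fill_threshold
        or ratios["dirty_ratio"] > dirty_threshold
    )
```

Running this against the serverStatus output of each shard's primary and comparing the ratios side by side is one way to spot a shard that is consistently closer to its cache limit than the others.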

Comment by Ilan M [ 12/May/22 ]

@stephen_paul_adithela - Could you let us know what fix was performed? It would benefit others. We are having a similar issue where, after the upgrade, memory usage has gone down and performance has degraded.

Appreciate the help on this.

Comment by Edwin Zhou [ 07/Sep/21 ]

Hi stephenpaul2727@gmail.com,

Thank you for following up to let us know that you were able to resolve this issue. We have had previous tickets (SERVER-57249) that identified a regression between 4.2 and 4.4 without resolution, so we remain interested in the behavior that caused this regression and in the details of the investigation that helped resolve this issue.

Kind regards,
Edwin

Comment by Stephen Paul Adithela [ 03/Sep/21 ]

Hi Edwin, sorry for the late update. We had a consultant from MongoDB look into our production systems last week. The performance degradation and issues we noticed were due to bottlenecks with the WiredTiger cache on specific shards. The MongoDB sharded cluster on 4.4 is currently running fine.

 

Please resolve this ticket. Thanks.

Comment by Edwin Zhou [ 03/Sep/21 ]

Hi stephenpaul2727@gmail.com,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and attach it to this ticket?
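If tar or zip is not convenient on the host, the archive can also be produced with a short script. The following is a sketch only: the function name archive_diagnostic_data and the output path are hypothetical, and it assumes the process can read the diagnostic.data directory under the given dbpath.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, output: str) -> Path:
    """Create a gzip-compressed tar of <dbpath>/diagnostic.data at output."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    out_path = Path(output)
    with tarfile.open(str(out_path), "w:gz") as tar:
        # arcname keeps the archive rooted at diagnostic.data/ rather than
        # embedding the full filesystem path of the dbpath.
        tar.add(str(source), arcname="diagnostic.data")
    return out_path
```

For example, archive_diagnostic_data("/var/lib/mongo", "/tmp/diagnostic.data.tar.gz") would write the archive to /tmp; both paths are placeholders and should be replaced with the node's actual dbpath and a location outside the diagnostic.data directory.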

Best,
Edwin

Comment by Edwin Zhou [ 17/Aug/21 ]

Hi stephenpaul2727@gmail.com,

Thanks for your report.

Would you please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and attach it to this ticket?

Can you also let us know the date, time, and timezone when this upgrade was performed?

Best,
Edwin

Generated at Thu Feb 08 05:47:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.