[SERVER-71627] Refreshed cached collection route info will severely block all client request when a cluster with 1 million chunks Created: 26/Nov/22 Updated: 18/Dec/23 Resolved: 11/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 4.2.25, 7.0.1, 6.0.10, 5.0.21, 4.4.25 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | y yz | Assignee: | Tommaso Tocci |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | balancer-round-perf, chunkmap-improvements | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Sharding EMEA
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Backport Requested: |
v7.0, v6.0, v5.0, v4.4, v4.2
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Sharding EMEA 2023-01-09, Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06, Sharding EMEA 2023-02-20, Sharding EMEA 2023-03-06, Sharding EMEA 2023-03-20, Sharding EMEA 2023-04-03, Sharding EMEA 2023-04-17, Sharding EMEA 2023-05-01, Sharding EMEA 2023-05-15, Sharding EMEA 2023-05-29, Sharding EMEA 2023-06-12, Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24 | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Linked BF Score: | 5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Refreshing routing info happens under a lot of circumstances on mongos & mongod, e.g. splitting & moving chunks & shard version Check(when routing requests for read/write queries), etc. Efficiency of refreshing is crucial to MongoDB sharded cluster’s core functionalities. |
| Comments |
| Comment by Githook User [ 21/Aug/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: | ||||||||||||||||||||||||||||||
| Comment by Githook User [ 21/Aug/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: | ||||||||||||||||||||||||||||||
| Comment by Githook User [ 21/Aug/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: | ||||||||||||||||||||||||||||||
| Comment by Githook User [ 17/Aug/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: | ||||||||||||||||||||||||||||||
| Comment by Githook User [ 15/Aug/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: BACKPORT-16609 | ||||||||||||||||||||||||||||||
| Comment by Githook User [ 11/Jul/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: Many thanks to yangyazhou, demonyang, ycycyyc, pengzhenyi2015, wujunyuyuyu, lakeleisu and jerrygxx for the original idea on how to significantly improve performance of updating the routing table. | ||||||||||||||||||||||||||||||
| Comment by Tommaso Tocci [ 11/Jul/23 ] | ||||||||||||||||||||||||||||||
|
Hi 1147952115@qq.com ! I would like to thank you, and the entire Tencent MongoDB team, again for the detailed report and the code change proposal. We really appreciate the big effort. We reviewed and tested thoroughly the code you submitted, and we really like the idea of Two-Dimensional Sorting & Search. The performance improvements for incremental refreshes are excellent.
Thus, we ended up changing the code quite a bit to fix the bug and the performance regressions. In particular, we re-wrote entirely the merging algorithm to support all possible chunk operations. | ||||||||||||||||||||||||||||||
| Comment by y yz [ 13/Feb/23 ] | ||||||||||||||||||||||||||||||
|
It's a great honor to get your approval, we have signed the agreement and email to you. thanks. | ||||||||||||||||||||||||||||||
| Comment by Matt Panton [ 09/Feb/23 ] | ||||||||||||||||||||||||||||||
|
Hello 1147952115@qq.com! I am Matt, a Product Manager at MongoDB. After reviewing your pull request, we want to include the optimization in future releases of MongoDB! Before we can include the optimization in a future release, we need you and all of the members of your team that worked on this optimization to sign the Contributor Agreement. You can email the signed agreement(s) to me at matt.panton@mongodb.com After filling out and returning the signed contributor agreement, MongoDB can begin merging the code. Thank you for your contribution! I am thrilled to see the future performance benefits this optimization will deliver to you, your organization, and customers worldwide. | ||||||||||||||||||||||||||||||
| Comment by Chris Kelly [ 26/Nov/22 ] | ||||||||||||||||||||||||||||||
|
Thank you for your incredibly detailed summary with supporting documentation! I'm going to pass this on to the relevant team to look into. | ||||||||||||||||||||||||||||||
| Comment by y yz [ 26/Nov/22 ] | ||||||||||||||||||||||||||||||
| Comment by y yz [ 26/Nov/22 ] | ||||||||||||||||||||||||||||||
|
"MongoDB routing refresh mechanism limitation analysis" and "Proposed MongoDB Incremental Routing Info Refresh Method: Two-Dimensional Sorting & Search" please refer to the attached PDF(MongoDB Routing Info Refresh Optimization.pdf).
The optimization code push: https://github.com/mongodb/mongo/pull/1506
| ||||||||||||||||||||||||||||||
| Comment by y yz [ 26/Nov/22 ] | ||||||||||||||||||||||||||||||
|
Tencent MongoDB team have solved the problem thoroughly by Two-Dimensional Sorting & Search. After optimization, latency consumption remained at 2ms regardless of the data-size of the shrading cluster.
After optimization, refreshing incremental routing info’s time cost is around 2ms, and most of the elapsed time is spent on retrieving changed chunks from the Config Server, while generating the new ChunkMap only takes a very short period (< 1ms) instructions:Here, 5.0 version's shard key is int id(ranging from 0 to 100,000,000), so it take less time. In fact, the shard key in product is more complex, so it take more time than id shard key. The optimization code address: https://github.com/mongodb/mongo/pull/1506
Logs before optimization:
Logs after optimization:
| ||||||||||||||||||||||||||||||
| Comment by y yz [ 26/Nov/22 ] | ||||||||||||||||||||||||||||||
|
I'm terribly sorry,this is an improved feature, but the jira platform changed it as a fault type.
I am from tencent cloud mongodb team, Our company is a partner with yours. In the past several years, Tencent MongoDB team has noticed unusual slow queries on sharded clusters out of blue while there’s no sign of any system resource shortage (CPU, RAM, I/O, etc). Further looking into this symptom, the team figured out retrieving incremental routing info would take a lot of time when total chunk number exceeds certain threshold on shared clusters. For instance, a sharded cluster with 250k chunks requires around 300ms to refresh routing info; For larger clusters, like a cluster with 1 million chunks, it’d take seconds to do the refresh, severely blocking all client queries. Other than that, refreshing routing info could consume more CPU, when a cluster has multiple shard tables, CPU jitter is more obvious when refreshing routing info at the same time. it increased the cost of business development and limits the distributed function Below are some of the cases we found in production & testing environment. 1. BackgroundCase 1: v4.0 Product Cluster with 250k chunks
Data size: 5.5 billion docs, 1.2TB in total size; Chunk number: 250k; Refreshing routing info duration: 200ms for mongos, 300ms for mongod.
Case 2: v4.2 Product Cluster with 1.5m chunks
Data size: 15.5 billion docs,22.5TB in total size; Chunk number: 1.5 million; Refreshing routing info duration: 800ms for mongos, 1.2s for mongod.
Case 3: v3.6 Product Cluster with 4.3m chunks
Data size: 120 billion docs, 80TB data size in total; Chunk number: 430 million; Refreshing routing info duration: 4s for mongos, 4.6s for mongod. Case 4: v5.0 Test Cluster with 2m chunks
Set up a v5.0 sharded cluster with 2 million chunks – Use “id” field as shard key, ranging from 0 to 100,000,000, generate 2m chunks by pre-splitting chunks. instructions:Here, 5.0 version's shard key is int id(ranging from 0 to 100,000,000), so it take less time. In fact, the shard key in product is more complex, so it take more time than the simple id shard key.
//Refreshing routing info {"t": {"$date": "2022-10-06T17: 46: 25.479+08: 00"},"s": "I", "c": "SH_REFR", "id": 4619901, "ctx": "CatalogCache-2","msg": "Refreshed cached collection","attr": {"namespace": "test.test2","lookupSinceVersion": {"0": {"$timestamp": {"t": 49182,"i": 1}},"1": {"$oid": "626a663821072b82d9059209"},"2": {"$timestamp": {"t": 1651140151,"i": 6}}},"newVersion": {"chunkVersion": {"0": {"$timestamp": {"t": 49182,"i": 1}},"1": {"$oid": "626a663821072b82d9059209"},"2": {"$timestamp": {"t": 1651140151,"i": 6}}},"forcedRefreshSequenceNum": 21,"epochDisambiguatingSequenceNum": 18},"timeInStore": {"chunkVersion": : {"0": {"$timestamp": {"t": 49182,"i": 1}},"1": {"$oid": "626a663821072b82d9059209"},"2": {"$timestamp": {"t": 1651140151,"i": 6}},"forcedRefreshSequenceNum": 20,"epochDisambiguatingSequenceNum": 17},"durationMillis": 896}} Case 5: v5.0 Test Cluster with 5m chunks
Set up a v5.0 sharded cluster with 5 million chunks – Use “id” field as shard key, generate 5m chunks by pre-splitting chunks. instructions:Here, 5.0 version's shard key is int id(ranging from 0 to 100,000,000), so it take less time. In fact, the shard key in product is more complex, so it take more time than the simple id shard key.
2. Issue ImpactMore chunks and more complex shardkey,more time it takes. If Refreshing routing info takes too long, the main effects are as follows:
All requests will be blocked when mongos/mongod is retrieving incremental routing info, henceforth the bigger the cluster is, the more un-responsive it could be.
A sharded cluster with 1.4 million chunks had to disable balancer except for several hours during midnight(The client request qps is very low, CPU & IO & MEM are very low load), due to the frequent slow queries caused by refreshing routing info. However the balancing progress made during midnights never caught up the gap and the shards imbalance ends up like below, and still worsening:
After analysis, the cluster jitter is caused by route refreshing, the system load is low during this period.
In order to avoid serious service jitter caused by routing problems, many users strictly limit the data amount of a collection(data of a single collection should not exceed 4T). When the amount of data in a collection exceeds 4T, the user needs to separate the collection. In this way, we lose the core distributed advantage that we have.
Multiple ChunkVector iterator, a lot of CPU resources are used, Especially if there's a lot of collection and a lot of chunks.
|