[DOCS-13774] Investigate changes in SERVER-49233: Introduce a flag to toggle the logic for bumping collection's major version during split Created: 20/Jul/20 Updated: 13/Nov/23 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | manual, Server |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.19, 4.0.20, 4.2.9, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Backlog - Core Eng Program Management Team | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Participants: | |||||||||
| Days since reply: | 1 year, 14 weeks, 2 days ago | ||||||||
| Epic Link: | DOCSP-11345 | ||||||||
| Description |
DescriptionDownstream Change Summary This introduces a new server parameter incrementChunkMajorVersionOnChunkSplits into 3.6 and 4.0 that affects the frequency of sharding metadata refreshes caused by autosplitting. Its default value is false and should only be modified in rare cases (see the ticket description for more details). The flag effectively makes the behavior introduced by Description of Linked TicketIssue Status as of July 12 2020 ISSUE SUMMARY This fix introduces a global flag named incrementChunkMajorVersionOnChunkSplits which is used to choose between two algorithms for updating a collection’s routing table after performing a chunk split in a sharded cluster. Chunk split operations and their corresponding changes to a collection’s routing table causes mongoses to refresh their routing table caches, requiring resources and briefly stalling CRUD operations, which can increase CPU load and operation latency. Before 3.6.15 and 4.0.13, the algorithm worked well for shard keys with uniformly distributed values, but led to performance problems for cases where shard key values are incrementing monotonically. A change introduced by The changes introduced with this ticket cause the old algorithm to be used by default, but allows the customer to opt into the behavior introduced by USER IMPACT Before 3.6.15 and 4.0.13, the mongos’s cached version of the routing table for a collection is not updated until a chunk is moved or until the mongos succeeds in splitting a chunk. This means that, during inserts to a sharded collection with monotonically increasing shard key values, if the chunk with the max shard key is split by one mongos, other mongoses may try and fail to split this chunk because of out-of-date routing tables in their caches. These continual failed split attempts can result in a high CPU load on the shard containing this chunk. In versions 3.6.15 through 3.6.18, 4.0.13 through 4.0.19 and 4.2.2 through 4.2.8 during inserts to documents with uniformly distributed shard key values, due to the change made by From 4.2 onwards, the splitting is triggered by shards, meaning that the problem that SUGGESTED ACTION For 3.6 and 4.0 installations, upgrade to the latest minor version which supports the flag in mongod. If your deployment inserts documents with monotonically increasing or decreasing shard keys and you observe high load due to failed chunk split attempts, set the flag to “true”, otherwise no additional action is required. The flag can be set to true in the mongod configuration file, or with For 4.2 installations, upgrade to 4.2.9. No further action is needed. AFFECTED VERSIONS This issue affects versions 3.6.15 through 3.6.18, 4.0.13 through 4.0.19 and 4.2.2 through 4.2.8. FIX VERSION The flag is available in versions 3.6.19 and 4.0.20. In 4.2.9 both uniform and monotonic shard key distributions are supported and so no flag is necessary. Original DescriptionThis ticket is to introduce a flag, which toggles the logic for bumping the collection's major version during split. The flag should only be meaningful for the Config Server and should default to OFF (meaning the major version should not be bumped by default). This flag should be present in 3.6, 4.0 and 4.2. For 4.4, since the auto-splitter is running on the shards, this flag should be removed, along with the aforementioned logic for bumping the major version. Tl;dr: Under this solution, the collection's major version would be incremented only if a split happens at the shard, whose shardVersion == collectionVersion. Therefore, unless a split happens on this shard, all the other shards will just catch-up to its shard version and will then stop. This will repeat when the shard containing the collection version performs a split again, so the idea was that this will be amortised across the number of shards. However, this logic seems to be causing more harm than good in the case of almost uniform writes across all chunks. If it is the case that all shards are doing splits almost in unison, under this fix there will constantly be a bump in the collection version, which means constant stalls due to StaleShardVersion. Scope of changesImpact to Other DocsMVP (Work and Date)Resources (Scope or Design Docs, Invision, etc.) |
| Comments |
| Comment by Education Bot [ 31/Oct/22 ] |
|
Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you! |