  1. Documentation
  2. DOCS-13774

Investigate changes in SERVER-49233: Introduce a flag to toggle the logic for bumping collection's major version during split


      Downstream Change Summary

      This introduces a new server parameter, incrementChunkMajorVersionOnChunkSplits, into 3.6 and 4.0 that affects the frequency of sharding metadata refreshes caused by autosplitting. Its default value is false, and it should only be modified in rare cases (see the ticket description for more details). The flag effectively makes the behavior introduced by SERVER-41480 opt-in, so customers who previously benefited from SERVER-41480 will want to set the flag to true.

      Description of Linked Ticket

      Issue Status as of July 12 2020


      This fix introduces a global flag named incrementChunkMajorVersionOnChunkSplits, which is used to choose between two algorithms for updating a collection’s routing table after performing a chunk split in a sharded cluster. Chunk split operations and their corresponding changes to a collection’s routing table cause mongoses to refresh their routing table caches. These refreshes consume resources and briefly stall CRUD operations, which can increase CPU load and operation latency.

      Before 3.6.15 and 4.0.13, the algorithm worked well for shard keys with uniformly distributed values, but led to performance problems for cases where shard key values are incrementing monotonically. A change introduced by SERVER-41480 changed the algorithm to fix that edge case.

      The changes introduced with this ticket cause the old algorithm to be used by default, but allow the customer to opt into the behavior introduced by SERVER-41480 if needed.


      Before 3.6.15 and 4.0.13, the mongos’s cached version of the routing table for a collection is not updated until a chunk is moved or until the mongos succeeds in splitting a chunk. This means that, during inserts to a sharded collection with monotonically increasing shard key values, if the chunk with the max shard key is split by one mongos, other mongoses may try and fail to split this chunk because of out-of-date routing tables in their caches. These continual failed split attempts can result in a high CPU load on the shard containing this chunk.

      In versions 3.6.15 through 3.6.18, 4.0.13 through 4.0.19, and 4.2.2 through 4.2.8, during inserts of documents with uniformly distributed shard key values, the change made by SERVER-41480 forces mongoses to update their routing table caches more often than necessary, causing CRUD operations to be blocked while routing table caches are being updated.

      From 4.2 onwards, the splitting is triggered by shards, meaning that the problem that SERVER-41480 was designed to address is no longer present. However, the changes it introduced can still cause routing table caches to be updated more often than necessary.


      For 3.6 and 4.0 installations, upgrade to the latest minor version that supports the flag in mongod. If your deployment inserts documents with monotonically increasing or decreasing shard keys and you observe high load due to failed chunk split attempts, set the flag to “true”; otherwise, no additional action is required. The flag can be set to true in the mongod configuration file, or with
      --setParameter incrementChunkMajorVersionOnChunkSplits=true
      on the command line when starting mongod.

      For 4.2 installations, upgrade to 4.2.9. No further action is needed.


      This issue affects versions 3.6.15 through 3.6.18, 4.0.13 through 4.0.19 and 4.2.2 through 4.2.8.


      The flag is available in versions 3.6.19 and 4.0.20. In 4.2.9 both uniform and monotonic shard key distributions are supported and so no flag is necessary.
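
The version ranges above can be turned into a quick check. The following is a minimal sketch, not part of the ticket: the helper names (`is_affected`, `has_flag`) are hypothetical, and it assumes that later patch releases in the 3.6 and 4.0 series keep the flag once it is introduced.

```python
# Patch-release ranges taken from the ticket text: (first affected, last affected).
AFFECTED = {(3, 6): (15, 18), (4, 0): (13, 19), (4, 2): (2, 8)}

# First patch release that ships the flag, per series (4.2.9 needs no flag).
# Assumption: later patch releases in the same series also carry the flag.
FLAG_FROM = {(3, 6): 19, (4, 0): 20}


def _parse(version):
    """Split 'X.Y.Z' into ((X, Y), Z)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor), patch


def is_affected(version):
    """True if the version falls in one of the affected ranges."""
    series, patch = _parse(version)
    bounds = AFFECTED.get(series)
    return bounds is not None and bounds[0] <= patch <= bounds[1]


def has_flag(version):
    """True if the version is expected to support the flag."""
    series, patch = _parse(version)
    first = FLAG_FROM.get(series)
    return first is not None and patch >= first
```

For example, `is_affected("4.0.15")` is true, while `has_flag("4.2.9")` is false because 4.2.9 does not need the flag.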

      Original Description

      This ticket is to introduce a flag that toggles the logic for bumping the collection's major version during split.

      The flag should only be meaningful for the Config Server and should default to OFF (meaning the major version should not be bumped by default).

      This flag should be present in 3.6, 4.0 and 4.2. For 4.4, since the auto-splitter is running on the shards, this flag should be removed, along with the aforementioned logic for bumping the major version.

      As part of a fix for high CPU utilisation by the auto-splitter when writes are happening to the extreme chunks, we chose a solution that erred on the side of correctness, with the reasoning that on most systems auto-splits happen rarely and do not happen at the same time across all shards.

      Under this solution, the collection's major version is incremented only if a split happens on the shard whose shardVersion == collectionVersion. Therefore, unless a split happens on this shard, all the other shards will just catch up to its shard version and will then stop. This will repeat when the shard containing the collection version performs a split again, so the idea was that this would be amortised across the number of shards.

      However, this logic seems to be causing more harm than good in the case of almost uniform writes across all chunks. If it is the case that all shards are doing splits almost in unison, under this fix there will constantly be a bump in the collection version, which means constant stalls due to StaleShardVersion.
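
The version-bump rule described above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the server's implementation: versions are modelled as (major, minor) tuples, each split is treated as a single minor increment, and the function name `count_major_bumps` and its parameters are hypothetical.

```python
def count_major_bumps(split_sequence, shard_versions, bump_major_on_split):
    """Count how often the collection's major version is bumped.

    split_sequence: shard names, one entry per split, in commit order.
    shard_versions: mapping shard name -> (major, minor) shard version.
    bump_major_on_split: models the flag; False means never bump on split.
    """
    versions = dict(shard_versions)  # work on a copy
    bumps = 0
    for shard in split_sequence:
        # The collection version is the highest shard version.
        major, minor = max(versions.values())
        # Rule from the description: bump the major version only when the
        # splitting shard's version equals the collection version.
        if bump_major_on_split and versions[shard] == (major, minor):
            major, minor = major + 1, 0
            bumps += 1
        # Simplification: one minor increment per split.
        versions[shard] = (major, minor + 1)
    return bumps


# Hypothetical starting state with three shards.
INITIAL = {"A": (1, 0), "B": (1, 1), "C": (1, 2)}
```

In this model, repeated splits on the shard that already holds the collection version (the monotonic shard key case) bump the major version every time when the flag is on, and never when it is off. Each major bump stands for a routing table refresh on the mongoses, which is the cost the flag lets operators avoid by default.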

      Scope of changes

      Impact to Other Docs

      MVP (Work and Date)

      Resources (Scope or Design Docs, Invision, etc.)

            Unassigned Unassigned
            backlog-server-pm Backlog - Core Eng Program Management Team