Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-49233

Introduce a flag to toggle the logic for bumping collection's major version during split

    • Type: Icon: New Feature New Feature
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 4.0.20, 3.6.19, 4.2.9
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Minor Change
    • v4.2, v4.0, v3.6
    • Sharding 2020-07-13, Sharding 2020-07-27
    • 41

      Issue Status as of July 12 2020

      ISSUE SUMMARY

      This fix introduces a global flag named incrementChunkMajorVersionOnChunkSplits which is used to choose between two algorithms for updating a collection’s routing table after performing a chunk split in a sharded cluster. Chunk split operations and their corresponding changes to a collection’s routing table causes mongoses to refresh their routing table caches, requiring resources and briefly stalling CRUD operations, which can increase CPU load and operation latency.

      Before 3.6.15 and 4.0.13, the algorithm worked well for shard keys with uniformly distributed values, but led to performance problems for cases where shard key values are incrementing monotonically. A change introduced by SERVER-41480 changed the algorithm to fix that edge case.

      The changes introduced with this ticket cause the old algorithm to be used by default, but allows the customer to opt into the behavior introduced by SERVER-41480 if needed.

      USER IMPACT

      Before 3.6.15 and 4.0.13, the mongos’s cached version of the routing table for a collection is not updated until a chunk is moved or until the mongos succeeds in splitting a chunk. This means that, during inserts to a sharded collection with monotonically increasing shard key values, if the chunk with the max shard key is split by one mongos, other mongoses may try and fail to split this chunk because of out-of-date routing tables in their caches. These continual failed split attempts can result in a high CPU load on the shard containing this chunk.

      In versions 3.6.15 through 3.6.18, 4.0.13 through 4.0.19 and 4.2.2 through 4.2.8 during inserts to documents with uniformly distributed shard key values, due to the change made by SERVER-41480, mongoses are forced to update their routing table caches more often than necessary, causing CRUD operations to be blocked while routing table caches are being updated.

      From 4.2 onwards, the splitting is triggered by shards, meaning that the problem that SERVER-41480 was designed to address is no longer present. However, the changes it introduced can still cause routing table caches to be updated more often than necessary.

      SUGGESTED ACTION

      For 3.6 and 4.0 installations, upgrade to the latest minor version which supports the flag in mongod. If your deployment inserts documents with monotonically increasing or decreasing shard keys and you observe high load due to failed chunk split attempts, set the flag to “true”, otherwise no additional action is required. The flag can be set to true in the mongod configuration file, or with
      --setParameter incrementChunkMajorVersionOnChunkSplit=true
      on the command line when starting mongod.

      For 4.2 installations, upgrade to 4.2.9. No further action is needed.

      AFFECTED VERSIONS

      This issue affects versions 3.6.15 through 3.6.18, 4.0.13 through 4.0.19 and 4.2.2 through 4.2.8.

      FIX VERSION

      The flag is available in versions 3.6.19 and 4.0.20. In 4.2.9 both uniform and monotonic shard key distributions are supported and so no flag is necessary.

      Original Description

      This ticket is to introduce a flag, which toggles the logic for bumping the collection's major version during split.

      The flag should only be meaningful for the Config Server and should default to OFF (meaning the major version should not be bumped by default).

      This flag should be present in 3.6, 4.0 and 4.2. For 4.4, since the auto-splitter is running on the shards, this flag should be removed, along with the aforementioned logic for bumping the major version.

      Tl;dr:
      As part of a fix for a high CPU utilisation by the auto-splitter when writes are happening to the extreme chunks, we chose a solution, which erred on the side of correctness, with the reasoning that on most systems auto-splits are happening rarely and are not happening at the same time across all shards.

      Under this solution, the collection's major version would be incremented only if a split happens at the shard, whose shardVersion == collectionVersion. Therefore, unless a split happens on this shard, all the other shards will just catch-up to its shard version and will then stop. This will repeat when the shard containing the collection version performs a split again, so the idea was that this will be amortised across the number of shards.

      However, this logic seems to be causing more harm than good in the case of almost uniform writes across all chunks. If it is the case that all shards are doing splits almost in unison, under this fix there will constantly be a bump in the collection version, which means constant stalls due to StaleShardVersion.

            Assignee:
            matthew.saltz@mongodb.com Matthew Saltz (Inactive)
            Reporter:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Votes:
            0 Vote for this issue
            Watchers:
            16 Start watching this issue

              Created:
              Updated:
              Resolved: