  1. Core Server
  2. SERVER-59929

Unexpectedly slow update/insert operations because of splitChunk and moveChunk


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.0.26
    • Fix Version/s: 4.0.28, 4.2.19
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.2
    • Sprint: Sharding EMEA 2021-10-04, Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01, Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27

    Description

      We run several sharded clusters on version 4.0.26. Occasionally an update operation is extremely slow, taking from tens of seconds to a few minutes.

       
      After in-depth analysis, I believe this is a bug.
       
      First, when a moveChunk moves a chunk from shard A to shard B, B first cleans up any existing data in the chunk's range and waits for that cleanup to finish, so that the newly migrated chunk data cannot be deleted by an older, still-queued cleanup task. As a result, a moveChunk can take a very long time, up to 15 minutes (rangeDeleterBatchDelayMS).
       

      // Wait for any other, overlapping queued deletions to drain        
      auto status = CollectionShardingRuntime::waitForClean(opCtx, _nss, _epoch, footprint);
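
To make the "overlapping queued deletions" condition concrete, here is a minimal sketch of the overlap test. It is not the server's implementation: `KeyRange`, `overlaps`, and `countBlockingDeletions` are hypothetical names, and shard keys are simplified to integers with half-open ranges [min, max).

```cpp
#include <cassert>
#include <vector>

// Hypothetical half-open key range [min, max) over integer shard keys.
struct KeyRange {
    int min;
    int max;
};

// Two half-open ranges overlap when each one starts before the other ends.
bool overlaps(const KeyRange& a, const KeyRange& b) {
    return a.min < b.max && b.min < a.max;
}

// Counts the queued range deletions that intersect the incoming chunk's
// footprint. In this simplified model, an incoming migration has to wait
// until this count drops to zero.
int countBlockingDeletions(const std::vector<KeyRange>& queued,
                           const KeyRange& footprint) {
    assert(footprint.min < footprint.max);  // footprint must be non-empty
    int blocking = 0;
    for (const KeyRange& r : queued) {
        if (overlaps(r, footprint)) {
            ++blocking;
        }
    }
    return blocking;
}
```

In this model, a chunk that was just moved away from a shard leaves a queued deletion for its range there, so moving the same chunk back to that shard always finds an overlapping deletion to wait for.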
      

      Secondly, there is a Jira ticket, https://jira.mongodb.org/browse/SERVER-56779. Starting with 4.0.26, MongoDB no longer uses the collection distributed lock for chunk merges and uses the ActiveMigrationsRegistry instead. However, this introduces a new situation:

       *   - Move || Move (same chunk): The second move will join the first
       *   - Move || Move (different chunks or collections): The second move will result in a
       *                                             ConflictingOperationInProgress error
       *   - Move || Split/Merge (same collection): The second operation will block behind the first
       *   - Move/Split/Merge || Split/Merge (for different collections): Can proceed concurrently
      

      That is, a split is blocked by a moveChunk on the same collection until that moveChunk ends.
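
The policy quoted above can be read as a small decision function. The sketch below only encodes the four quoted rules; it is not the ActiveMigrationsRegistry code. `ShardOp`, `Outcome`, and `resolve` are hypothetical names, and any combination the comment does not mention is reported as `NotCovered` rather than guessed.

```cpp
#include <string>

enum class OpKind { Move, Split, Merge };

// Join/Conflict/Block/Proceed mirror the four quoted rules;
// NotCovered marks combinations the quoted comment does not describe.
enum class Outcome { Join, Conflict, Block, Proceed, NotCovered };

struct ShardOp {
    OpKind kind;
    std::string collection;
    std::string chunk;  // hypothetical chunk identifier
};

// Decides what happens to `incoming` while `active` is still running.
Outcome resolve(const ShardOp& active, const ShardOp& incoming) {
    const bool sameCollection = active.collection == incoming.collection;
    const bool incomingIsSplitOrMerge = incoming.kind != OpKind::Move;

    if (active.kind == OpKind::Move && incoming.kind == OpKind::Move) {
        // Same chunk joins the first move; anything else conflicts.
        const bool sameChunk = sameCollection && active.chunk == incoming.chunk;
        return sameChunk ? Outcome::Join : Outcome::Conflict;
    }
    if (incomingIsSplitOrMerge && !sameCollection) {
        // Split/Merge on a different collection proceeds concurrently.
        return Outcome::Proceed;
    }
    if (active.kind == OpKind::Move && incomingIsSplitOrMerge) {
        // Split/Merge on the same collection blocks behind the move.
        return Outcome::Block;
    }
    return Outcome::NotCovered;
}
```

The case that matters for this report is the `Block` branch: a split on the collection that is currently being migrated does not fail fast, it waits.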
       
      Lastly, in 4.0.26 the auto-split is also triggered by mongos and is part of the update operation.
       
      So sometimes the following happens: a chunk is moved from shard A to shard B, and then it is moved back from shard B to shard A. The second moveChunk is blocked for up to 15 minutes. The update operation is then blocked by splitChunk, and splitChunk is waiting for that second moveChunk to finish.
       
      From 4.2 onward, auto-split is triggered by mongod and runs as an asynchronous task, so this problem only affects 4.0.26.
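
The difference between the two behaviors can be shown with a toy latency model. This is a sketch under assumptions, not measured data: all field names are hypothetical, times are seconds on a single clock, and it assumes the inline split runs after the write itself and cannot start before the blocking moveChunk ends.

```cpp
#include <algorithm>

// All times are seconds on one hypothetical clock.
struct Scenario {
    double moveEndsAt;       // when the blocked moveChunk finally finishes
    double updateArrivesAt;  // when the client's update reaches the cluster
    double writeCost;        // time spent on the write itself
    double splitCost;        // time spent on the split once it can run
};

// 4.0.26-style: the split is part of the update, so the client also waits
// for the moveChunk to end and for the split to complete.
double latencyInlineSplit(const Scenario& s) {
    const double writeDoneAt = s.updateArrivesAt + s.writeCost;
    const double splitStartsAt = std::max(writeDoneAt, s.moveEndsAt);
    return (splitStartsAt + s.splitCost) - s.updateArrivesAt;
}

// 4.2-style: the split is asynchronous, so the client only pays for the write.
double latencyAsyncSplit(const Scenario& s) {
    return s.writeCost;
}
```

For example, with `moveEndsAt = 900`, `updateArrivesAt = 100`, `writeCost = 1`, and `splitCost = 2`, the inline model gives 802 seconds while the asynchronous model gives 1 second, which matches the shape of the slowdown described above.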

            People

              kaloian.manassiev@mongodb.com Kaloian Manassiev
              lpc lipengchong