  2. SERVER-59929

Unexpectedly slow update/insert operations because of splitChunk and moveChunk

    Details

    • Type: Bug
    • Status: Investigating
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 4.0.26
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Sprint:
      Sharding EMEA 2021-10-04, Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01

      Description

      We run several sharded clusters on version 4.0.26. Occasionally an update operation becomes extremely slow, taking from tens of seconds to a few minutes.

       
      After in-depth analysis, I believe this is a bug.
       
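      First, when a moveChunk moves a chunk from shard A to shard B, B first cleans up any data it already holds in that chunk's range, and it waits for that cleanup to finish so that the newly migrated data cannot be deleted by an older, still-queued cleanup task. As a result, a moveChunk can take a very long time, up to 15 minutes (which matches the default orphanCleanupDelaySecs of 900 seconds rather than rangeDeleterBatchDelayMS, which only sets the pause between deletion batches).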
       

      // Wait for any other, overlapping queued deletions to drain        
      auto status = CollectionShardingRuntime::waitForClean(opCtx, _nss, _epoch, footprint);
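
To make the wait concrete, here is a minimal C++17 sketch of the idea behind that call, not the server's implementation. It assumes integer shard keys and a fixed list of queued deletions; `KeyRange`, `QueuedDeletion`, `overlaps`, and `secondsUntilClean` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cstddef>

// Half-open key range [min, max). Real shard keys are BSON documents;
// plain ints are an assumption made to keep the sketch small.
struct KeyRange {
    int min;
    int max;
};

// A range deletion that is still queued on the shard and is expected
// to finish `secondsLeft` seconds from now (hypothetical bookkeeping).
struct QueuedDeletion {
    KeyRange range;
    int secondsLeft;
};

// Two half-open ranges overlap when each one starts before the other ends.
constexpr bool overlaps(KeyRange a, KeyRange b) {
    return a.min < b.max && b.min < a.max;
}

// Longest remaining time among the queued deletions that overlap `incoming`.
// An incoming migration for that range cannot start copying before then.
template <std::size_t N>
constexpr int secondsUntilClean(const std::array<QueuedDeletion, N>& queue,
                                KeyRange incoming) {
    int wait = 0;
    for (const QueuedDeletion& d : queue) {
        if (overlaps(d.range, incoming) && d.secondsLeft > wait) {
            wait = d.secondsLeft;
        }
    }
    return wait;
}
```

With a deletion of [0, 100) queued for another 900 seconds, an incoming chunk [50, 150) would wait the full 900 seconds in this model, while a chunk [100, 200) would not wait at all.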
      

      Secondly, there is a Jira ticket, https://jira.mongodb.org/browse/SERVER-56779: starting with 4.0.26, MongoDB no longer uses the collection distributed lock for chunk merges and uses the ActiveMigrationsRegistry instead. But this introduces a new situation:

       *   - Move || Move (same chunk): The second move will join the first
       *   - Move || Move (different chunks or collections): The second move will result in a
       *                                             ConflictingOperationInProgress error
       *   - Move || Split/Merge (same collection): The second operation will block behind the first
       *   - Move/Split/Merge || Split/Merge (for different collections): Can proceed concurrently
      

      That is, a split on the same collection is blocked by a moveChunk until the moveChunk finishes.
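
The rules quoted above can be read as a small decision table. The following C++17 sketch is a toy model of that table, not the ActiveMigrationsRegistry itself; `OpKind`, `Outcome`, `ShardOp`, and `resolve` are hypothetical names, the rules are assumed to be symmetric, and any same-collection pair the comment does not list (for example Split || Split) is assumed to block.

```cpp
#include <string_view>

enum class OpKind { Move, Split, Merge };
enum class Outcome { Join, ConflictingOperationInProgress, Block, Proceed };

struct ShardOp {
    OpKind kind;
    std::string_view nss;    // collection namespace, e.g. "db.coll"
    std::string_view chunk;  // hypothetical chunk label; real chunks are key ranges
};

// What happens to `second` when it arrives while `first` is still active.
constexpr Outcome resolve(const ShardOp& first, const ShardOp& second) {
    const bool sameCollection = first.nss == second.nss;

    if (first.kind == OpKind::Move && second.kind == OpKind::Move) {
        // Move || Move: only a move of the very same chunk may join.
        const bool sameChunk = sameCollection && first.chunk == second.chunk;
        return sameChunk ? Outcome::Join
                         : Outcome::ConflictingOperationInProgress;
    }

    // At least one side is a split or merge. Assumption: every such pair on
    // the same collection serializes, including pairs the comment omits.
    return sameCollection ? Outcome::Block : Outcome::Proceed;
}
```

In this model, a split on `db.c` that arrives during a moveChunk on `db.c` resolves to `Outcome::Block`, which is the case this report is about.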
       
      Lastly, in 4.0.26 the auto-split is also triggered by mongos and runs as part of the update operation.
       
      So sometimes the following happens: a chunk is moved from shard A to shard B and then moved back from shard B to shard A. The second moveChunk is blocked for up to 15 minutes while it waits for the cleanup. The update operation is then blocked by its splitChunk, and the splitChunk is waiting for that moveChunk to finish.
       
      From 4.2 onward, auto-split is triggered by mongod as an asynchronous task, so this problem only affects 4.0.26.
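
The blocking chain described in this report can be summarized as simple arithmetic. This is a rough sketch under stated assumptions, not a measurement: `Scenario` and `updateLatencySecs` are hypothetical names, and the model only distinguishes an auto-split that runs inline with the update (as reported for 4.0.26) from one that runs asynchronously (as reported for 4.2).

```cpp
// Hypothetical model of the chain: update -> splitChunk -> moveChunk -> cleanup wait.
struct Scenario {
    int cleanupWaitSecs;  // time the active moveChunk still waits for the range to be clean
    int writeSecs;        // time the write itself takes
    bool splitInline;     // true: auto-split runs as part of the update (4.0.26 behavior)
};

// Latency seen by one update, depending on whether it triggers an auto-split.
constexpr int updateLatencySecs(const Scenario& s, bool triggersSplit) {
    // An inline split blocks behind the active moveChunk on the same
    // collection; an asynchronous split does not delay the update.
    const int splitWait = (triggersSplit && s.splitInline) ? s.cleanupWaitSecs : 0;
    return s.writeSecs + splitWait;
}
```

With a 900-second cleanup wait and a 1-second write, an update that triggers an inline split takes 901 seconds in this model, while the same update with an asynchronous split takes 1 second.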

              People

              Assignee:
              kaloian.manassiev Kaloian Manassiev
              Reporter:
              lpc lipengchong
              Votes:
              0
              Watchers:
              4
