[SERVER-59929] Unexpectedly slow update/insert operations because of splitChunk and moveChunk Created: 14/Sep/21  Updated: 29/Oct/23  Resolved: 14/Dec/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.26
Fix Version/s: 4.0.28, 4.2.19

Type: Bug Priority: Major - P3
Reporter: FirstName lipengchong Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
is caused by SERVER-56779 Do not use the collection distributed... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Sharding EMEA 2021-10-04, Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01, Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27
Participants:

 Description   

Recently, we have been running some sharded clusters on version 4.0.26. Sometimes we see that an update operation is extremely slow, taking from tens of seconds to a few minutes.

 
After in-depth analysis, I believe there is a bug here.
 
First, when a moveChunk happens and a chunk moves from shard A to shard B, shard B will first clean up any existing data in that chunk's range and wait for that cleanup to finish, to make sure the newly received chunk data will not be deleted by an older, still-pending cleanup task. As a result, the moveChunk can take a very long time, up to 15 minutes (rangeDeleterBatchDelayMS).
 

// Wait for any other, overlapping queued deletions to drain        
auto status = CollectionShardingRuntime::waitForClean(opCtx, _nss, _epoch, footprint);
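
To illustrate the effect, here is a minimal sketch, assuming a hypothetical range-deleter queue (the class and function names are illustrative, not MongoDB's actual implementation): the recipient cannot accept the incoming chunk until every previously scheduled deletion that overlaps the range has drained, and such deletions may themselves be deferred for a long time.

#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <vector>

struct Range { long min, max; };

class RangeDeleterSketch {
public:
    // A moveChunk out of this shard schedules the range for (delayed) deletion.
    void scheduleDeletion(Range r) {
        std::lock_guard<std::mutex> lk(_m);
        _pending.push_back(r);
    }

    // Called once the deletion task has actually removed the documents.
    void deletionFinished(Range r) {
        {
            std::lock_guard<std::mutex> lk(_m);
            _pending.erase(std::remove_if(_pending.begin(), _pending.end(),
                                          [&](const Range& p) {
                                              return p.min == r.min && p.max == r.max;
                                          }),
                           _pending.end());
        }
        _cv.notify_all();
    }

    // The recipient of an incoming chunk blocks here until nothing in the
    // queue overlaps the incoming range -- potentially for as long as the
    // slowest overlapping deletion takes to be scheduled and run.
    void waitForClean(Range incoming) {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [&] {
            return std::none_of(_pending.begin(), _pending.end(), [&](const Range& p) {
                return p.min < incoming.max && incoming.min < p.max;
            });
        });
    }

private:
    std::mutex _m;
    std::condition_variable _cv;
    std::vector<Range> _pending;
};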

Secondly, there is a Jira ticket, https://jira.mongodb.org/browse/SERVER-56779: starting from 4.0.26, MongoDB no longer uses the collection distributed lock for chunk merges and instead uses the ActiveMigrationsRegistry. But this introduces a new scenario:

 *   - Move || Move (same chunk): The second move will join the first
 *   - Move || Move (different chunks or collections): The second move will result in a
 *                                             ConflictingOperationInProgress error
 *   - Move || Split/Merge (same collection): The second operation will block behind the first
 *   - Move/Split/Merge || Split/Merge (for different collections): Can proceed concurrently

That is, a split will be blocked by a moveChunk until the moveChunk has ended.
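
For illustration, a minimal sketch of that blocking, assuming a simplified registry (the names below are hypothetical; this is not MongoDB's ActiveMigrationsRegistry): a splitChunk on the same collection simply waits, with no timeout, until the active migration has been unregistered.

#include <condition_variable>
#include <mutex>
#include <set>
#include <string>

class MigrationRegistrySketch {
public:
    // Called by moveChunk: mark the collection as having an active migration.
    void registerMove(const std::string& nss) {
        std::lock_guard<std::mutex> lk(_mutex);
        _activeMoves.insert(nss);
    }

    // Called when the migration finishes, waking up any queued split/merge.
    void unregisterMove(const std::string& nss) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _activeMoves.erase(nss);
        }
        _cv.notify_all();
    }

    // Called by splitChunk: with no timeout this blocks for as long as the
    // migration runs, which is the behaviour described in this ticket.
    void waitForMoveToDrain(const std::string& nss) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return _activeMoves.count(nss) == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::set<std::string> _activeMoves;
};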
 
Last, in 4.0.26 the auto-split is also triggered by mongos, and it is part of the update operation.
 
So sometimes the following scenario occurs: a chunk is moved from shard A to shard B, and then it is moved back from shard B to shard A. The second moveChunk task will be blocked, for up to 15 minutes. The update operation is then blocked by the splitChunk, and the splitChunk is waiting for the previous moveChunk.
 
From 4.2 on, auto-split is triggered by mongod and runs as an asynchronous task, so this problem only affects 4.0.26.
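
As a rough illustration of why the 4.2+ behaviour avoids the latency hit (hypothetical function names, not the actual mongos/mongod code paths): when the split is issued inline on the write path, any blocking it suffers is added to the client's update latency; when it is fired off on a background thread, the update returns regardless of how long the split ends up waiting.

#include <chrono>
#include <thread>

// Stand-in for splitChunk; in the scenario above it can block for minutes
// while it waits behind a moveChunk.
void splitChunkIfNeeded() {
    std::this_thread::sleep_for(std::chrono::seconds(2));  // simulate blocking
}

// 4.0-style: the client's update waits for the split attempt to finish, so
// the update latency includes any time the split spends blocked.
void updateWithInlineAutoSplit() {
    // ... apply the update ...
    splitChunkIfNeeded();
}

// 4.2-style: the split runs on a background thread; the update returns at once.
void updateWithAsyncAutoSplit() {
    // ... apply the update ...
    std::thread(splitChunkIfNeeded).detach();
}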



 Comments   
Comment by Githook User [ 14/Dec/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-59929 Limit the blocking of split/merge behind other metadata operations to 5 seconds

(cherry picked from commit ed7cca61938ee12f5a9cbe870af096987c662f5c)
Branch: v4.2
https://github.com/mongodb/mongo/commit/334f18f70fa51f95863434cc23d095a91492c8c4

Comment by Githook User [ 14/Dec/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-59929 Limit the blocking of split/merge behind other metadata operations to 5 seconds
Branch: v4.0
https://github.com/mongodb/mongo/commit/ed7cca61938ee12f5a9cbe870af096987c662f5c

Comment by FirstName lipengchong [ 22/Sep/21 ]

Thank you Kaloian Manassiev.

Thanks for your summary, but maybe it is a little different. Before 4.0.26, splitChunk does not use kDefaultLockTimeout, but uses const Milliseconds DistLockManager::kSingleLockAttemptTimeout(0);

StatusWith<boost::optional<ChunkRange>> splitChunk(OperationContext* opCtx,
                                                   const NamespaceString& nss,
                                                   const BSONObj& keyPatternObj,
                                                   const ChunkRange& chunkRange,
                                                   const std::vector<BSONObj>& splitKeys,
                                                   const std::string& shardName,
                                                   const OID& expectedCollectionEpoch) {
    //
    // Lock the collection's metadata and get highest version for the current shard
    // TODO(SERVER-25086): Remove distLock acquisition from split chunk
    //
    const std::string whyMessage(
        str::stream() << "splitting chunk " << redact(chunkRange.toString()) << " in "
                      << nss.toString());
    auto scopedDistLock = Grid::get(opCtx)->catalogClient()->getDistLockManager()->lock(
        opCtx, nss.ns(), whyMessage, DistLockManager::kSingleLockAttemptTimeout); // here

So even before 4.0.26, writes would fail immediately because the lock acquisition failed.
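
In other words, a zero timeout means a single try that fails fast. A minimal sketch of that behaviour (using a plain std::timed_mutex, not the real DistLockManager):

#include <chrono>
#include <mutex>

enum class LockStatus { OK, LockBusy };

// Attempts to take 'm', retrying until 'timeout' elapses. A timeout of zero
// degenerates into a single try_lock, so if the lock is busy (e.g. held by a
// concurrent moveChunk) the caller fails immediately instead of blocking.
LockStatus lockWithTimeout(std::timed_mutex& m, std::chrono::milliseconds timeout) {
    if (m.try_lock_for(timeout))
        return LockStatus::OK;
    return LockStatus::LockBusy;
}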

 

Comment by Kaloian Manassiev [ 17/Sep/21 ]

Thank you lpc for the detailed bug report and analysis - pretty much all of it is on point. The only correction is that both before and after SERVER-56779, we were waiting for the distributed lock to be acquired, but before we switched to using the ActiveMigrationsRegistry, there was an upper bound of 20 seconds of blocking for the dist lock to be acquired during a split issued by MongoS, before it fails.

So in summary:

  • Before 4.0.26: Writes can block for up to 20 seconds due to auto-split (caused either by the 15-minute chunk back/forth scenario that you describe, or even by the presence of a concurrent chunk migration anywhere else on the cluster)
  • After 4.0.26: Writes can block for the duration of a migration due to auto-split (most of the time it would be fast, but in the presence of back/forth it could be up to 15 minutes)

The proposal to fix this (only for 4.0) is to restore the 20 seconds cap on SplitChunk waiting for MoveChunk to complete in order to mimic the pre 4.0.26 behaviour.
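
A sketch of what such a cap could look like (illustrative names only; per the commits above, the fix that eventually landed limits the blocking of split/merge behind other metadata operations to 5 seconds): the split waits a bounded amount of time for conflicting operations to drain and otherwise gives up with a conflict error.

#include <chrono>
#include <condition_variable>
#include <mutex>

enum class WaitResult { Clear, ConflictingOperationInProgress };

struct MetadataOpGate {
    std::mutex mutex;
    std::condition_variable cv;
    int activeMoves = 0;  // incremented/decremented by moveChunk registration

    // Wait at most 'cap' for all active migrations to drain; on timeout the
    // caller (the split/merge) fails with a conflict instead of blocking for
    // the full duration of the migration.
    WaitResult waitForNoMoves(std::chrono::seconds cap) {
        std::unique_lock<std::mutex> lk(mutex);
        if (cv.wait_for(lk, cap, [&] { return activeMoves == 0; }))
            return WaitResult::Clear;
        return WaitResult::ConflictingOperationInProgress;
    }
};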

Comment by FirstName lipengchong [ 15/Sep/21 ]

Maybe it would be better if SERVER-56779 were rolled back on version 4.0.
