Details
Description
Recently, we have been running several sharded clusters on version 4.0.26. Sometimes an update operation is extremely slow, taking from tens of seconds to a few minutes.
After in-depth analysis, I think this is a bug.
First, when a moveChunk happens, a chunk moves from shard A to shard B. B first cleans up any existing data in that chunk's range and waits for that cleanup to finish, so that the newly migrated chunk data cannot be deleted by an older cleanup task. As a result, a moveChunk can take a very long time, up to 15 minutes (rangeDeleterBatchDelayMS).
// Wait for any other, overlapping queued deletions to drain
auto status = CollectionShardingRuntime::waitForClean(opCtx, _nss, _epoch, footprint);
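To make "overlapping queued deletions" concrete, here is a minimal sketch of the overlap check, assuming chunk ranges are half-open intervals [min, max) over a single numeric shard key. The names `overlaps` and `pending_overlaps` are hypothetical and are not taken from the server source; the real check lives inside `waitForClean` and works on BSON key ranges.

```python
from typing import List, Tuple

# A range is modeled as (min_key, max_key), half-open: min is included, max is not.
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """Return True when two half-open ranges share at least one key."""
    return a[0] < b[1] and b[0] < a[1]


def pending_overlaps(queued: List[Range], footprint: Range) -> List[Range]:
    """Return the queued deletions an incoming chunk would have to wait for.

    `queued` stands in for the shard's queue of scheduled range deletions
    (an assumption for illustration); `footprint` is the incoming chunk's range.
    """
    return [r for r in queued if overlaps(r, footprint)]
```

For example, with deletions queued for (0, 10), (10, 20) and (30, 40), an incoming chunk covering (5, 12) would wait for the first two and ignore the third.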
Secondly, there is a Jira ticket, https://jira.mongodb.org/browse/SERVER-56779: starting with 4.0.26, MongoDB no longer uses the collection distributed lock for chunk merges and uses the ActiveMigrationsRegistry instead. But this introduces a new scenario:
 * - Move || Move (same chunk): The second move will join the first
 * - Move || Move (different chunks or collections): The second move will result in a
 *   ConflictingOperationInProgress error
 * - Move || Split/Merge (same collection): The second operation will block behind the first
 * - Move/Split/Merge || Split/Merge (for different collections): Can proceed concurrently
That is, a split on the same collection will be blocked by an in-progress moveChunk until that moveChunk has ended.
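The rules quoted in the comment above can be written as a small decision function. This is only a model of the quoted comment, not the ActiveMigrationsRegistry implementation; the `Op` type, the `outcome` function and the string labels are hypothetical. Combinations the comment does not list return "unspecified" rather than a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str        # "move", "split", or "merge"
    collection: str  # namespace, e.g. "db.coll"
    chunk: str       # hypothetical label identifying the chunk


def outcome(first: Op, second: Op) -> str:
    """What happens to `second` when it arrives while `first` is active."""
    same_coll = first.collection == second.collection

    if first.kind == "move" and second.kind == "move":
        if same_coll and first.chunk == second.chunk:
            return "join"  # second move joins the first
        return "ConflictingOperationInProgress"

    if second.kind in ("split", "merge"):
        if not same_coll:
            return "concurrent"  # different collections do not interfere
        if first.kind == "move":
            return "block"  # split/merge waits behind the active move

    return "unspecified"  # not covered by the quoted comment
```

The case that matters for this report is `outcome(Op("move", "db.c", "c1"), Op("split", "db.c", "c2"))`, which yields "block": a split of any chunk in the collection waits for the move, even when it targets a different chunk.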
Lastly, in 4.0.26 auto-split is also triggered by mongos, and it runs as part of the update operation.
So sometimes the following happens: a chunk is moved from shard A to shard B, and is then moved back from shard B to shard A. The second moveChunk is blocked for up to 15 minutes. The update operation is then blocked by splitChunk, and splitChunk is waiting for that last moveChunk.
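The chain of waits in this scenario can be put into rough numbers. The sketch below is a simplified timeline model, not a measurement: the function name, its parameters and the example timestamps are all hypothetical, and the 900-second default is just the "up to 15 minutes" figure from this report.

```python
def update_stall_seconds(
    first_move_done: float,
    second_move_start: float,
    update_start: float,
    cleanup_delay: float = 900.0,   # assumed: the 15-minute figure from the report
    delete_duration: float = 0.0,   # assumed time to actually delete the range
    clone_duration: float = 0.0,    # assumed time to copy the chunk back
) -> float:
    """Estimate how long an update waits behind the split -> move -> cleanup chain.

    All times are seconds on one shared clock.
    """
    # The old copy of the range is only gone after the delay plus the delete itself.
    deletion_done = first_move_done + cleanup_delay + delete_duration
    # The move back cannot complete before the receiving range is clean.
    second_move_end = max(second_move_start, deletion_done) + clone_duration
    # The split triggered by the update blocks until the move ends.
    return max(0.0, second_move_end - update_start)
```

With the first move finishing at t=0, the move back starting at t=60 and an update arriving at t=120, the model gives a stall of 780 seconds; an update arriving after the move back has finished is not stalled at all.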
Starting with 4.2, auto-split is triggered by mongod and runs as an asynchronous task, so this problem only affects 4.0.26.
Issue Links
- is caused by
- SERVER-56779 Do not use the collection distributed lock for chunk merges (Closed)