[SERVER-59929] Unexpectedly slow update/insert operations because of splitChunk and moveChunk Created: 14/Sep/21 Updated: 29/Oct/23 Resolved: 14/Dec/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.0.26 |
| Fix Version/s: | 4.0.28, 4.2.19 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | FirstName lipengchong | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2 |
| Sprint: | Sharding EMEA 2021-10-04, Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01, Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27 |
| Participants: | |
| Description |
Recently, we have been running several sharded clusters on version 4.0.26. Sometimes an update operation is extremely slow, taking anywhere from tens of seconds to a few minutes.
Second, there is a JIRA ticket, https://jira.mongodb.org/browse/SERVER-56779, and starting with 4.0.26, MongoDB no longer uses the collection distributed lock for chunk merges and uses the ActiveMigrationsRegistry instead. But this causes a new scenario: a split is blocked by moveChunk until the moveChunk has ended.
| Comments |
| Comment by Githook User [ 14/Dec/21 ] |
Author: Kaloian Manassiev (kaloian.manassiev@mongodb.com, username: kaloianm) Message: (cherry picked from commit ed7cca61938ee12f5a9cbe870af096987c662f5c)
| Comment by Githook User [ 14/Dec/21 ] |
Author: Kaloian Manassiev (kaloian.manassiev@mongodb.com, username: kaloianm) Message:
| Comment by FirstName lipengchong [ 22/Sep/21 ] |
Thank you, Kaloian Manassiev, for your summary, but the situation may be a little different. Before 4.0.26, splitChunk did not use kDefaultLockTimeout; it used const Milliseconds DistLockManager::kSingleLockAttemptTimeout(0) instead.
So even before 4.0.26, the split attempted during writes would fail immediately because the lock could not be acquired.
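The difference described above, between a single lock attempt with a zero timeout and a bounded wait, can be sketched with the standard library's std::timed_mutex. This is a minimal, hypothetical model: the global collectionLock, tryAcquireSplitLock and attemptWhileMigrating are illustrative names, and the code does not reproduce DistLockManager; a second thread holding the mutex stands in for a running moveChunk.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the collection distributed lock.
// This is NOT MongoDB's DistLockManager, just a std::timed_mutex.
std::timed_mutex collectionLock;

// Try to take the lock for a split, waiting at most `timeout`.
// A zero timeout mirrors the idea behind kSingleLockAttemptTimeout(0):
// make one attempt and, if the lock is held, give up at once.
bool tryAcquireSplitLock(std::chrono::milliseconds timeout) {
    std::unique_lock<std::timed_mutex> lk(collectionLock, std::defer_lock);
    if (!lk.try_lock_for(timeout)) {
        return false;  // lock is held elsewhere (e.g. by a migration)
    }
    // ... the split would run here while the lock is held ...
    return true;  // lock is released when `lk` goes out of scope
}

struct AttemptResult {
    bool acquired;
    std::chrono::milliseconds waited;
};

// Hold the lock on another thread (a stand-in for a running moveChunk)
// for `holdFor`, and time a split attempt made while it is held.
AttemptResult attemptWhileMigrating(std::chrono::milliseconds timeout,
                                    std::chrono::milliseconds holdFor) {
    std::promise<void> locked;
    std::future<void> lockedFuture = locked.get_future();
    std::thread migration([&locked, holdFor] {
        std::lock_guard<std::timed_mutex> guard(collectionLock);
        locked.set_value();  // signal that the "migration" owns the lock
        std::this_thread::sleep_for(holdFor);
    });
    lockedFuture.wait();

    auto start = std::chrono::steady_clock::now();
    bool acquired = tryAcquireSplitLock(timeout);
    auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);

    migration.join();
    return {acquired, waited};
}
```

With a zero timeout the attempt returns false almost immediately while the "migration" holds the lock; with a generous timeout the same attempt succeeds once the holder releases it.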
| Comment by Kaloian Manassiev [ 17/Sep/21 ] |
Thank you, lpc, for the detailed bug report and analysis; pretty much all of it is on point. The only correction is that both before and after … So in summary:
The proposal to fix this (only for 4.0) is to restore the 20-second cap on how long SplitChunk waits for MoveChunk to complete, in order to mimic the pre-4.0.26 behaviour.
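The proposed fix can be sketched as a capped wait on a condition variable. This is a minimal model under stated assumptions: MigrationRegistryModel, waitToSplit and splitOutcome are hypothetical names that do not mirror the real ActiveMigrationsRegistry API. Passing std::nullopt models the unbounded wait the description reports for 4.0.26, and a finite cap models the proposed 20-second limit.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>

// Hypothetical, heavily simplified model of a registry that makes splits
// wait for an active migration. NOT MongoDB's ActiveMigrationsRegistry.
class MigrationRegistryModel {
public:
    void beginMigration() {
        std::lock_guard<std::mutex> lk(mutex_);
        migrationActive_ = true;
    }

    void endMigration() {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            migrationActive_ = false;
        }
        cv_.notify_all();  // wake any split waiting on the migration
    }

    // Wait for the active migration (if any) to finish before splitting.
    // cap == std::nullopt: wait for as long as the migration runs.
    // cap == a duration:   give up once that much time has passed.
    // Returns true if the split may proceed, false if the cap expired.
    bool waitToSplit(std::optional<std::chrono::milliseconds> cap) {
        std::unique_lock<std::mutex> lk(mutex_);
        auto noMigration = [this] { return !migrationActive_; };
        if (!cap) {
            cv_.wait(lk, noMigration);
            return true;
        }
        return cv_.wait_for(lk, *cap, noMigration);
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool migrationActive_ = false;
};

// The cap proposed in this ticket for 4.0.
constexpr std::chrono::milliseconds kSplitWaitCap = std::chrono::seconds(20);

// Usage sketch: a migration that ends after `runFor` on another thread,
// while a split waits with the given cap. Returns the split's outcome.
bool splitOutcome(std::optional<std::chrono::milliseconds> cap,
                  std::chrono::milliseconds runFor) {
    MigrationRegistryModel registry;
    registry.beginMigration();
    std::thread mover([&registry, runFor] {
        std::this_thread::sleep_for(runFor);
        registry.endMigration();
    });
    bool proceeded = registry.waitToSplit(cap);
    mover.join();
    return proceeded;
}
```

In this model the cap turns an open-ended stall into a bounded one: a split whose wait outlives the cap returns false, so the caller can abandon the split instead of waiting for the whole migration to finish.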
| Comment by FirstName lipengchong [ 15/Sep/21 ] |
Maybe it's better