[SERVER-60081] ChunkVersion::IGNORED should ignore the critical section Created: 20/Sep/21 Updated: 31/Oct/21 Resolved: 31/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.16, 5.0.3, 4.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Won't Fix | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Operating System: | ALL | ||||
| Sprint: | Sharding EMEA 2021-10-18 | ||||
| Participants: | |||||
| Description |
|
The special value ChunkVersion::IGNORED is used to indicate that an operation is coming from a router (as opposed to a direct connection to a shard), but that the shard must not perform version checking, under the assumption that the caller knows what they are doing. One special case is the UNKNOWN version, in which case we do want IGNORED to trigger StaleConfig in order for the writes to contain the shard key (see
This ticket is to skip the critical section check if the caller is using the IGNORED version.
NOTE: This fix only applies to versions 5.0 and earlier, because those versions do not provide any guarantees with respect to writes to orphan documents. Starting with version 5.1 we will provide such guarantees, in which case the critical section must be obeyed so that a multi-write can see which documents are orphaned. |
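The behaviour described above can be illustrated with a small model. This is only a sketch, not the server's C++ implementation: `ShardState`, `Outcome`, `check_request` and the `ignored_skips_critical_section` flag are hypothetical names, versions are reduced to plain integers, and the order of the checks is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PROCEED = "proceed"
    STALE_CONFIG = "StaleConfig"
    WAIT_FOR_CRITICAL_SECTION = "wait for critical section"


# Hypothetical stand-in for ChunkVersion::IGNORED.
IGNORED = object()


@dataclass
class ShardState:
    # None models the UNKNOWN version (the shard has no metadata cached).
    known_version: Optional[int]
    in_critical_section: bool


def check_request(received, shard, ignored_skips_critical_section):
    """Decide what a shard does with a router request (toy model).

    ignored_skips_critical_section=True models the change proposed in this
    ticket; False models the behaviour without it.
    """
    if received is IGNORED:
        # Special case from the description: UNKNOWN still yields StaleConfig.
        if shard.known_version is None:
            return Outcome.STALE_CONFIG
        if shard.in_critical_section and not ignored_skips_critical_section:
            return Outcome.WAIT_FOR_CRITICAL_SECTION
        # No version comparison at all for IGNORED.
        return Outcome.PROCEED

    # Versioned requests always obey the critical section (assumed order).
    if shard.in_critical_section:
        return Outcome.WAIT_FOR_CRITICAL_SECTION
    if shard.known_version is None or received != shard.known_version:
        return Outcome.STALE_CONFIG
    return Outcome.PROCEED
```

In this model, an IGNORED request against a shard that is in the critical section proceeds only when the flag is set, while a versioned request waits in both cases.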
| Comments |
| Comment by Kaloian Manassiev [ 31/Oct/21 ] |
|
After some investigation of an unrelated BF (a failure in our continuous integration system), I realised that ignoring the critical section for multi-writes impacts the convergence of chunk migration. For context, when chunk migration enters the critical section, it stops all incoming writes, which guarantees that the final changes to the chunk will be caught up within a small amount of time. With multi-writes no longer blocking on the critical section, the buffer of writes may keep growing without bound and the migration will never converge. The side effects of this are failed migrations and stalls of up to 30 seconds for writes under the critical section. So effectively we have a trade-off to make:
Since the potential effects of (2) are much worse (stall on the entire shard), we decided to revert this change and pursue a proper solution for the problem in (1). |
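The convergence argument in the comment above can be made concrete with a toy model. This is a sketch under stated assumptions, not a description of the actual migration code: the function name, the per-tick rates and the `max_ticks` limit (loosely standing in for the time the critical section may be held) are all hypothetical.

```python
def ticks_to_converge(backlog, drain_per_tick, multi_writes_per_tick,
                      multi_writes_blocked, max_ticks):
    """Return the tick at which the transfer backlog reaches zero, or None.

    backlog: modifications still to be caught up when the critical
        section is entered.
    drain_per_tick: modifications the migration catches up per tick.
    multi_writes_per_tick: new modifications produced by multi-writes
        per tick when they are not blocked.
    multi_writes_blocked: True if multi-writes obey the critical section.
    """
    for tick in range(1, max_ticks + 1):
        if not multi_writes_blocked:
            # Writes that skip the critical section keep feeding the buffer.
            backlog += multi_writes_per_tick
        backlog = max(0, backlog - drain_per_tick)
        if backlog == 0:
            return tick
    # The backlog never drained within the limit: the migration fails.
    return None
```

With a backlog of 100 and a drain rate of 10, blocked multi-writes let the model converge after 10 ticks. If multi-writes are not blocked and arrive at 20 per tick, the backlog grows on every tick and the function returns `None`, which corresponds to the failed migrations and stalls described above.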
| Comment by Githook User [ 31/Oct/21 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: Revert " This reverts commit a83b0c692c886a595b27358fc5eb585547e0297a. |
| Comment by Githook User [ 29/Oct/21 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: Revert " This reverts commit 18a3bcd42863ada3ad0f57dcac04021ccfa806d4. |
| Comment by Githook User [ 12/Oct/21 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: (cherry picked from commit a83b0c692c886a595b27358fc5eb585547e0297a) |
| Comment by Githook User [ 12/Oct/21 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}Message: |