[SERVER-60081] ChunkVersion::IGNORED should ignore the critical section Created: 20/Sep/21  Updated: 31/Oct/21  Resolved: 31/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.16, 5.0.3, 4.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Won't Fix Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Sprint: Sharding EMEA 2021-10-18
Participants:

 Description   

The special value ChunkVersion::IGNORED is used to indicate that an operation is coming from a router (as opposed to a direct connection to a shard), but that the shard must not perform version checking, under the assumption that the caller knows what they are doing.

One exception is the UNKNOWN version: in that case we do want IGNORED to trigger StaleConfig, so that the writes contain the shard key (see SERVER-44598).

This ticket is to skip the critical section check if the caller is using the IGNORED version.

NOTE: This fix only applies to versions 5.0 and earlier, because those versions do not provide any guarantees with respect to writes to orphan documents. Starting with version 5.1 we will provide such guarantees, in which case the critical section must be obeyed so that a multi-write can see which documents are orphaned.
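
The intended behaviour described above can be sketched as a small decision function. This is a simplified model for illustration only, not the server's code: the types ReceivedVersion, Outcome and ShardState, the function checkShardVersion, the ignoredSkipsCriticalSection flag and the order of the checks are all hypothetical, and the UNKNOWN case is modelled here as the shard not knowing its own filtering metadata, which is an assumption.

```cpp
// Hypothetical, simplified model of the shard-side check; these names do not
// exist in the MongoDB source tree.
enum class ReceivedVersion { kRegular, kIgnored };
enum class Outcome { kProceed, kStaleConfig, kWaitForCriticalSection };

struct ShardState {
    bool metadataKnown;          // false models the UNKNOWN version (assumption)
    bool criticalSectionActive;  // a chunk migration is in its critical section
    bool versionsMatch;          // only meaningful for kRegular requests
};

// ignoredSkipsCriticalSection == true models the change proposed in this
// ticket; false models the behaviour without it.
Outcome checkShardVersion(ReceivedVersion received,
                          const ShardState& shard,
                          bool ignoredSkipsCriticalSection) {
    // UNKNOWN must yield StaleConfig even for IGNORED (see SERVER-44598).
    if (!shard.metadataKnown) {
        return Outcome::kStaleConfig;
    }

    if (shard.criticalSectionActive) {
        // With the proposed change, IGNORED callers are not held up here.
        if (received == ReceivedVersion::kIgnored && ignoredSkipsCriticalSection) {
            return Outcome::kProceed;
        }
        return Outcome::kWaitForCriticalSection;
    }

    // IGNORED means "do not perform version checking".
    if (received == ReceivedVersion::kIgnored) {
        return Outcome::kProceed;
    }
    return shard.versionsMatch ? Outcome::kProceed : Outcome::kStaleConfig;
}
```

In this model, passing ignoredSkipsCriticalSection = true only changes the result for IGNORED requests that arrive while the critical section is active; every other combination behaves the same with or without the change.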



 Comments   
Comment by Kaloian Manassiev [ 31/Oct/21 ]

After some investigation of an unrelated BF (failure in our continuous integration system), I realised that ignoring the critical section for multi-writes impacts the convergence of chunk migration. For context, when chunk migration enters the critical section, this stops all incoming writes and guarantees that the final changes to the chunk will be caught up in some small amount of time.

With multi-writes no longer blocking on the critical section, the buffer of pending writes may continue to grow without bound and the migration may never converge. The side effects are failed migrations and stalls of up to 30 seconds under the critical section for writes.
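
To make the convergence argument concrete, here is a toy model of the catch-up phase. It is only a sketch under stated assumptions: the function ticksToDrain, its parameters and the fixed per-tick rates are invented for illustration and do not correspond to anything in the server; a "tick" is an arbitrary unit of time and maxTicks stands in for the deadline on the critical section.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Toy model (hypothetical): returns the number of ticks needed to drain the
// pending-writes buffer while the critical section is held, or std::nullopt
// if the buffer is still non-empty after maxTicks.
std::optional<int> ticksToDrain(long pending,
                                long drainPerTick,
                                long bypassingWritesPerTick,
                                int maxTicks) {
    assert(pending >= 0 && drainPerTick > 0 && bypassingWritesPerTick >= 0);
    for (int tick = 1; tick <= maxTicks; ++tick) {
        // Writes that do not block on the critical section keep arriving.
        pending += bypassingWritesPerTick;
        // The migration transfers at most drainPerTick buffered writes.
        pending -= std::min(pending, drainPerTick);
        if (pending == 0) {
            return tick;
        }
    }
    return std::nullopt;  // never caught up within the deadline
}
```

With no bypassing writes, ticksToDrain(100, 50, 0, 30) drains in 2 ticks. If bypassing writes arrive as fast as the buffer is drained, as in ticksToDrain(100, 50, 50, 30), the buffer never empties and the function returns std::nullopt, which corresponds to the failed migration and the full-length stall described above.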

So effectively we have a trade-off to make:

  1. Multi-writes fail in the presence of back-to-back migrations (such as adding a new shard)
  2. All writes against the shard stall for up to 30 seconds and possibly fail, in the presence of multi-updates and concurrent migrations

Since the potential effects of (2) are much worse (stall on the entire shard), we decided to revert this change and pursue a proper solution for the problem in (1).

Comment by Githook User [ 31/Oct/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: Revert "SERVER-60081 Make ChunkVersion::IGNORED not consider the migration critical section"

This reverts commit a83b0c692c886a595b27358fc5eb585547e0297a.
Branch: v5.0
https://github.com/mongodb/mongo/commit/5a75cc4c9fc29b462f4d4a5d56096b804a009b07

Comment by Githook User [ 29/Oct/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: Revert "SERVER-60081 Make ChunkVersion::IGNORED not consider the migration critical section"

This reverts commit 18a3bcd42863ada3ad0f57dcac04021ccfa806d4.
Branch: v5.1
https://github.com/mongodb/mongo/commit/cfff9eaf82fe360ab11cd3c527835f0576d429d7

Comment by Githook User [ 12/Oct/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-60081 Make ChunkVersion::IGNORED not consider the migration critical section

(cherry picked from commit a83b0c692c886a595b27358fc5eb585547e0297a)
Branch: v5.1
https://github.com/mongodb/mongo/commit/18a3bcd42863ada3ad0f57dcac04021ccfa806d4

Comment by Githook User [ 12/Oct/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-60081 Make ChunkVersion::IGNORED not consider the migration critical section
Branch: v5.0
https://github.com/mongodb/mongo/commit/a83b0c692c886a595b27358fc5eb585547e0297a
