-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 3.6.23, 4.0.28, 4.2.24, 7.1.0-rc0, 6.0.6, 4.4.22, 5.0.18, 7.0.0-rc3
-
Component/s: None
-
None
-
Fully Compatible
-
ALL
-
v7.0, v6.0, v5.0
-
Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04
-
152
In SERVER-30797, a majority write was added to the refresh path on primaries after fetching new routing information from the config server. This write ensured that the node that fetched the routing information was actually the majority primary, preventing incorrect filtering information from being applied in split-brain scenarios.
This write was removed in SERVER-35092 because it was believed to be unnecessary and was causing stalls when a refresh happened without a majority of nodes available.
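As an illustration only (a client-side sketch, not the server-internal no-op write; the connection URI, database, and collection names are assumptions), a w:"majority" write with a wtimeout shows both sides of the trade-off: only a primary that can replicate the write to a majority of the set gets it acknowledged, and when a majority is unreachable the write times out instead, which is the stall behaviour that motivated removing the barrier.

```cpp
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/json.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

#include <iostream>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_array;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017/?replicaSet=rs0"}};

    // A plain insert used as a stand-in for the server's internal no-op write:
    // w:"majority" means acknowledgement requires majority replication, and
    // wtimeout bounds how long we wait for that majority.
    auto cmd = make_document(
        kvp("insert", "majorityWriteDemo"),
        kvp("documents", make_array(make_document(kvp("note", "refresh barrier demo")))),
        kvp("writeConcern", make_document(kvp("w", "majority"), kvp("wtimeout", 5000))));

    auto reply = client["test"].run_command(cmd.view());
    if (reply.view()["writeConcernError"]) {
        // Majority unreachable: the write is not majority-committed, so a node
        // in this state must not start using freshly fetched routing metadata.
        std::cout << "majority not reached: " << bsoncxx::to_json(reply.view()) << "\n";
    } else {
        std::cout << "majority write committed: " << bsoncxx::to_json(reply.view()) << "\n";
    }
    return 0;
}
```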
However, the split-brain scenario for which the majority write was added is still a problem, and since the removal of that write it is possible to hit it again. The scenario is as follows:
- Suppose we have a 2-shard cluster with 3 nodes per shard, where chunk (min, 0) is on shard 0 and chunk (0, max) is on shard 1, with one document in each chunk
- A network partition separates the primary of shard 0 from its secondaries, and one of those secondaries steps up (creating a split-brain scenario)
- Chunk (0, max) is then moved to shard 0
- A mongos that hasn't learned about the new primary on shard 0 routes a majority read to the old primary
- The old primary (which still believes itself to be primary) fetches the new routing information from the config server
In this case, the old primary will respond to the majority read using the newest filtering information but without ever having seen the chunk migration.
This can also affect secondaries that refresh via the node that believes itself to be primary, causing their filtering information to get ahead of the data they hold.
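The effect can be seen in a toy model (plain C++, not MongoDB code; the RoutingTable and ShardNode types and the numeric chunk bounds are invented for illustration): the partitioned old primary installs the config server's newest routing table, so it claims ownership of chunk (0, max) even though the migrated document never reached it.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

// Routing table: chunk lower bound -> owning shard, plus a version counter.
struct RoutingTable {
    int version = 1;
    std::map<int, std::string> chunkOwner;  // key: chunk lower bound
};

// A shard node: the documents it physically holds plus the routing table it
// filters with.
struct ShardNode {
    std::string name;
    std::set<int> documents;     // document keys physically present
    RoutingTable filteringInfo;  // what the node believes it owns
};

int main() {
    // Initial placement: chunk [min, 0) on shard0, chunk [0, max) on shard1.
    // -1000 stands in for "min"; shard1 initially holds document 7 in [0, max).
    RoutingTable config;
    config.chunkOwner = {{-1000, "shard0"}, {0, "shard1"}};

    ShardNode oldPrimary{"shard0-old-primary", {-5}, config};  // doc -5 in [min, 0)
    ShardNode newPrimary{"shard0-new-primary", {-5}, config};

    // Network partition: oldPrimary is isolated, newPrimary steps up.
    // Chunk [0, max) migrates from shard1 to shard0 (the new primary).
    newPrimary.documents.insert(7);   // the migration copies shard1's document
    config.version = 2;
    config.chunkOwner[0] = "shard0";  // the config server commits the new owner

    // The isolated old primary still thinks it is primary. It refreshes its
    // filtering metadata from the config server. Without a majority no-op
    // write barrier, nothing stops it from installing and using this table.
    oldPrimary.filteringInfo = config;

    // A stale mongos routes a read for key 7 to the old primary. The filtering
    // metadata says "shard0 owns [0, max)", so the read is not rejected with
    // StaleConfig, but the document was never migrated to this node.
    bool ownsKey = oldPrimary.filteringInfo.chunkOwner.at(0) == "shard0";
    bool hasDoc = oldPrimary.documents.count(7) > 0;
    std::cout << std::boolalpha
              << "old primary claims ownership of key 7: " << ownsKey
              << ", actually holds the document: " << hasDoc << "\n";
    // Prints "true, false": the majority read silently misses a document.
    return 0;
}
```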
The solution is to add the majority no-op write back into the ShardServerCatalogCacheLoader (SSCCL). This ensures that when new filtering information is fetched, it can only be used, and sent to secondaries, by the actual primary of the replica set.
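A minimal sketch of that guard, assuming hypothetical ConfigServerClient and ReplicationCoordinator seams rather than the real SSCCL and replication interfaces: the freshly fetched filtering information is only returned for installation once a majority no-op write succeeds, so a partitioned old primary simply fails the refresh instead of serving metadata that is ahead of its data.

```cpp
#include <iostream>
#include <optional>
#include <string>

struct CollectionRoutingInfo { int version = 0; };

// Hypothetical seams standing in for the config server fetch and the
// replication layer; the real MongoDB code is far more involved.
struct ConfigServerClient {
    CollectionRoutingInfo fetchRoutingInfo(const std::string&) { return {2}; }
};

struct ReplicationCoordinator {
    bool canReachMajority = true;
    // Models a no-op oplog write waited on with w:"majority". A partitioned
    // old primary cannot satisfy it.
    bool performMajorityNoopWrite(const std::string&) { return canReachMajority; }
};

// The guard: new filtering information is only installed (and only forwarded
// to secondaries) after a majority no-op write proves this node is still the
// majority primary.
std::optional<CollectionRoutingInfo> refreshFilteringInfo(ConfigServerClient& config,
                                                          ReplicationCoordinator& repl,
                                                          const std::string& nss) {
    CollectionRoutingInfo fresh = config.fetchRoutingInfo(nss);
    if (!repl.performMajorityNoopWrite("routing refresh of " + nss)) {
        return std::nullopt;  // not the real primary: do not use the new metadata
    }
    return fresh;
}

int main() {
    ConfigServerClient config;
    ReplicationCoordinator realPrimary{true};
    ReplicationCoordinator partitionedOldPrimary{false};

    std::cout << std::boolalpha
              << "real primary installs refresh: "
              << refreshFilteringInfo(config, realPrimary, "test.coll").has_value() << "\n"
              << "partitioned old primary installs refresh: "
              << refreshFilteringInfo(config, partitionedOldPrimary, "test.coll").has_value()
              << "\n";
    return 0;
}
```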
- causes
-
SERVER-80712 Avoid leaving the replica set shard partitioned at the end of `linearizable_read_concern.js`
- Closed
-
SERVER-84623 Shard-local re-execution of a command might bubble up a misleading StaleConfig exception to the router
- Closed
- depends on
-
SERVER-78505 Database cache does not use the 'allowLocks' option correctly
- Closed
-
SERVER-80183 Remove operationTime check from store_retryable_find_and_modify_images_in_side_collection.js
- Closed
- is caused by
-
SERVER-35092 ShardServerCatalogCacheLoader should have a timeout waiting for read concern
- Closed
- is depended on by
-
SERVER-79609 Fix `findAndModify_upsert.js` test to accept StaleConfig error
- Closed
- related to
-
SERVER-30797 Shard primaries must commit a majority write before using updated chunk routing tables
- Closed
-
SERVER-79483 Investigate if tests should check operationTime being identical for retryable write responses
- Closed