[SERVER-30797] Shard primaries must commit a majority write before using updated chunk routing tables
Created: 23/Aug/17  Updated: 30/Oct/23  Resolved: 03/Oct/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Andy Schwerin
Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-78115 Shard primaries must commit a majorit... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2017-10-02, Sharding 2017-10-23

 Description   

After receiving a chunk routing table change and before putting it to use, a shard primary needs to confirm that it was the primary at the time it received the change. Otherwise, it may use a version of the routing table inconsistent with the data that it stores, potentially leading it to return orphans or fail to return results at read concerns "local" and stronger.
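A minimal sketch of the check described above, written as a toy Python model rather than the server's actual implementation. All names here (Node, can_commit_majority_write, install_routing_table) are hypothetical, and "reachable peers" is a simplification that stands in for real replication acknowledgements.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Names of replica set members this node can currently reach (assumed model).
    reachable: set = field(default_factory=set)
    routing_version: int = 0


def can_commit_majority_write(node, replica_set_size):
    """True if the node plus its reachable peers form a strict majority."""
    acknowledging = 1 + len(node.reachable)
    return acknowledging > replica_set_size // 2


def install_routing_table(node, new_version, replica_set_size):
    """Adopt a refreshed routing table only after a majority write succeeds."""
    if not can_commit_majority_write(node, replica_set_size):
        # Possibly a stale primary: keep the old table instead of using the new one.
        return False
    node.routing_version = new_version
    return True


# Five-node set split by a partition: the old primary sees one secondary,
# the new primary sees two.
old_primary = Node("old", reachable={"s1"})
new_primary = Node("new", reachable={"s2", "s3"})

old_ok = install_routing_table(old_primary, new_version=2, replica_set_size=5)
new_ok = install_routing_table(new_primary, new_version=2, replica_set_size=5)
```

In this model the old primary cannot gather a majority, so it never starts using the refreshed table, while the new primary does.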

– Details –
This can happen if the shard is split-brain (has two primaries on either side of a network partition), and the shard has participated in a migration through the newer primary. The old primary can receive a versioned request, refresh its routing table from the config server, and service the request in the short window before it realizes there's a new primary. If a chunk has been donated, the old primary will return orphans; if received, it will miss data.

Further, if the old primary persists the routing table updates, any secondaries on the same side of the network partition can also exhibit this incorrect behavior.
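The two failure modes can be illustrated with a toy ownership filter. This is only a sketch: the document shape, the "orphan" flag, and visible_docs are assumptions made for the example, and it does not try to model which migration direction produces which symptom. It shows only that a routing table newer than the node's data can yield both.

```python
def visible_docs(local_docs, owned_ranges):
    """Return local documents whose key falls in a range the routing table marks as owned."""
    return [
        doc for doc in local_docs
        if any(lo <= doc["key"] < hi for lo, hi in owned_ranges)
    ]


# Data on the stale node: one leftover copy it should not serve, one valid document.
stale_node_docs = [
    {"key": 3, "orphan": True},
    {"key": 7, "orphan": False},
]

# Keys an up-to-date node would return for the same ranges (assumed for the example).
up_to_date_keys = {7, 12, 15}

# Refreshed routing table: half-open key ranges the shard is said to own.
routing_table = [(0, 10), (10, 20)]

served = visible_docs(stale_node_docs, routing_table)
orphans_served = [doc for doc in served if doc["orphan"]]
missing_keys = sorted(up_to_date_keys - {doc["key"] for doc in served})
```

Here the stale node returns the leftover document with key 3 and fails to return keys 12 and 15, because the table describes data the node does not hold.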



 Comments   
Comment by Githook User [ 03/Oct/17 ]

Author: Dianna Hohensee (username: DiannaHohensee, email: dianna.hohensee@10gen.com)

Message: SERVER-30797 Ensure shard primary is a majority primary before using updated chunk routing tables
Branch: master
https://github.com/mongodb/mongo/commit/d5a04de30c06898b4a588f286e2ab38fcb91600f

Comment by Andy Schwerin [ 25/Sep/17 ]

Only the new primary will be able to complete majority writes, and only nodes in the same partition as the new primary will be able to provide causally consistent majority reads following those writes.

However, this fix is about ensuring that local and majority read concern only return documents as they at some point existed (majority) or were proposed to exist (local). Without it, those read concerns may erroneously return orphans, which can contain changes never proposed by a client application, albeit only in scenarios like the one described.

Comment by Dianna Hohensee (Inactive) [ 25/Sep/17 ]

To be clear, it seems like this scenario will still be pretty broken: writes (local or majority) could go to one side of the split, and reads (local or majority) sent to the other side won't see those changes. Is that alright, schwerin?
