After receiving a chunk routing table change and before putting it to use, a shard primary needs to confirm that it was the primary at the time it received the change. Otherwise, it may use a version of the routing table inconsistent with the data that it stores, potentially leading it to return orphans or fail to return results at read concerns "local" and stronger.
– Details –
This can happen if the shard is split-brain (has two primaries on either side of a network partition), and the shard has participated in a migration through the newer primary. The old primary can receive a versioned request, refresh its routing table from the config server, and service the request in the short window before it realizes there's a new primary. If a chunk has been donated, the old primary will return orphans; if received, it will miss data.
Further, if the old primary persists the routing table updates, any secondaries on the same side of the network partition can also exhibit this incorrect behavior.
- is related to
-
SERVER-78115 Shard primaries must commit a majority write before using new routing information from the config server
- Closed