[SERVER-47985] Implement recovery of a shard's `shardVersion` before it is allowed to perform version checking Created: 06/May/20  Updated: 29/Oct/23  Resolved: 03/Jun/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Task Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: PM-1645-Milestone-1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-47974 Introduce ScopedShardVersionCriticalS... Closed
depends on SERVER-47975 Optimize ScopedShardVersionCriticalSe... Closed
is depended on by SERVER-48589 Ensure migration recovery is complete... Closed
is depended on by SERVER-45983 Perform the shardVersion recovery and... Closed
is depended on by SERVER-47982 Change the shard version update proce... Closed
is depended on by SERVER-47986 Introduce a thread to complete the sh... Closed
Problem/Incident
causes SERVER-48883 Range deletion recovery can invalidat... Closed
Related
is related to SERVER-47986 Introduce a thread to complete the sh... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-05-18, Sharding 2020-06-01, Sharding 2020-06-15
Participants:
Linked BF Score: 38

 Description   

To keep the latency impact in the steady state as low as possible, the sharding migration commit protocol does not perform a proper two-phase commit. Because of this, the following sequence of events is theoretically possible:

  • The current primary of a shard is just about to commit a migration against the config server
  • A new primary is elected, which refreshes from the config server before the commit from the previous primary has reached it
  • The new primary sees the old shardVersion (and, as a result, stale filtering metadata) and incorrectly accepts writes which are supposed to go to a different shard
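
The sequence above can be reproduced with a small toy model. This is only an illustrative sketch: the classes `ConfigServer` and `ShardPrimary`, the chunk name `chunk-A`, and the shard names are hypothetical and do not correspond to server code. It assumes a refresh is a plain snapshot of the config server's routing table.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigServer:
    # Hypothetical toy model: chunk name -> (owning shard, version).
    chunks: dict = field(default_factory=lambda: {"chunk-A": ("shard0", 1)})

    def commit_migration(self, chunk, to_shard):
        # Move ownership and bump the version, with no coordination
        # with whichever node is currently primary on the donor shard.
        _, version = self.chunks[chunk]
        self.chunks[chunk] = (to_shard, version + 1)

    def owner_of(self, chunk):
        return self.chunks[chunk][0]


@dataclass
class ShardPrimary:
    shard: str
    filtering: dict = field(default_factory=dict)

    def refresh(self, config):
        # Snapshot of the config metadata at the time of the refresh.
        self.filtering = dict(config.chunks)

    def accepts_write(self, chunk):
        owner, _ = self.filtering[chunk]
        return owner == self.shard


config = ConfigServer()

# A new primary is elected and refreshes before the commit arrives.
new_primary = ShardPrimary("shard0")
new_primary.refresh(config)

# The commit sent by the previous primary now reaches the config server.
config.commit_migration("chunk-A", "shard1")

# The new primary still believes it owns the chunk.
stale_accept = new_primary.accepts_write("chunk-A")
true_owner = config.owner_of("chunk-A")
```

In this model `stale_accept` ends up true while `true_owner` is the recipient shard, which is the incorrect write acceptance described in the last bullet.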

The RangeDeleter project implemented a command to bump a chunk's shard version and also added a check for chunk version equality as part of the migration commit. We should package that logic into an asynchronous task and make sure that forceFilteringMetadataRefresh executes that recovery logic before accepting the shardVersion returned from the config server.
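
A minimal sketch of why running the recovery before accepting the refreshed shardVersion closes the race, under stated assumptions: `ToyConfigServer`, `refresh_with_recovery`, and the two scenario functions are hypothetical names, the model tracks a single chunk, and the recovery runs synchronously here rather than as the asynchronous task proposed above. It is not the server's implementation.

```python
class ToyConfigServer:
    """Hypothetical single-chunk model of the config server."""

    def __init__(self):
        self.owner = "shard0"
        self.version = 1

    def bump_version(self):
        # Stands in for the command that bumps a chunk's shard version.
        self.version += 1
        return self.version

    def commit_migration(self, expected_version, to_shard):
        # Stands in for the chunk version equality check at commit time.
        if expected_version != self.version:
            return False
        self.owner = to_shard
        self.version += 1
        return True


def refresh_with_recovery(config):
    # Run the recovery (version bump) first, then accept what the
    # config server reports. Any commit still in flight from an older
    # primary will now fail the version equality check.
    config.bump_version()
    return config.owner, config.version


def scenario_commit_arrives_late():
    config = ToyConfigServer()
    expected = config.version  # version the old primary will commit against
    owner, _ = refresh_with_recovery(config)  # new primary refreshes first
    committed = config.commit_migration(expected, "shard1")
    return owner, committed, config.owner


def scenario_commit_arrives_first():
    config = ToyConfigServer()
    expected = config.version
    committed = config.commit_migration(expected, "shard1")
    owner, _ = refresh_with_recovery(config)
    return owner, committed, config.owner
```

In both orderings the owner seen by the refreshing primary matches the owner recorded on the config server: either the late commit is rejected, or the refresh already observes the committed migration.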



 Comments   
Comment by Githook User [ 03/Jun/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (toto-dev)

Message: SERVER-47985 Implement recovery of a shard's `shardVersion` before it is allowed to perform version checking
Branch: master
https://github.com/mongodb/mongo/commit/f7c2b0c472b9c0ed9e12301cc5951ecf9f886722

Generated at Thu Feb 08 05:15:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.