Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.7.0
Affects Version/s: None
Component/s: Sharding
Labels:
- PM-1645-Milestone-1

Backwards Compatibility:
Fully Compatible
Sprint:
Sharding 2020-05-18, Sharding 2020-06-01, Sharding 2020-06-15
Linked BF Score:
38
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

To keep the latency impact in the steady state as low as possible, the sharding migration commit protocol does not perform a proper two-phase commit. Because of this, the following theoretical sequence of events is possible:

1. The current primary of a shard is just about to commit a migration against the config server.
2. A new primary is elected and refreshes from the config server before the commit from the previous primary has reached it.
3. The new primary sees the old shardVersion (and, as a result, the old filtering metadata) and incorrectly accepts writes that are supposed to go to a different shard.
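
The sequence above can be reproduced with a toy model. This is only an illustrative sketch, not server code: `ConfigServer`, `ShardPrimary`, the chunk name `chunkA`, and the shard names are all hypothetical, and the routing table is reduced to a single owner and an integer version per chunk.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigServer:
    """Hypothetical stand-in for the config server's routing table."""
    owner: dict = field(default_factory=lambda: {"chunkA": "shard0"})
    version: dict = field(default_factory=lambda: {"chunkA": 1})

    def commit_migration(self, chunk, to_shard):
        # Unconditional commit: nothing here detects a concurrent refresh.
        self.owner[chunk] = to_shard
        self.version[chunk] += 1

    def refresh(self, chunk):
        return self.owner[chunk], self.version[chunk]


@dataclass
class ShardPrimary:
    """Hypothetical primary that filters writes using cached metadata."""
    shard: str
    cached_owner: dict = field(default_factory=dict)

    def refresh_from(self, config, chunk):
        owner, _version = config.refresh(chunk)
        self.cached_owner[chunk] = owner

    def accepts_write(self, chunk):
        return self.cached_owner.get(chunk) == self.shard


def simulate_race():
    config = ConfigServer()
    new_primary = ShardPrimary("shard0")
    # The new primary refreshes before the old primary's commit arrives.
    new_primary.refresh_from(config, "chunkA")
    # The old primary's commit reaches the config server afterwards.
    config.commit_migration("chunkA", "shard1")
    # The new primary still believes it owns the chunk.
    return new_primary.accepts_write("chunkA"), config.owner["chunkA"]
```

In this model `simulate_race()` reports that the new primary accepts the write even though the config server now assigns the chunk to the other shard, which is the incorrect outcome described above.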

The RangeDeleter project implemented a command to bump a chunk's shard version and also added a check for chunk version equality as part of the migration commit. We should package that logic into an asynchronous task and make sure that forceFilteringMetadataRefresh executes that recovery logic before accepting the shardVersion returned from the config server.
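
A minimal sketch of how such a recovery step might behave, under stated assumptions: the shard keeps a record of the migration it was about to commit, the config server rejects a commit whose expected chunk version no longer matches, and recovery bumps the chunk version before the refreshed metadata is accepted. All names here (`ConfigServer`, `PendingMigration`, `refresh_with_recovery`, `StaleCommit`) are hypothetical and do not correspond to actual server classes or commands.

```python
from dataclasses import dataclass
from typing import Optional


class StaleCommit(Exception):
    """Raised when a commit carries an outdated expected chunk version."""


@dataclass
class ConfigServer:
    """Hypothetical config server tracking one chunk."""
    owner: str = "shard0"
    version: int = 1

    def bump_version(self):
        self.version += 1

    def commit_migration(self, expected_version, to_shard):
        # Version equality check performed as part of the commit.
        if expected_version != self.version:
            raise StaleCommit("chunk version changed since commit was prepared")
        self.owner = to_shard
        self.version += 1


@dataclass
class PendingMigration:
    """Hypothetical record of a migration that was about to commit."""
    to_shard: str
    expected_version: int


def refresh_with_recovery(config, shard, pending: Optional[PendingMigration]):
    # Assumption: run recovery first, then accept what the config server says.
    if pending is not None and config.owner == shard:
        # The commit has not landed yet; bumping the version makes any
        # in-flight commit from the previous primary fail its equality check.
        config.bump_version()
    return config.owner, config.version


def late_commit_scenario():
    config = ConfigServer()
    pending = PendingMigration(to_shard="shard1", expected_version=1)
    owner, version = refresh_with_recovery(config, "shard0", pending)
    try:
        # The previous primary's delayed commit finally arrives.
        config.commit_migration(pending.expected_version, pending.to_shard)
        late_commit_applied = True
    except StaleCommit:
        late_commit_applied = False
    return owner, version, late_commit_applied, config.owner
```

In this toy model the delayed commit is rejected after recovery, so the metadata the new primary accepted stays valid. If the commit had already landed before recovery ran, the refresh would simply report the new owner and no bump would be needed.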

causes

SERVER-48883 Range deletion recovery can invalidate migration recovery on step up

Closed

depends on

SERVER-47974 Introduce ScopedShardVersionCriticalSection class

Closed

SERVER-47975 Optimize ScopedShardVersionCriticalSection in order to avoid convoy on SSV after a shardVersion change

Closed

is depended on by

SERVER-48589 Ensure migration recovery is completed before to run a new chunk migration

Closed

SERVER-45983 Perform the shardVersion recovery and refresh on a separate thread from that of the user request

Closed

SERVER-47982 Change the shard version update procedure of the migration source manager

Closed

SERVER-47986 Introduce a thread to complete the shard version recovery after step-up

Closed

is related to

SERVER-47986 Introduce a thread to complete the shard version recovery after step-up

Closed


Assignee: Tommaso Tocci
Reporter: Tommaso Tocci
Participants: Githook User, Tommaso Tocci
Votes: 0
Watchers: 1

Created: May 06 2020 03:01:40 PM UTC
Updated: Oct 29 2023 10:08:36 PM UTC
Resolved: Jun 03 2020 05:32:23 PM UTC
Confidence Status Last Update: 12/May/20 1:00 PM
