- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Replication
- None
- Replication
- ALL
- v3.6
The last phase of initial sync on a secondary applies oplog operations up through some time `T`, the point at which the collection cloning phase completed. It is incorrect for a secondary to respond to majority reads or read-at-a-timestamp queries for times before `T`.
When a secondary comes out of initial sync, it still has a notion of the replica set's majority commit time. Because a majority read is translated into a "read at a timestamp" at the majority commit time, the secondary will incorrectly respond to such a query when that time is before `T`, returning a view of inconsistent data.
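The failure mode can be shown with a toy model. This is a minimal sketch, not server code: `SecondaryView`, its fields, and the integer timestamps are hypothetical stand-ins for the node's state described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondaryView:
    """Toy model of a secondary just after initial sync (hypothetical names)."""
    initial_sync_end_ts: int  # T: data is only consistent at or after this time
    majority_commit_ts: int   # the node's notion of the majority commit time


def majority_read_timestamp(view: SecondaryView) -> int:
    # A majority read is translated into a read at the majority commit time.
    return view.majority_commit_ts


def read_is_consistent(view: SecondaryView, read_ts: int) -> bool:
    # Reads at timestamps before T observe partially applied data.
    return read_ts >= view.initial_sync_end_ts


# The node finished initial sync at T=100 but still believes the
# majority commit time is 90, so a majority read lands before T.
view = SecondaryView(initial_sync_end_ts=100, majority_commit_ts=90)
ts = majority_read_timestamp(view)
print(ts, read_is_consistent(view, ts))  # prints "90 False"
```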
A couple of starting points for solutions:
- An API known as the "initial data timestamp" was introduced for recover to a stable timestamp; replication sets it when initial sync completes. It represents the timestamp at which the data is in a consistent state, and could be used to reject or block incoming majority reads and read-at-a-timestamp requests.
- Alternatively, a secondary can refuse to come out of initial sync until the majority commit point passes `T`. Currently there is no mechanism to tell drivers which timestamps a node can service reads for. This solution would signal to drivers not to send majority reads that the node cannot service, at the cost of the node not participating in reads at timestamps `>= T`.
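The two options above can be sketched as simple guards. This is an illustration under stated assumptions, not the server's implementation: `ReadTimestampTooOld`, `check_read_timestamp`, and `may_leave_initial_sync` are hypothetical names, timestamps are plain integers, and "passes `T`" is assumed here to mean "is at or after `T`".

```python
class ReadTimestampTooOld(Exception):
    """Hypothetical error for a read before the data became consistent."""


def check_read_timestamp(read_ts: int, initial_data_ts: int) -> None:
    # Option 1: reject (a real system might block instead) any read at a
    # timestamp earlier than the initial data timestamp.
    if read_ts < initial_data_ts:
        raise ReadTimestampTooOld(
            f"read at {read_ts} is before initial data timestamp {initial_data_ts}"
        )


def may_leave_initial_sync(majority_commit_ts: int, sync_end_ts: int) -> bool:
    # Option 2: stay in initial sync until the majority commit point
    # has reached T (assumed comparison: >=).
    return majority_commit_ts >= sync_end_ts


# Option 1: a majority read at 90 is rejected when the data is consistent from 100.
try:
    check_read_timestamp(read_ts=90, initial_data_ts=100)
    rejected = False
except ReadTimestampTooOld:
    rejected = True

# Option 2: the node keeps reporting initial sync while the commit point lags T.
still_syncing = not may_leave_initial_sync(majority_commit_ts=90, sync_end_ts=100)
```

The first guard keeps the node available for reads it can serve and pushes the check to request time; the second keeps request handling unchanged but delays the node's transition out of initial sync.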
- is depended on by
  - SERVER-30809 Investigating remaining writes to the [KV]Catalog that must be timestamped. (Closed)
- is related to
  - SERVER-32226 oldest_timestamp should track the last applied time, during initial sync (Closed)
  - SERVER-30577 Clear list of stable timestamp candidates on Rollback and Initial Sync (Closed)
- related to
  - SERVER-32237 Nodes that cannot become primary must neither update progress nor vote "aye" (Closed)