Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: PM-173
Assigned Teams: Replication
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Look for writes sent to the primary since the read operation was sent to the replica set. If such a write exists, check whether it has been propagated to a majority of the nodes in the set. If so, the current primary is the true primary. This should reduce the number of no-ops we write, because we are checking for a user-submitted write that would have had to be done anyway.

At the beginning of a linearizable read, the server shall record the current last set timestamp (LST). This is the optime of the last write across the entire mongod instance that has already been returned to a client. Technically, this is the optime assigned to the “last” document written to the oplog, but because multiple writers and the concurrency logic cooperate to produce the illusion of a monotonically increasing optime, that document may not be visible in the oplog yet. This value, which we will call the original LST, is used later to determine whether any writes completed and committed while the read was being processed.

(Optimization 1) When the server finishes the read, it observes the commit level (the optime of the last committed operation). If the commit level is greater than the original LST, a write that completed after the client issued the linearizable read has now been committed by a majority of nodes, which proves that the server was still a valid primary at the time the read began. This confirms that the read can be linearizable, so the server returns the data.

(Optimization 2) If the commit level is not greater than the original LST, the server then observes the current LST. If the current LST is greater than the original LST, at least one write is currently replicating and may soon advance the commit level. The server blocks until either the condition in Optimization 1 is met or maxTimeMS expires (timeout).

If the current LST is the same as the original LST, no writes have occurred that could prove primaryship. The server shall write an ‘n’ op (no-op) to the oplog, and then block until either the no-op is replicated to a majority of nodes or a timeout occurs. This part has been implemented in ~~SERVER-24497~~.
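
The three cases above can be summarized as a small decision function. This is only an illustrative sketch, not the server's implementation: optimes are modeled as plain integers (real optimes also carry a term), and the names `Action` and `next_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of the check made when the read finishes."""
    RETURN_READ = "return_read"          # Optimization 1: primaryship already proven
    WAIT_FOR_COMMIT = "wait_for_commit"  # Optimization 2: a user write is in flight
    WRITE_NOOP = "write_noop"            # Fallback: write an 'n' op and wait for it


def next_action(original_lst: int, commit_level: int, current_lst: int) -> Action:
    """Pick the next step for a linearizable read.

    original_lst: LST recorded when the read began.
    commit_level: optime of the last majority-committed operation, observed
                  when the read finishes.
    current_lst:  LST observed when the read finishes.
    Assumption: optimes are comparable integers here, for illustration only.
    """
    if commit_level > original_lst:
        # A write newer than the original LST is majority committed.
        return Action.RETURN_READ
    if current_lst > original_lst:
        # A newer write exists but is not yet committed; block until the
        # commit level passes original_lst or maxTimeMS expires.
        return Action.WAIT_FOR_COMMIT
    # No write has happened since the read began; a no-op is needed.
    return Action.WRITE_NOOP
```

For example, `next_action(10, 10, 12)` yields `Action.WAIT_FOR_COMMIT`: a write with optime 12 exists, but the commit level has not yet moved past the original LST of 10. The blocking and timeout handling are deliberately left out of the sketch.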

depends on

SERVER-24497 implement noop writes to test for primary-ship

Closed

Assignee: [DO NOT USE] Backlog - Replication Team
Reporter: Hari Devaraj (Inactive)
Participants: [DO NOT USE] Backlog - Replication Team, Hari Devaraj
Votes: 0
Watchers: 6

Created: Jun 09 2016 04:12:18 PM UTC
Updated: Dec 06 2022 04:24:13 AM UTC
