[SERVER-33846] Alternative for setting oplog read timestamp on secondaries Created: 13/Mar/18  Updated: 29/Oct/23  Resolved: 16/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 3.7.4

Type: Task Priority: Major - P3
Reporter: Eric Milkie Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed Votes: 1
Labels: rollback-functional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-41291 Oldest timestamp not always advanced ... Closed
is related to SERVER-39383 Speculative majority change stream up... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2018-03-26
Participants:
Linked BF Score: 0

 Description   

Currently, the oplog read timestamp is set via the same asynchronous mechanism regardless of replication state (PRIMARY or SECONDARY): a thread loop notes the optime of the latest oplog entry that has no holes before it, waits for the journal to flush, and then publishes that optime as the new oplog read timestamp.
This algorithm is correct for primary nodes. On secondary nodes, however, the wait for journaling is unnecessary: because the last applied time is stored durably, it is never possible to read holes after an unclean shutdown of a secondary. Today, the stable timestamp (and the oldest timestamp) can race ahead of the oplog read timestamp on secondaries. By forgoing the wait for journaling on secondaries, we can set the oplog read timestamp in lockstep with the stable and oldest timestamps, thus avoiding the race.

The work for this ticket will be to change the oplog read timestamp loop to only operate while a node is in primary mode; in secondary mode, new code inserted into the applier loop will set the oplog read timestamp when the last applied time is set.
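
The split described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: the class `OplogReadTimestamp`, the enum `ReplMode`, and both method names are hypothetical and do not come from the server code, and the hole detection and journal wait performed by the real primary loop are assumed to have happened before the call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; none of these names are taken from the server.
using Timestamp = std::uint64_t;
enum class ReplMode { Primary, Secondary };

class OplogReadTimestamp {
public:
    explicit OplogReadTimestamp(ReplMode mode) : mode_(mode) {}

    // Primary-mode path: called by the background loop once it has picked a
    // hole-free optime and (in the real system) waited for journaling.
    // Returns false and does nothing when the node is not in primary mode.
    bool publishFromVisibilityLoop(Timestamp noHolesOptime) {
        if (mode_ != ReplMode::Primary) {
            return false;
        }
        advance(noHolesOptime);
        return true;
    }

    // Secondary-mode path: called from the applier when the last applied
    // time is set, so the read timestamp moves in lockstep with it.
    void onLastAppliedSet(Timestamp lastApplied) {
        assert(lastApplied >= lastApplied_);  // last applied never moves back here
        lastApplied_ = lastApplied;
        if (mode_ == ReplMode::Secondary) {
            advance(lastApplied);
        }
    }

    Timestamp get() const { return readTimestamp_; }

private:
    // The read timestamp only ever moves forward in this model.
    void advance(Timestamp ts) { readTimestamp_ = std::max(readTimestamp_, ts); }

    ReplMode mode_;
    Timestamp lastApplied_ = 0;
    Timestamp readTimestamp_ = 0;
};
```

In this model a secondary never lets the read timestamp lag behind the last applied time, which is the property the ticket relies on to avoid the race with the stable and oldest timestamps.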



 Comments   
Comment by Githook User [ 16/Mar/18 ]

Author: Daniel Gottlieb (dgottlieb) <daniel.gottlieb@mongodb.com>

Message: SERVER-33846: Set oplog visibility synchronously on secondaries, at the end of every batch.

This patch introduces an optimization to allow secondaries to set their
visibility synchronously with oplog application as well as bypassing
additional journal flushing meant for primaries. Primaries replicating oplog
entries atomically generate a new optime and pass it to the storage engine's
oplog record store via the `oplogDiskLocRegister` method. This code path
will now pass in a parameter `orderedCommit = false` that alerts the storage
engine to maintain the necessary oplog visibility semantics for that write.
This is existing behavior, the only difference is the addition of the
`orderedCommit` parameter.

Secondaries will now also call `oplogDiskLocRegister` at the end of every
batch. This call will pass in the optime of the last oplog entry applied and
`orderedCommit = true`. A storage engine may take this as a guarantee that
there are no oplog holes prior to the input optime.
Branch: master
https://github.com/mongodb/mongo/commit/8ab8025f23fb50d15336870a6cea4e4ca6f9673a
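
The `orderedCommit` contract described in the commit message above can be illustrated with a toy record store. This is a sketch, not the code from the commit: only the names `oplogDiskLocRegister` and `orderedCommit` appear in the message, while `ToyOplogStore`, `commitUnordered`, `visible`, and the simplified signature are assumptions made for illustration. Journal flushing on the primary path is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical toy model of an oplog record store.
using Timestamp = std::uint64_t;

class ToyOplogStore {
public:
    // orderedCommit == false: the write may commit out of order relative to
    // other writers, so it is tracked as in flight until it commits.
    // orderedCommit == true: the caller guarantees there are no holes prior
    // to `ts`, so visibility can be set to `ts` right away.
    void oplogDiskLocRegister(Timestamp ts, bool orderedCommit) {
        if (orderedCommit) {
            visible_ = std::max(visible_, ts);
            return;
        }
        inFlight_.insert(ts);
        highestRegistered_ = std::max(highestRegistered_, ts);
    }

    // Marks an unordered write as committed. Visibility may then advance up
    // to just below the oldest write that is still in flight.
    void commitUnordered(Timestamp ts) {
        assert(inFlight_.count(ts) == 1);
        inFlight_.erase(ts);
        Timestamp candidate =
            inFlight_.empty() ? highestRegistered_ : *inFlight_.begin() - 1;
        visible_ = std::max(visible_, candidate);
    }

    Timestamp visible() const { return visible_; }

private:
    std::set<Timestamp> inFlight_;     // registered but not yet committed
    Timestamp highestRegistered_ = 0;  // highest unordered timestamp seen
    Timestamp visible_ = 0;            // highest hole-free timestamp
};
```

With unordered registration at timestamps 5 and 6, committing 6 first leaves 5 as a hole, so visibility stays below 5 until that write commits. A single ordered registration, as a secondary would issue at the end of a batch, moves visibility straight to the given optime.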
