From 200fcc5c903653f67c9838b630e3313839c42dc4 Mon Sep 17 00:00:00 2001
From: Preeti Murthy
Date: Mon, 14 Oct 2024 15:18:27 +0530
Subject: [PATCH] Drop last batch of oplog entries if primary has changed

Mongo's speculative pulling of oplog entries, and the application of
those pulled entries, costs us downtime during failovers in which oplog
entries have replicated to a majority of secondaries but not to the new
primary. The cluster becomes unavailable for minutes because the
rollback, plus checkpointing of the rolled-back entries, takes that
long. We are therefore looking to avoid speculation during failover.

This patch (which should ideally be gated behind an opt-in config)
drops the last batch of oplog entries pulled if the previous heartbeat
response indicates that a new primary has been elected. As a result,
oplog entries from the old primary do not replicate to a majority of
secondaries.

We understand the consequence: we also lose the entries that could
potentially have replicated to the new primary. In this scenario,
however, we can afford to wait for secondaries to catch up to the new
primary (on the order of seconds) rather than lose the cluster for
minutes (which is currently the case with rollbacks).

As far as we can tell, this patch should not drop the last pulled
entries if the pulling node is the primary itself, so the new primary
still has an opportunity to continue catching up to the old primary.

Second, we also understand that the patch is a no-op if the secondary
pulling the batch is not aware of the new primary. That will be the
case for the non-majority set of secondaries, and it is OK if they pull
in poisoned (to-be-rolled-back) entries, since at least the majority of
the cluster will not be in a potential rollback state.
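For illustration only, a minimal standalone sketch of the behaviour
change described above. It is not MongoDB code: the three enumerator
names mirror ChangeSyncSourceAction, while decideAction, handleBatch and
the string-based batch and buffer are hypothetical stand-ins. It models
only the first branch of shouldChangeSyncSource, the one this patch
touches.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the enum used in the diff. The enumerator names
// mirror MongoDB's; everything else in this sketch is hypothetical.
enum class ChangeSyncSourceAction {
    kContinueSyncing,
    kStopSyncingAndDropLastBatchIfPresent,
    kStopSyncingAndEnqueueLastBatch,
};

// Hypothetical model of the patched branch: when the topology check asks
// for a sync source change, the last batch is dropped instead of enqueued.
ChangeSyncSourceAction decideAction(bool topologyWantsChange) {
    return topologyWantsChange
        ? ChangeSyncSourceAction::kStopSyncingAndDropLastBatchIfPresent
        : ChangeSyncSourceAction::kContinueSyncing;
}

// Hypothetical fetcher step: returns how many entries of the last batch
// were appended to the local buffer that feeds the applier.
std::size_t handleBatch(ChangeSyncSourceAction action,
                        const std::vector<std::string>& lastBatch,
                        std::vector<std::string>& buffer) {
    if (action == ChangeSyncSourceAction::kStopSyncingAndDropLastBatchIfPresent) {
        return 0;  // the batch pulled from the old source is discarded
    }
    buffer.insert(buffer.end(), lastBatch.begin(), lastBatch.end());
    return lastBatch.size();
}
```

With the pre-patch action (kStopSyncingAndEnqueueLastBatch), the same
batch would be appended to the buffer and could later need a rollback.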
diff --git a/src/mongo/db/repl/replication_coordinator_impl.cpp b/src/mongo/db/repl/replication_coordinator_impl.cpp
index d2c23ad2e94..d9ddc2172ae 100644
--- a/src/mongo/db/repl/replication_coordinator_impl.cpp
+++ b/src/mongo/db/repl/replication_coordinator_impl.cpp
@@ -5361,7 +5361,7 @@ ChangeSyncSourceAction ReplicationCoordinatorImpl::shouldChangeSyncSource(
 
     if (_topCoord->shouldChangeSyncSource(
             currentSource, replMetadata, oqMetadata, lastOpTimeFetched, now)) {
-        return ChangeSyncSourceAction::kStopSyncingAndEnqueueLastBatch;
+        return ChangeSyncSourceAction::kStopSyncingAndDropLastBatchIfPresent;
     }
 
     const auto readPreference = _getSyncSourceReadPreference(lock);
-- 
2.47.0