[SERVER-46516] Majority write concern is blocked by dropping snapshot on reconfig Created: 01/Mar/20  Updated: 06/Dec/22  Resolved: 08/Feb/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: safe-reconfig-related
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File upgrade_downgrade_idempotency.js    
Issue Links:
Duplicate
duplicates SERVER-47142 Check primary before writing replset ... Closed
Related
related to SERVER-47142 Check primary before writing replset ... Closed
related to SERVER-46288 Reconfig in 4.2 style with the curren... Closed
related to SERVER-54389 Audit internal uses of force reconfig... Closed
is related to SERVER-47206 Majority commit point is not set back... Backlog
is related to SERVER-40649 Drop snapshots for reconfig via heart... Backlog
is related to SERVER-47205 Stopping dropping snapshots after saf... Closed
Assigned Teams:
Replication
Operating System: ALL
Steps To Reproduce:

Revert the change of write concern for setFCV and the check of config content change before dropping snapshot in SERVER-46288, then jstests/core/txns/await_prepared_transactions_on_FCV_downgrade.js will time out. max_time_ms_sharded_new_commands.js in sharding_csrs_continuous_config_stepdown will fail as well.

Sprint: Repl 2020-03-23, Repl 2020-04-06
Participants:
Linked BF Score: 60

 Description   

On reconfig, the primary drops all snapshots used to serve majority reads since the quorum may change by reconfig. The _currentCommittedSnapshot should be recalculated very soon, after a round of heartbeats in the worst case. However it's not the case. Majority writes are blocked until they are interrupted.

We need to confirm that the _currentCommittedSnapshot is updated correctly after dropping snapshots and the waiters are signaled correctly.



 Comments   
Comment by Steven Vannelli [ 08/Feb/21 ]

Since we are doing SERVER-54389, we decided we do not need this one.

Comment by Pavithra Vetriselvan [ 05/Feb/21 ]

Re-opening this ticket because SERVER-47142 does not ensure internal callers of doReplSetReconfig do the no-op write if they use "force". Please see the attached repro for idempotency issues with setFCV when adding a force reconfig to the upgrade/downgrade procedure.

We should revisit our contract when dropping snapshots and the assumptions we can make based off of that.

Comment by Tess Avitabile (Inactive) [ 30/Mar/20 ]

Closing in favor of SERVER-47142. That ticket is going to do the noop write in the same storage transaction as writing the local config document and check whether the node is still primary before doing the write. This will happen is the doReplSetReconfig function, which ensures internal callers of this function do the noop write.

Comment by Tess Avitabile (Inactive) [ 26/Mar/20 ]

Sure, I can move the noop write to doReplSetReconfig in order to address the TODO in SERVER-46345. I would like to do this on the master branch only, so as not to destabilize 4.4, since there is no bug. I won't bother to remove the workaround for FCV downgrade from master, since that will be removed by the Upgrade/Downgrade project. I'll also leave in the code to not drop snapshots if only the config version/term changed.

_dropAllSnapshots_inlock is also called after rollback and when we drop replicated databases in initial sync. The call after rollback is no longer needed, but it hasn't caused a problem that I'm aware of. I think there's a larger problem that our language about snapshots is confusing. I proposed that in the pain point spreadsheet for cleanup.

That's correct that we don't update _currentCommittedSnapshot on journal flush when running with journaling disabled, but we will still update it due to replSetUpdatePosition due to the noop write.

Comment by Siyuan Zhou [ 25/Mar/20 ]

Thanks tess.avitabile for the detailed investigation! I'd propose to move the no-op write to doReplSetReconfig to make the behavior consistent.

I'm wondering if this is a more general problem since _dropAllSnapshots_inlock is also called in other places. Another question is when _rsConfig.getWriteConcernMajorityShouldJournal() is false, we don't update _currentCommittedSnapshot on journal flush.

Comment by Judah Schvimer [ 25/Mar/20 ]

There will be a third caller soon for removing 'newlyAdded' fields for ISS in SERVER-46345. I actually added a TODO in that patch referencing this ticket. While I agree this isn't a bug, it does lead to some testing difficulties.

Comment by Tess Avitabile (Inactive) [ 25/Mar/20 ]

I found the root cause for this issue, and it only affects internal callers of ReplicationCoordinator::doReplSetReconfig().

_currentCommittedSnapshot gets updated in two ways:

It is not updated by heartbeats. When the replSetReconfig command is run, after we drop snapshots, the command writes a noop oplog entry. This triggers a round of replSetUpdatePosition commands, which restores the _currentCommittedSnapshot. However, when doReplSetReconfig() is called internally, no oplog entry is written, so _currentCommittedSnapshot is unset until the next write.

I don't believe this leads to any bugs. doReplSetReconfig() only has two internal callers:

Since there is no bug, I'm inclined to close this issue as Gone Away. Now that we understand the issue, we could file a ticket for code clean-up to change how we do the workarounds. siyuan.zhou, how does that sound to you?

Comment by Tess Avitabile (Inactive) [ 25/Mar/20 ]

I can reproduce this issue with the following diff. The reconfig in the test is to ensure that the config term gets initialized, so that downgrading FCV needs to remove the config term. (See SERVER-47119 for why this requires running replSetReconfig.)

diff --git a/jstests/core/txns/await_prepared_transactions_on_FCV_downgrade.js b/jstests/core/txns/await_prepared_transactions_on_FCV_downgrade.js
index dd4e8cd21e..441131fb4f 100644
--- a/jstests/core/txns/await_prepared_transactions_on_FCV_downgrade.js
+++ b/jstests/core/txns/await_prepared_transactions_on_FCV_downgrade.js
@@ -19,6 +19,14 @@ assert.commandWorked(testDB.runCommand({create: collName, writeConcern: {w: "maj
 const session = testDB.getMongo().startSession();
 const sessionDB = session.getDatabase(dbName);
 
+var config = testDB.adminCommand({replSetGetConfig: 1}).config;
+config.settings.chainingAllowed = false;
+config.version++;
+assert.commandWorked(testDB.adminCommand({replSetReconfig: config}));
+
+config = testDB.adminCommand({replSetGetConfig: 1}).config;
+assert(config.hasOwnProperty("term"), config);
+
 try {
     jsTestLog("Start a transaction.");
     session.startTransaction();
diff --git a/src/mongo/db/commands/set_feature_compatibility_version_command.cpp b/src/mongo/db/commands/set_feature_compatibility_version_command.cpp
index efe1c054aa..8927668f21 100644
--- a/src/mongo/db/commands/set_feature_compatibility_version_command.cpp
+++ b/src/mongo/db/commands/set_feature_compatibility_version_command.cpp
@@ -160,9 +160,8 @@ public:
             auto waitForWCStatus = waitForWriteConcern(
                 opCtx,
                 repl::ReplClientInfo::forClient(opCtx->getClient()).getLastOp(),
-                WriteConcernOptions(repl::ReplSetConfig::kMajorityWriteConcernModeName,
-                                    WriteConcernOptions::SyncMode::UNSET,
-                                    timeout),
+                WriteConcernOptions(
+                    WriteConcernOptions::kMajority, WriteConcernOptions::SyncMode::UNSET, timeout),
                 &res);
             CommandHelpers::appendCommandWCStatus(result, waitForWCStatus, res);
         });
diff --git a/src/mongo/db/repl/replication_coordinator_impl.cpp b/src/mongo/db/repl/replication_coordinator_impl.cpp
index e9eb073f8e..aabf5743d8 100644
--- a/src/mongo/db/repl/replication_coordinator_impl.cpp
+++ b/src/mongo/db/repl/replication_coordinator_impl.cpp
@@ -3447,12 +3448,12 @@ void ReplicationCoordinatorImpl::_finishReplSetReconfig(OperationContext* opCtx,
     //
     // If the new config has the same content but different version and term, skip it, since
     // the quorum condition is still the same.
-    auto newConfigCopy = newConfig;
+    /*auto newConfigCopy = newConfig;
     newConfigCopy.setConfigTerm(oldConfig.getConfigTerm());
     newConfigCopy.setConfigVersion(oldConfig.getConfigVersion());
-    if (SimpleBSONObjComparator::kInstance.evaluate(oldConfig.toBSON() != newConfigCopy.toBSON())) {
+    if (SimpleBSONObjComparator::kInstance.evaluate(oldConfig.toBSON() != newConfigCopy.toBSON())) {*/
         _dropAllSnapshots_inlock();
-    }
+    //}
 
     lk.unlock();
     _performPostMemberStateUpdateAction(action);

Comment by Tess Avitabile (Inactive) [ 25/Mar/20 ]

Ah, my guess is that it's because we don't need to do a reconfig on downgrade FCV if the term is uninitialized, and SERVER-46571 makes the initial replica set config have uninitialized term. So I expect the underlying problem still exists.

Comment by Tess Avitabile (Inactive) [ 25/Mar/20 ]

The issue was resolved by SERVER-46571. I will look into why.

Comment by Tess Avitabile (Inactive) [ 25/Mar/20 ]

Oddly, making the change in the Steps to Reproduce on the tip of master does not reproduce the bug, but if I reset back to this commit then make the change, the problem does reproduce. siyuan.zhou, do you know of any commits in the meantime that might have affected this behavior?

Generated at Thu Feb 08 05:11:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.