[SERVER-43978] Stable timestamp is not being recalculated after aborting oplog holes Created: 12/Oct/19  Updated: 29/Oct/23  Resolved: 12/Nov/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.1, 4.2.3

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-39199 Committing or aborting a prepared tra... Closed
related to SERVER-40809 fix circular dependency between db/re... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Steps To Reproduce:

+++ b/jstests/replsets/stable_timestamp_oplog_hole.js
@@ -0,0 +1,68 @@
+/**
+ * Test that stable timestamp can advance after a oplog hole is released via an abort.
+ *
+ * @tags: [requires_persistence, uses_transactions]
+ */
+
+(function() {
+"use strict";
+load("jstests/libs/check_log.js");
+load("jstests/core/txns/libs/prepare_helpers.js");
+
+const replTest = new ReplSetTest({nodes: 1});
+replTest.startSet();
+replTest.initiate();
+
+let primary = replTest.getPrimary();
+
+const dbName = "test";
+const collName = "stable_timestamp_oplog_hole";
+let testDB = primary.getDB(dbName);
+const testColl = testDB.getCollection(collName);
+
+assert.commandWorked(testColl.insert({_id: 1}));
+
+assert.commandWorked(testDB.adminCommand(
+    {configureFailPoint: 'hangAndFailUnpreparedCommitAfterReservingOplogSlot', mode: 'alwaysOn'}));
+
+// Run a transaction in a parallel shell.
+function transactionFn() {
+    const name = 'stable_timestamp_oplog_hole';
+    const dbName = 'test';
+    const collName = name;
+    const session = db.getMongo().startSession();
+    const sessionDB = session.getDatabase(dbName);
+
+    session.startTransaction();
+    sessionDB[collName].update({_id: 1}, {_id: 1, a: 1});
+    assert.commandFailedWithCode(session.commitTransaction_forTesting(), 51260);
+}
+const joinTransaction = startParallelShell(transactionFn, replTest.ports[0]);
+
+jsTestLog("Waiting to hang unprepared commit after reserving oplog slot.");
+checkLog.contains(primary, "hangAndFailUnpreparedCommitAfterReservingOplogSlot fail point enabled");
+
+// Run a write with {w: "majority"} in a parallel shell.
+function majorityWriteFn() {
+    const dbName = 'test';
+    const collName = 'stable_timestamp_oplog_hole';
+    const testDB = db.getMongo().getDB(dbName);
+    const testColl = testDB.getCollection(collName);
+
+    assert.commandWorked(testColl.insert({_id: 2}, {writeConcern: {w: "majority", wtimeout: 20000}}));
+}
+const joinMajorityWrite = startParallelShell(majorityWriteFn, replTest.ports[0]);
+
+jsTestLog("Waiting for majority write to advance lastApplied.");
+sleep(5 * 1000);
+assert.commandWorked(testDB.adminCommand(
+    {configureFailPoint: 'hangAndFailUnpreparedCommitAfterReservingOplogSlot', mode: 'off'}));
+
+jsTestLog("Joining the transaction.");
+joinTransaction();
+
+jsTestLog("Joining the majority write.");
+joinMajorityWrite();
+
+replTest.stopSet();
+}());
diff --git a/src/mongo/db/op_observer_impl.cpp b/src/mongo/db/op_observer_impl.cpp
index 73d31fd799..e4270bd907 100644
--- a/src/mongo/db/op_observer_impl.cpp
+++ b/src/mongo/db/op_observer_impl.cpp
@@ -27,6 +27,8 @@
  *    it in the license file.
  */
 
+#define MONGO_LOG_DEFAULT_COMPONENT ::mongo::logger::LogComponent::kReplication
+
 #include "mongo/platform/basic.h"
 
 #include "mongo/db/op_observer_impl.h"
@@ -62,6 +64,7 @@
 #include "mongo/scripting/engine.h"
 #include "mongo/util/assert_util.h"
 #include "mongo/util/fail_point.h"
+#include "mongo/util/log.h"
 
 namespace mongo {
 using repl::MutableOplogEntry;
@@ -70,6 +73,7 @@ using repl::OplogEntry;
 namespace {
 
 MONGO_FAIL_POINT_DEFINE(failCollectionUpdates);
+MONGO_FAIL_POINT_DEFINE(hangAndFailUnpreparedCommitAfterReservingOplogSlot);
 
 const auto documentKeyDecoration = OperationContext::declareDecoration<BSONObj>();
 
@@ -1009,6 +1013,14 @@ void OpObserverImpl::onUnpreparedTransactionCommit(
     auto oplogSlots = repl::getNextOpTimes(opCtx, statements.size());
     invariant(oplogSlots.size() == statements.size());
 
+    if (MONGO_unlikely(hangAndFailUnpreparedCommitAfterReservingOplogSlot.shouldFail())) {
+        log() << "hangAndFailUnpreparedCommitAfterReservingOplogSlot fail point enabled.";
+        hangAndFailUnpreparedCommitAfterReservingOplogSlot.pauseWhileSet(opCtx);
+        uasserted(51260,
+                  str::stream()
+                      << "hangAndFailUnpreparedCommitAfterReservingOplogSlot fail point enabled");
+    }
+
     // Log in-progress entries for the transaction along with the implicit commit.
     int numOplogEntries = logOplogEntriesForTransaction(opCtx, statements, oplogSlots, false);
     commitOpTime = oplogSlots[numOplogEntries - 1];

Sprint: Repl 2019-11-04, Repl 2019-11-18
Participants:
Linked BF Score: 7

 Description   

If we reserve an oplog slot but decide to abort later, the stable timestamp is not being recalculated even though the oplog hole has been released. This is similar to the bug described in SERVER-39199 where we fixed OplogSlotReserver (used for prepared transactions) to recalculate the stable timestamp in its destructor. But this is still a problem whenever we reserve oplog slots but have chances to abort later. For example, onUnpreparedTransactionCommit, createCollection and other callers of getNextOpTime.

(Note: commit seems fine because we always set my lastApplied in _logOpsInner, which will update stable timestamp).



 Comments   
Comment by Githook User [ 06/Jan/20 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-43978 Blacklist stable_timestamp_can_advance_after_oplog_hole_abort.js from multiversion suites
Branch: master
https://github.com/mongodb/mongo/commit/29c8868b7c1c79ca49341ee68d9e517a5c88f38f

Comment by Githook User [ 06/Jan/20 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-43978 Allow stable timestamp to advance after a transaction that reserved an oplog timestamp is aborted

(cherry picked from commit 754c07c70cf5cd3c6760683bc29c927010a5718a)
Branch: v4.2
https://github.com/mongodb/mongo/commit/e62de66b40d490267506ef89b081dcca3c33b508

Comment by Githook User [ 12/Nov/19 ]

Author:

{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}

Message: SERVER-43978 Allow stable timestamp to advance after a transaction that reserved an oplog timestamp is aborted
Branch: master
https://github.com/mongodb/mongo/commit/754c07c70cf5cd3c6760683bc29c927010a5718a

Comment by William Schultz (Inactive) [ 07/Nov/19 ]

lingzhi.deng raised the question of whether the fix for this ticket may make the changes from SERVER-39199 obsolete. This may be something worth investigating if it is an easy code simplification/removal.

Comment by William Schultz (Inactive) [ 22/Oct/19 ]

From a quick search against this commit, the existing callers of repl::getNextOpTimes() are:

Function
    getNextOpTimes(OperationContext *, size_t) : vector<OplogSlot>
Found usages  (3 usages found)
    Production  (3 usages found)
        Value read  (3 usages found)
            mongo  (3 usages found)
                src/mongo/db  (1 usage found)
                    op_observer_impl.cpp  (1 usage found)
                        onUnpreparedTransactionCommit  (1 usage found)
                            1013 auto oplogSlots = repl::getNextOpTimes(opCtx, statements.size());
                src/mongo/db/ops  (1 usage found)
                    write_ops_exec.cpp  (1 usage found)
                        insertDocuments  (1 usage found)
                            312 auto oplogSlots = repl::getNextOpTimes(opCtx, batchSize);
                src/mongo/db/repl  (1 usage found)
                    oplog.h  (1 usage found)
                        getNextOpTime  (1 usage found)
                            254 auto slots = getNextOpTimes(opCtx, 1);

and the callers of repl::getNextOpTime:

Function
    getNextOpTime(OperationContext *) : OplogSlot
Found usages  (2 usages found)
    Production  (2 usages found)
        Value read  (2 usages found)
            mongo  (2 usages found)
                src/mongo/db/catalog  (2 usages found)
                    collection_impl.cpp  (1 usage found)
                        insertDocumentForBulkLoader  (1 usage found)
                            478 slot = repl::getNextOpTime(opCtx);
                    database_impl.cpp  (1 usage found)
                        createCollection  (1 usage found)
                            623 createOplogSlot = repl::getNextOpTime(opCtx);

Comment by Judah Schvimer [ 14/Oct/19 ]

In practice, the noop writer may make this unlikely, but we'll fix it to fix the BFs.

Generated at Thu Feb 08 05:04:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.