[SERVER-38928] background dbhash test hook incorrectly relies on a primary's lastApplied being >= its secondary's lastApplied Created: 10/Jan/19  Updated: 29/Oct/23  Resolved: 14/Jan/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.6, 4.1.7

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Steps To Reproduce:

diff --git a/src/mongo/db/repl/oplog.cpp b/src/mongo/db/repl/oplog.cpp
index 0e244b9a0d..5bb1ec39e4 100644
--- a/src/mongo/db/repl/oplog.cpp
+++ b/src/mongo/db/repl/oplog.cpp
@@ -416,7 +416,7 @@ void _logOpsInner(OperationContext* opCtx,
 
     // Set replCoord last optime only after we're sure the WUOW didn't abort and roll back.
     opCtx->recoveryUnit()->onCommit(
-        [opCtx, replCoord, finalOpTime](boost::optional<Timestamp> commitTime) {
+        [opCtx, replCoord, finalOpTime, nDocs](boost::optional<Timestamp> commitTime) {
             if (commitTime) {
                 // The `finalOpTime` may be less than the `commitTime` if multiple oplog entries
                 // are logging within one WriteUnitOfWork.
@@ -426,7 +426,12 @@ void _logOpsInner(OperationContext* opCtx,
                                         << commitTime->toString());
             }
 
+            if (nDocs > 10) {
+                usleep(10 * 1000 * 1000);
+            }
+
             // Optimes on the primary should always represent consistent database states.
+            log() << "Moving lastApplied forward. Time: " << finalOpTime;
             replCoord->setMyLastAppliedOpTimeForward(
                 finalOpTime, ReplicationCoordinator::DataConsistency::Consistent);
 

Sprint: Storage NYC 2019-01-14
Participants:
Linked BF Score: 0

 Description   

The run_check_repl_dbhash_background.js test relies on a primary's lastApplied value being greater than or equal to its secondaries. We can observe whether this property is true by:

  1. Perform a read operation on a secondary, record its operationTime
  2. Perform a read operation on the primary, record its operationTime

In the absence of elections/rollbacks, if the secondary's operation time is greater than the primary, then the property is violated.

Because reads don't advance the lastApplied and are concurrent with writers, it's possible for a writer to pause before hitting this statement in an onCommit hook. Inside this window, the write is committed to the storage engine and thus visible in the oplog. A secondary can observe/process/commit the oplog entries, then update its own lastApplied.

run_check_repl_dbhash_background.js expects the primary's operation time to be greater than its secondaries. It uses this property to choose an atClusterTime that is expected to be within the history of all nodes. This fails on tests that turn off majority reads as they will advance their history in step with lastApplied (some caveats apply).

Notably, this test does not fail when majority reads are on. This is because those configurations use a single voting primary. In those cases the majority commit point (and thus the history secondaries will keep) is in step with the primary's lastApplied.

This implies another set of dependencies which may not be getting tested thoroughly (though the window is much narrower). Specifically, a secondary may respond to majority reads with more up to date documents than a primary.



 Comments   
Comment by Githook User [ 14/Jan/19 ]

Author:

{'username': 'dgottlieb', 'email': 'daniel.gottlieb@mongodb.com', 'name': 'Daniel Gottlieb'}

Message: SERVER-38928: Have run_check_repl_dbhash_background.js cope with secondaries reporting a greater optime than their primary.

(cherry picked from commit a9177f25ba33a938b5d1bfcc7cc9c17443cdad45)
Branch: v4.0
https://github.com/mongodb/mongo/commit/6de0d9f57ad300123b9995846fde17888eadcf18

Comment by Githook User [ 11/Jan/19 ]

Author:

{'username': 'dgottlieb', 'email': 'daniel.gottlieb@mongodb.com', 'name': 'Daniel Gottlieb'}

Message: SERVER-38928: Have run_check_repl_dbhash_background.js cope with secondaries reporting a greater optime than their primary.
Branch: master
https://github.com/mongodb/mongo/commit/a9177f25ba33a938b5d1bfcc7cc9c17443cdad45

Generated at Thu Feb 08 04:50:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.