Core Server / SERVER-38928

background dbhash test hook incorrectly relies on a primary's lastApplied being >= its secondary's lastApplied


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.6, 4.1.7
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0
    • Steps To Reproduce:

      diff --git a/src/mongo/db/repl/oplog.cpp b/src/mongo/db/repl/oplog.cpp
      index 0e244b9a0d..5bb1ec39e4 100644
      --- a/src/mongo/db/repl/oplog.cpp
      +++ b/src/mongo/db/repl/oplog.cpp
      @@ -416,7 +416,7 @@ void _logOpsInner(OperationContext* opCtx,
       
           // Set replCoord last optime only after we're sure the WUOW didn't abort and roll back.
           opCtx->recoveryUnit()->onCommit(
      -        [opCtx, replCoord, finalOpTime](boost::optional<Timestamp> commitTime) {
      +        [opCtx, replCoord, finalOpTime, nDocs](boost::optional<Timestamp> commitTime) {
                   if (commitTime) {
                       // The `finalOpTime` may be less than the `commitTime` if multiple oplog entries
                       // are logging within one WriteUnitOfWork.
      @@ -426,7 +426,12 @@ void _logOpsInner(OperationContext* opCtx,
                                               << commitTime->toString());
                   }
       
      +            if (nDocs > 10) {
      +                usleep(10 * 1000 * 1000);
      +            }
      +
                   // Optimes on the primary should always represent consistent database states.
      +            log() << "Moving lastApplied forward. Time: " << finalOpTime;
                   replCoord->setMyLastAppliedOpTimeForward(
                       finalOpTime, ReplicationCoordinator::DataConsistency::Consistent);
       
      

    • Sprint:
      Storage NYC 2019-01-14
    • Linked BF Score:
      0

      Description

The run_check_repl_dbhash_background.js test relies on a primary's lastApplied value being greater than or equal to its secondaries'. We can observe whether this property holds by:

1. Performing a read operation on a secondary and recording its operationTime
2. Performing a read operation on the primary and recording its operationTime

In the absence of elections/rollbacks, if the secondary's operationTime is greater than the primary's, then the property is violated.

Because reads don't advance lastApplied and are concurrent with writes, it's possible for a writer to pause inside its onCommit hook before calling setMyLastAppliedOpTimeForward (the statement targeted by the repro patch above). Inside this window, the write has already committed to the storage engine and is therefore visible in the oplog. A secondary can observe, process, and commit the oplog entries, then advance its own lastApplied past the primary's.

run_check_repl_dbhash_background.js expects the primary's operationTime to be greater than or equal to its secondaries'. It uses this property to choose an atClusterTime that is expected to be within the history of all nodes. This fails in tests that turn off majority reads, because each node then advances its retained history in step with its own lastApplied (some caveats apply).

      Notably, this test does not fail when majority reads are on. This is because those configurations use a single voting primary. In those cases the majority commit point (and thus the history secondaries will keep) is in step with the primary's lastApplied.

This implies another set of dependencies that may not be getting tested thoroughly (though the window is much narrower): specifically, a secondary may respond to majority reads with more up-to-date documents than the primary.
