Core Server / SERVER-38928

background dbhash test hook incorrectly relies on a primary's lastApplied being >= its secondary's lastApplied


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.6, 4.1.7
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0
    • Steps To Reproduce:

      diff --git a/src/mongo/db/repl/oplog.cpp b/src/mongo/db/repl/oplog.cpp
      index 0e244b9a0d..5bb1ec39e4 100644
      --- a/src/mongo/db/repl/oplog.cpp
      +++ b/src/mongo/db/repl/oplog.cpp
      @@ -416,7 +416,7 @@ void _logOpsInner(OperationContext* opCtx,
       
           // Set replCoord last optime only after we're sure the WUOW didn't abort and roll back.
           opCtx->recoveryUnit()->onCommit(
      -        [opCtx, replCoord, finalOpTime](boost::optional<Timestamp> commitTime) {
      +        [opCtx, replCoord, finalOpTime, nDocs](boost::optional<Timestamp> commitTime) {
                   if (commitTime) {
                       // The `finalOpTime` may be less than the `commitTime` if multiple oplog entries
                       // are logging within one WriteUnitOfWork.
      @@ -426,7 +426,12 @@ void _logOpsInner(OperationContext* opCtx,
                                               << commitTime->toString());
                   }
       
      +            if (nDocs > 10) {
      +                usleep(10 * 1000 * 1000);
      +            }
      +
                   // Optimes on the primary should always represent consistent database states.
      +            log() << "Moving lastApplied forward. Time: " << finalOpTime;
                   replCoord->setMyLastAppliedOpTimeForward(
                       finalOpTime, ReplicationCoordinator::DataConsistency::Consistent);
       
      

    • Sprint:
      Storage NYC 2019-01-14
    • Linked BF Score:
      0

      Description

The run_check_repl_dbhash_background.js test relies on a primary's lastApplied value being greater than or equal to its secondaries'. We can observe whether this property holds by:

1. Performing a read operation on a secondary and recording its operationTime
2. Performing a read operation on the primary and recording its operationTime

In the absence of elections/rollbacks, if the secondary's operationTime is greater than the primary's, then the property is violated.

Because reads don't advance lastApplied and are concurrent with writes, it's possible for a writer to pause inside its onCommit hook before calling setMyLastAppliedOpTimeForward (the statement targeted by the repro patch above). Inside this window, the write has already committed to the storage engine and is therefore visible in the oplog. A secondary can observe, process, and commit the oplog entries, then advance its own lastApplied past the primary's.

run_check_repl_dbhash_background.js expects the primary's operationTime to be greater than or equal to its secondaries'. It uses this property to choose an atClusterTime that is expected to be within the history of all nodes. This fails in tests that turn off majority reads, because each node then advances its retained history in step with its own lastApplied (some caveats apply).

      Notably, this test does not fail when majority reads are on. This is because those configurations use a single voting primary. In those cases the majority commit point (and thus the history secondaries will keep) is in step with the primary's lastApplied.

This implies another set of dependencies that may not be getting tested thoroughly (though the window is much narrower): specifically, a secondary may respond to majority reads with more up-to-date documents than the primary.
