Core Server / SERVER-34590

oplog visibility issues with round_to_oldest

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.0.0-rc0
    • Affects Version/s: 3.7.5
    • Component/s: Storage, WiredTiger
    • Labels:
    • Assigned Teams: Storage Execution
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Storage NYC 2018-05-07
    • Linked BF Score: 72

      Since SERVER-34192 ("Secondary reads during batch application"), a frequent build failure has been identified on master: consistent "dbhash mismatch" errors, with documents missing on secondaries.

      When SERVER-32876 ("Don't stall FTDC due to WT cache full") is reverted, the errors go away. That patch was reverted in 3.6 because it caused dbhash mismatch errors there as well.

      The current belief is that the patch removed the synchronization in WiredTigerSnapshotManager that previously prevented transactions from being opened concurrently. However, that change should be correct on its own and should not introduce data inconsistency.

      Now that the synchronization for opening transactions on the oplog is gone, we believe a latent bug has been exposed: opening transactions concurrently and then setting a read timestamp on them does not behave correctly.

      The diff I believe is responsible for this failure:

       void WiredTigerSnapshotManager::beginTransactionOnOplog(WiredTigerOplogManager* oplogManager,
                                                               WT_SESSION* session) const {
           invariantWTOK(session->begin_transaction(session, nullptr));
           auto rollbacker =
               MakeGuard([&] { invariant(session->rollback_transaction(session, nullptr) == 0); });
      
      -    stdx::lock_guard<stdx::mutex> lock(_mutex);
           auto allCommittedTimestamp = oplogManager->getOplogReadTimestamp();
           invariant(Timestamp(static_cast<unsigned long long>(allCommittedTimestamp)).asULL() ==
                     allCommittedTimestamp);
           auto status = setTransactionReadTimestamp(
      -        Timestamp(static_cast<unsigned long long>(allCommittedTimestamp)), session);
      +        Timestamp(static_cast<unsigned long long>(allCommittedTimestamp)),
      +        session,
      +        true /* roundToOldest */);
      
      -    // If we failed to set the read timestamp, we assume it is due to the oldest_timestamp racing
      -    // ahead.  Rather than synchronizing for this rare case, if requested, throw a
      -    // WriteConflictException which will be retried.
      -    if (!status.isOK() && status.code() == ErrorCodes::BadValue) {
      -        throw WriteConflictException();
      -    }
           fassert(50771, status);
           rollbacker.Dismiss();
      }
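
      The effect of the roundToOldest argument in the diff above can be illustrated with a toy model. This is a sketch under stated assumptions, not MongoDB or WiredTiger code: ToyTxn, setReadTimestamp, readsPastRequested, and demo are hypothetical names, and the rounding rule is modeled on WiredTiger's documented round_to_oldest behavior (a read timestamp older than the oldest timestamp is rounded up to it).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model; none of these names exist in MongoDB or WiredTiger.
using Ts = std::uint64_t;

struct ToyTxn {
    bool hasReadTs = false;  // false until a read timestamp has been set
    Ts readTs = 0;           // effective read timestamp once set
};

enum class SetResult { kOk, kBadValue };

// Strict mode (roundToOldest == false): a requested read timestamp older than
// the oldest timestamp is rejected, so the caller can notice the race and retry.
// Round mode (roundToOldest == true): the request silently succeeds, but the
// transaction reads at the oldest timestamp instead of the requested one.
SetResult setReadTimestamp(ToyTxn& txn, Ts requested, Ts oldest, bool roundToOldest) {
    if (requested < oldest) {
        if (!roundToOldest) {
            return SetResult::kBadValue;
        }
        txn.hasReadTs = true;
        txn.readTs = oldest;
        return SetResult::kOk;
    }
    txn.hasReadTs = true;
    txn.readTs = requested;
    return SetResult::kOk;
}

// True if the transaction reads at a later timestamp than the caller asked for.
bool readsPastRequested(const ToyTxn& txn, Ts requested) {
    return txn.hasReadTs && txn.readTs > requested;
}

// Usage: the caller sampled timestamp 5, but the oldest timestamp has since
// advanced to 8 before the read timestamp could be set.
void demo() {
    ToyTxn strict;
    assert(setReadTimestamp(strict, 5, 8, false) == SetResult::kBadValue);
    assert(!strict.hasReadTs);

    ToyTxn rounded;
    assert(setReadTimestamp(rounded, 5, 8, true) == SetResult::kOk);
    assert(readsPastRequested(rounded, 5));  // reads at 8, not at 5
}
```

      In strict mode the caller learns about the race and can retry, which corresponds to the removed WriteConflictException path. With rounding, the call succeeds but the transaction may read at a later timestamp than the oplog read timestamp the caller sampled. Whether that is the actual cause of the visibility issue here is an assumption, not something this report confirms.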
      

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            louis.williams@mongodb.com Louis Williams
            Votes:
            0
            Watchers:
            8
