SERVER-41779 (Core Server)

reconstructPreparedTransactions fails to read a prepare oplog entry during initial sync

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.2.0-rc2, 4.3.1
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Fully Compatible
    • ALL
    • v4.2
    • Repl 2019-07-01
    • 12

      reconstructPreparedTransactions could fail to read an oplog entry during initial sync when:
      1. The first attempt of initial sync fails after applying some oplog entries, which leaves the localSnapshot pointing to the lastApplied.
      2. The second attempt of initial sync tries to reset all the optimes before it starts. ReplicationCoordinatorImpl::resetMyLastOpTimes relies on calling ReplicationCoordinatorImpl::_setMyLastAppliedOpTimeAndWallTime to reset the lastApplied and the localSnapshot back to OpTime 0, but ReplicationCoordinatorImpl::_setMyLastAppliedOpTimeAndWallTime skips resetting the localSnapshot if the given OpTime isNull(). Because of this bug, the localSnapshot still points to the last oplog entry applied during the first attempt.
      3. If the second attempt doesn't need to apply any ops after data cloning, it inserts the last oplog entry as the oplog seed document using the timestamp of that oplog entry. To trigger the bug, the last oplog entry that is inserted as the seed has to be a prepare oplog entry, and its OpTime has to be greater than the OpTimes of the oplog entries applied in (1).
      4. After the second attempt successfully finishes, reconstructPreparedTransactions is called to reconstruct outstanding prepared transactions. In this case, it needs to read the oplog seed entry.
      5. reconstructPreparedTransactions uses its own ReadSourceScope, but the transactions table read implicitly changes the read source to kLastApplied, which is then used by the oplog read.
      6. Because of the bug in (2), the oplog read uses the lastApplied timestamp that was set in (1) and never reset in (2). That timestamp is earlier than the prepare oplog entry, so the read fails to find the entry.
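
      The sequence above can be replayed with a small toy model. This is only a sketch: ToyCoordinator, isVisible, and the timestamps 5 and 9 are hypothetical stand-ins, not the real ReplicationCoordinatorImpl or storage-engine code, and timestamp 0 is assumed to play the role of a null OpTime.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins only; none of these are the real MongoDB types.
using Timestamp = std::uint64_t;
constexpr Timestamp kNull = 0;  // plays the role of a null OpTime / no snapshot

struct ToyCoordinator {
    Timestamp lastApplied = kNull;
    Timestamp localSnapshot = kNull;

    // Same shape as the bug in step 2: a null OpTime returns early,
    // before the local snapshot is touched.
    void setLastApplied(Timestamp ts) {
        lastApplied = ts;
        if (ts == kNull) {
            return;  // localSnapshot keeps its previous value
        }
        localSnapshot = ts;
    }

    void resetLastOpTimes() { setLastApplied(kNull); }
};

// A timestamped read only sees entries at or before the read timestamp.
// A null read timestamp is treated as an untimestamped read (assumption).
bool isVisible(const std::vector<Timestamp>& oplog, Timestamp entry,
               Timestamp readTimestamp) {
    for (Timestamp ts : oplog) {
        if (ts == entry) {
            return readTimestamp == kNull || ts <= readTimestamp;
        }
    }
    return false;
}

// Replays steps 1-6 and reports whether the prepare entry could be read.
bool prepareEntryVisibleAfterSecondAttempt() {
    ToyCoordinator coord;
    coord.setLastApplied(5);   // step 1: first attempt applies up to ts 5, then fails
    coord.resetLastOpTimes();  // step 2: lastApplied resets, the snapshot does not
    assert(coord.lastApplied == kNull);

    const std::vector<Timestamp> oplog{9};  // step 3: seed is a prepare entry at ts 9

    // Steps 4-6: the oplog read ends up using the stale snapshot from step 1.
    return isVisible(oplog, 9, coord.localSnapshot);
}
```

      In this model the stale snapshot stays at 5 after the reset, so the seed entry at 9 is not visible to the read.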

      There are two solutions to this:
      1. Fix reconstructPreparedTransactions to explicitly set the read source to kNoTimestamp so both the transactions table read and the oplog read would be untimestamped.
      2. Fix ReplicationCoordinatorImpl::_setMyLastAppliedOpTimeAndWallTime to reset localSnapshot properly even if the given OpTime is 0 (i.e., move the if statement to after updateLocalSnapshot).
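
      Both proposals can be illustrated on a toy model. This is a sketch under stated assumptions, not the actual patch: FixedToyCoordinator, canRead, and the ReadSource enum here are hypothetical, timestamp 0 stands in for a null OpTime, and a read with no local snapshot is assumed to behave like an untimestamped read.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins only; none of these are the real MongoDB types.
using Timestamp = std::uint64_t;
constexpr Timestamp kNull = 0;  // plays the role of a null OpTime / no snapshot

// Hypothetical read sources, named after the ones mentioned in the report.
enum class ReadSource { kNoTimestamp, kLastApplied };

struct FixedToyCoordinator {
    Timestamp lastApplied = kNull;
    Timestamp localSnapshot = kNull;

    void updateLocalSnapshot(Timestamp ts) { localSnapshot = ts; }

    // Solution 2: the snapshot update runs before the null check, so a
    // reset clears the snapshot left behind by the previous attempt.
    void setLastApplied(Timestamp ts) {
        lastApplied = ts;
        updateLocalSnapshot(ts);
        if (ts == kNull) {
            return;  // skip work that only applies to a non-null OpTime
        }
        // Non-null-only bookkeeping would go here.
    }
};

// Solution 1: a kNoTimestamp read ignores the snapshot entirely.
// Assumption: with no snapshot set, kLastApplied also reads untimestamped.
bool canRead(Timestamp entry, ReadSource source,
             const FixedToyCoordinator& coord) {
    if (source == ReadSource::kNoTimestamp || coord.localSnapshot == kNull) {
        return true;
    }
    return entry <= coord.localSnapshot;
}

// First attempt applies up to ts 5, the reset runs, then the seed
// entry at ts 9 is read with the given read source.
bool prepareEntryReadable(ReadSource source) {
    FixedToyCoordinator coord;
    coord.setLastApplied(5);
    coord.setLastApplied(kNull);
    assert(coord.localSnapshot == kNull);
    return canRead(9, source, coord);
}
```

      In this model either change alone makes the seed entry readable. The untimestamped read also works when the snapshot is stale, which is one argument for doing both.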

      And I think maybe we should do both.

            Assignee: Lingzhi Deng (lingzhi.deng@mongodb.com)
            Reporter: Lingzhi Deng (lingzhi.deng@mongodb.com)
            Votes: 0
            Watchers: 7
