SERVER-42219 (Core Server)

Oplog buffer not always empty when primary exits drain mode

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.2.0-rc2
    • Fix Version/s: 4.2.1, 4.3.1, 4.0.17
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2, v4.0, v3.6
    • Steps To Reproduce:
      1. Add a failpoint to sync_tail.cpp and if it is set, sleep for a second after this line.
      2. Set the failpoint in drain.js after setting the rsSyncApply failpoint.
      3. Run jstests/replsets/drain.js
    • Sprint:
      Repl 2019-08-12, Repl 2019-08-26, Repl 2019-09-09
    • Linked BF Score:
      68

      Description

      If a new primary is in drain mode and the thread that fetches the next batch from the oplog buffer is slow to run, the node can exit drain mode prematurely here because it did not get a new batch within 1 second. This is a problem because the oplog buffer could still hold oplog entries for the node to apply. Once the node exits drain mode, it writes an oplog entry in the new term. Because the thread running oplog application is not stopped when the node exits drain mode, that thread could then get a new batch of oplog entries that precede the new-term oplog entry. Trying to apply them leads to this fassert, because a node cannot apply oplog entries that are before its lastApplied.
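
      The race described above can be illustrated with a small deterministic model. This is only a sketch, not MongoDB code: the `Node` class, `next_batch`, `simulate`, the integer timestamps, and the `require_empty_buffer` guard are all hypothetical names and simplifications introduced for illustration, and the guard is one possible mitigation, not necessarily the fix that shipped.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Toy model of a newly elected primary in drain mode (hypothetical)."""
    last_applied: int   # timestamp of the newest applied oplog entry
    buffer: deque       # oplog entries fetched but not yet applied
    draining: bool = True

    def apply_batch(self, batch):
        for entry in batch:
            # Models the invariant from the ticket: an entry that is not
            # after lastApplied cannot be applied.
            if entry <= self.last_applied:
                raise RuntimeError(
                    f"entry {entry} is not after lastApplied {self.last_applied}")
            self.last_applied = entry

    def exit_drain_mode(self, new_entry):
        # Leaving drain mode writes an oplog entry in the new term.
        self.draining = False
        self.last_applied = new_entry


def next_batch(buffer, timed_out):
    """Return everything buffered, or nothing if the batcher missed the deadline."""
    if timed_out:
        return []
    batch = list(buffer)
    buffer.clear()
    return batch


def simulate(batcher_is_slow, require_empty_buffer):
    node = Node(last_applied=9, buffer=deque([10, 11]))

    # First pass of the applier loop: wait for a batch, up to the timeout.
    batch = next_batch(node.buffer, timed_out=batcher_is_slow)
    node.apply_batch(batch)
    # The problematic decision: an empty batch is read as "nothing left to drain".
    if not batch and (not require_empty_buffer or not node.buffer):
        node.exit_drain_mode(new_entry=100)

    # The applier thread keeps running, and the batcher has now caught up.
    try:
        node.apply_batch(next_batch(node.buffer, timed_out=False))
    except RuntimeError as err:
        return f"crash: {err}"

    # Simplification: a later pass finds the buffer empty and exits drain mode.
    if node.draining and not node.buffer:
        node.exit_drain_mode(new_entry=100)
    return f"ok: lastApplied={node.last_applied}, draining={node.draining}"


if __name__ == "__main__":
    print(simulate(batcher_is_slow=True, require_empty_buffer=False))
    print(simulate(batcher_is_slow=True, require_empty_buffer=True))
```

      In this model, a slow batcher combined with the unguarded check ends in the crash path, while also requiring the buffer itself to be empty lets the node apply entries 10 and 11 before it leaves drain mode.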

              People

              Assignee:
              siyuan.zhou Siyuan Zhou
              Reporter:
              samy.lanka Samyukta Lanka
              Votes:
              0
              Watchers:
              11