Core Server: SERVER-42219

Oplog buffer not always empty when primary exits drain mode

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.2.1, 4.3.1, 4.0.17
    • Affects Version/s: 4.2.0-rc2
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.2, v4.0, v3.6
    • Steps To Reproduce:
      1. Add a failpoint to sync_tail.cpp that, when enabled, sleeps for one second after this line.
      2. In drain.js, enable the new failpoint after enabling the rsSyncApply failpoint.
      3. Run jstests/replsets/drain.js.
    • Sprint: Repl 2019-08-12, Repl 2019-08-26, Repl 2019-09-09
    • Linked BF Score: 68
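Step 1 of the reproduction injects a delay into the thread that fetches batches from the oplog buffer. The minimal C++ sketch below pictures that idea under stated assumptions: it does not use the server's real failpoint framework. `hypotheticalSlowBatchFailPoint` is a plain `std::atomic<bool>` standing in for a failpoint, and `maybeSleepForSlowBatch` is a hypothetical hook representing the location in sync_tail.cpp that the steps refer to.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a server failpoint (not the real failpoint API).
std::atomic<bool> hypotheticalSlowBatchFailPoint{false};

// Hypothetical hook placed where step 1 says to sleep in sync_tail.cpp.
// While the failpoint is enabled, the batch-fetching thread stalls for one
// second, roughly matching the 1-second wait after which drain mode gives up.
void maybeSleepForSlowBatch() {
    if (hypotheticalSlowBatchFailPoint.load()) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

// Measures one pass through the hook, in milliseconds.
long long timeHookMillis() {
    const auto start = std::chrono::steady_clock::now();
    maybeSleepForSlowBatch();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

With the flag off the hook returns immediately; after `hypotheticalSlowBatchFailPoint.store(true)` each call takes at least one second, which is the window in which the new primary can give up waiting for a batch.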

      If a new primary is in drain mode and the thread that gets the next batch from the oplog buffer is slow to run, the node can exit drain mode prematurely here, because no new batch arrived within 1 second. This is a problem because the oplog buffer may still hold oplog entries that the node needs to apply. Once the node exits drain mode, it writes an oplog entry in the new term. Since we don't stop the oplog application thread when we exit drain mode, that thread can then get a new batch of oplog entries that come before the new-term oplog entry. Applying them leads to this fassert, because we cannot apply oplog entries that are before our lastApplied.
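The failure sequence in the description can be modeled with a small single-threaded C++ sketch. Everything here is a hypothetical toy, not the server's code: `OpTime` is reduced to a (term, timestamp) pair compared term-first, the oplog buffer is a `std::deque`, the 1-second batch timeout is reduced to a boolean, throwing `std::logic_error` stands in for the fassert, and `shouldExitDrainWhenEmpty` is an assumed safer exit rule suggested by the ticket title rather than a description of the actual fix.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <tuple>
#include <utility>

// Simplified op time (assumption): ordered by term first, then timestamp.
struct OpTime {
    long long term;
    long long timestamp;
    bool operator<(const OpTime& other) const {
        return std::tie(term, timestamp) < std::tie(other.term, other.timestamp);
    }
};

// Hypothetical toy model of a newly elected primary in drain mode.
class ToyPrimary {
public:
    ToyPrimary(std::deque<OpTime> buffered, OpTime lastApplied, long long newTerm)
        : buffer_(std::move(buffered)), lastApplied_(lastApplied), newTerm_(newTerm) {}

    // Timeout rule: leave drain mode as soon as no batch arrived in time,
    // without checking whether the buffer still holds entries.
    bool shouldExitDrainOnTimeout(bool batchArrivedInTime) const {
        return !batchArrivedInTime;
    }

    // Assumed safer rule: leave drain mode only once the buffer is empty.
    bool shouldExitDrainWhenEmpty() const { return buffer_.empty(); }

    // Leaving drain mode writes the first oplog entry of the new term.
    void exitDrainMode() { lastApplied_ = OpTime{newTerm_, lastApplied_.timestamp + 1}; }

    // Applies the next buffered entry; entries that are not after
    // lastApplied are rejected (stand-in for the fassert).
    void applyNext() {
        assert(!buffer_.empty() && "applyNext requires a non-empty buffer");
        OpTime next = buffer_.front();
        buffer_.pop_front();
        if (!(lastApplied_ < next)) {
            throw std::logic_error("oplog entry is not after lastApplied");
        }
        lastApplied_ = next;
    }

    bool bufferEmpty() const { return buffer_.empty(); }
    OpTime lastApplied() const { return lastApplied_; }

private:
    std::deque<OpTime> buffer_;
    OpTime lastApplied_;
    long long newTerm_;
};

// Scenario: two old-term entries are still buffered and the batcher is slow,
// so no batch arrives within the timeout. Returns true if the applier, which
// keeps running after drain mode ends, hits the "not after lastApplied" error.
bool stalledBatcherHitsError(bool useEmptyBufferRule) {
    ToyPrimary node(std::deque<OpTime>{OpTime{1, 11}, OpTime{1, 12}}, OpTime{1, 10},
                    /*newTerm=*/2);
    const bool batchArrivedInTime = false;
    while (true) {
        const bool leaveDrain = useEmptyBufferRule
                                    ? node.shouldExitDrainWhenEmpty()
                                    : node.shouldExitDrainOnTimeout(batchArrivedInTime);
        if (leaveDrain) break;
        node.applyNext();  // still draining: the delayed batch is eventually applied
    }
    node.exitDrainMode();
    try {
        while (!node.bufferEmpty()) node.applyNext();  // applier is not stopped
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

With the timeout rule, `stalledBatcherHitsError(false)` returns true: the new-term entry (2, 11) sorts after the buffered (1, 11), so the still-running applier rejects it. With the empty-buffer rule, the leftover entries are applied before the new-term entry is written, so `stalledBatcherHitsError(true)` returns false.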

            Assignee:
            Siyuan Zhou (siyuan.zhou@mongodb.com)
            Reporter:
            Samyukta Lanka (samy.lanka@mongodb.com)
            Votes:
            0
            Watchers:
            12
