  1. Core Server
  2. SERVER-57229

killOp_against_journal_flusher_thread.js must ensure the JournalFlusher doesn't reset the opCtx between finding the opId and running killOp

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.0.4, 5.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Fully Compatible
    • ALL
    • v5.0
    • Execution Team 2021-06-14
    • 23

      It's possible for the JournalFlusher to miss the killOp interrupt if the opCtx reset is timed just right: the killOp marks the JournalFlusher's opCtx as killed, but the JournalFlusher then resets the opCtx and never throws the expected error.

      The opId that the test fetches via currentOp is associated with the JournalFlusher's opCtx at that moment, but the opCtx has changed by the time the test tries to kill the journal flusher thread via killOp. The window of time is small.
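
The race described above can be sketched with a toy model. This is only an illustration: `ToyFlusher` and all of its methods are hypothetical names invented here, not MongoDB server or shell APIs, and the model assumes each reset hands the flusher a fresh opCtx with a new opId.

```javascript
// Toy model of the race. Nothing here is a real MongoDB API.
class ToyFlusher {
    constructor() {
        this.nextOpId = 1;
        this.resetOpCtx();
    }
    // Assumption: every reset creates a new opCtx with a new opId.
    resetOpCtx() {
        this.opCtx = {opId: this.nextOpId++, killed: false};
    }
    // What a currentOp-style lookup would report at this instant.
    currentOpId() {
        return this.opCtx.opId;
    }
    // Marks the opCtx killed only if the opId still matches it.
    killOp(opId) {
        if (this.opCtx.opId !== opId) {
            return false;
        }
        this.opCtx.killed = true;
        return true;
    }
    // Throws if the current opCtx has been marked killed.
    checkForInterrupt() {
        if (this.opCtx.killed) {
            throw new Error("interrupted");
        }
    }
}

function wasInterrupted(flusher) {
    try {
        flusher.checkForInterrupt();
        return false;
    } catch (e) {
        return true;
    }
}

// Case 1: the opCtx is reset between the lookup and the kill (stale opId).
const a = new ToyFlusher();
const staleOpId = a.currentOpId();
a.resetOpCtx();
const staleKillMatched = a.killOp(staleOpId);  // false
const staleInterrupted = wasInterrupted(a);    // false

// Case 2: the kill lands, but the opCtx is reset before the interrupt check.
const b = new ToyFlusher();
b.killOp(b.currentOpId());
b.resetOpCtx();
const lostInterrupted = wasInterrupted(b);     // false

// Case 3: no reset in between, so the interrupt is observed.
const c = new ToyFlusher();
c.killOp(c.currentOpId());
const seenInterrupted = wasInterrupted(c);     // true
```

In cases 1 and 2 the expected error is never thrown, which is the failure mode the test can hit.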

      The test sets the JournalFlusher interval (how frequently it runs) to 500 ms. We could decrease the frequency (a higher interval), but we also need the JournalFlusher to run in order for that error to be thrown.

      I recommend adding a new failpoint that stops the JournalFlusher before the currentOp call and releases it after the killOp is sent.
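
A rough sketch of the ordering this proposal implies, again as a toy model rather than server code. `PausableFlusher`, `setFailpoint`, and `runOnce` are hypothetical names; the assumption is that the failpoint keeps the flusher from replacing its opCtx while it is enabled, and that the flusher checks for interruption once it resumes.

```javascript
// Toy model only. The pause flag stands in for the proposed failpoint.
class PausableFlusher {
    constructor() {
        this.nextOpId = 1;
        this.paused = false;
        this.opCtx = this.newOpCtx();
    }
    newOpCtx() {
        return {opId: this.nextOpId++, killed: false};
    }
    setFailpoint(enabled) {
        this.paused = enabled;
    }
    currentOpId() {
        return this.opCtx.opId;
    }
    killOp(opId) {
        if (this.opCtx.opId !== opId) {
            return false;
        }
        this.opCtx.killed = true;
        return true;
    }
    // One loop iteration. While paused, the opCtx is held unchanged.
    runOnce() {
        if (this.paused) {
            return "paused";
        }
        const killed = this.opCtx.killed;
        this.opCtx = this.newOpCtx();
        return killed ? "interrupted" : "flushed";
    }
}

const flusher = new PausableFlusher();
flusher.setFailpoint(true);              // 1. pause before looking up the opId
const opId = flusher.currentOpId();      // 2. currentOp-style lookup
const whilePaused = flusher.runOnce();   //    flusher cannot reset its opCtx
const killMatched = flusher.killOp(opId);  // 3. kill lands on the same opCtx
flusher.setFailpoint(false);             // 4. release after the kill is sent
const outcome = flusher.runOnce();       //    the interrupt is now observed
```

Because the opCtx cannot change between steps 2 and 3, the opId the test holds is still valid when the kill is issued. Where the real failpoint should sit inside the JournalFlusher loop is not specified here.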

            dianna.hohensee@mongodb.com Dianna Hohensee (Inactive)