Core Server / SERVER-57229

killOp_against_journal_flusher_thread.js must ensure the JournalFlusher doesn't reset the opCtx between finding the opId and running killOp


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0-rc1
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v5.0
    • Sprint:
      Execution Team 2021-06-14
    • Linked BF Score:
      23

      Description

      It's possible for the JournalFlusher to miss the killOp interrupt if its opCtx reset is timed just right: killOp marks the JournalFlusher's opCtx as killed, but the JournalFlusher then resets its opCtx and never throws the expected error.

      The opId that the test fetches via currentOp is associated with the JournalFlusher's opCtx at that moment, but the opCtx may have changed by the time the test tries to kill the journal flusher thread via killOp. The window is small.
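A toy, single-threaded model makes the two ways the kill can be lost concrete. `OpCtx`, `ToyFlusher`, and its methods are hypothetical stand-ins, not MongoDB's real `OperationContext` or killOp implementation; each function replays one interleaving by hand instead of racing real threads.

```python
import itertools
from dataclasses import dataclass


@dataclass
class OpCtx:
    """Hypothetical stand-in for an operation context: an opId plus a killed flag."""
    op_id: int
    killed: bool = False


class ToyFlusher:
    """Hypothetical, simplified model of a thread that periodically resets its opCtx."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.op_ctx = OpCtx(next(self._ids))

    def reset_op_ctx(self) -> None:
        # The old opCtx, and any kill recorded on it, is discarded.
        self.op_ctx = OpCtx(next(self._ids))

    def current_op_id(self) -> int:
        # What the test would read via currentOp.
        return self.op_ctx.op_id

    def kill_op(self, op_id: int) -> bool:
        # Like killOp: only marks the opCtx whose opId matches; otherwise a no-op.
        if self.op_ctx.op_id == op_id:
            self.op_ctx.killed = True
            return True
        return False

    def interrupt_observed(self) -> bool:
        # Whether the flusher's next interrupt check would see the kill.
        return self.op_ctx.killed


def kill_then_reset() -> bool:
    f = ToyFlusher()
    op_id = f.current_op_id()
    f.kill_op(op_id)   # marks the current opCtx killed ...
    f.reset_op_ctx()   # ... which the flusher then throws away
    return f.interrupt_observed()


def reset_then_kill() -> bool:
    f = ToyFlusher()
    op_id = f.current_op_id()
    f.reset_op_ctx()   # opCtx changes after currentOp ...
    f.kill_op(op_id)   # ... so killOp targets a stale opId and matches nothing
    return f.interrupt_observed()


def no_reset_in_between() -> bool:
    f = ToyFlusher()
    f.kill_op(f.current_op_id())
    return f.interrupt_observed()


print(kill_then_reset(), reset_then_kill(), no_reset_in_between())  # False False True
```

Only the last ordering, where nothing resets the opCtx between currentOp and the flusher's interrupt check, surfaces the kill.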

      The test sets the JournalFlusher interval (how frequently it runs) to 500 ms. We could run the JournalFlusher less frequently (a higher interval), but the JournalFlusher still has to run for the expected error to be thrown.

      I recommend a new failpoint that stops the JournalFlusher before the test runs currentOp and releases it after the killOp is sent.
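The proposed failpoint can be sketched in the same kind of toy model, assuming a pause-style failpoint checked at the top of each flusher iteration, before the interrupt check and the opCtx reset. `PauseFailPoint`, `ToyJournalFlusher`, and `run_kill_test` are hypothetical names; the actual fix would add a server-side failpoint to the JournalFlusher and drive it from the JS test.

```python
import itertools
import threading
import time
from dataclasses import dataclass


@dataclass
class OpCtx:
    """Hypothetical stand-in for an operation context."""
    op_id: int
    killed: bool = False


class PauseFailPoint:
    """Hypothetical failpoint: a thread that reaches it blocks until it is disabled."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._enabled = False
        self._times_entered = 0

    def enable(self) -> None:
        with self._cond:
            self._enabled = True

    def disable(self) -> None:
        with self._cond:
            self._enabled = False
            self._cond.notify_all()

    def pause_while_enabled(self) -> None:
        # Called by the flusher; parks it here while the failpoint is on.
        with self._cond:
            if self._enabled:
                self._times_entered += 1
                self._cond.notify_all()
                self._cond.wait_for(lambda: not self._enabled)

    def wait_until_entered(self, timeout: float = 5.0) -> bool:
        # Called by the test; returns True once the flusher is parked.
        with self._cond:
            return self._cond.wait_for(lambda: self._times_entered > 0, timeout)


class ToyJournalFlusher:
    """Hypothetical flusher loop: pause point, interrupt check, then opCtx reset."""

    def __init__(self, failpoint: PauseFailPoint, interval_s: float = 0.01) -> None:
        self._fp = failpoint
        self._interval_s = interval_s
        self._ids = itertools.count(1)
        self._mutex = threading.Lock()
        self._op_ctx = OpCtx(next(self._ids))
        self.interrupted = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def join(self, timeout: float) -> None:
        self._thread.join(timeout)

    def current_op_id(self) -> int:
        # What currentOp would report for the flusher thread.
        with self._mutex:
            return self._op_ctx.op_id

    def kill_op(self, op_id: int) -> bool:
        # Like killOp: marks the opCtx only if the opId is still current.
        with self._mutex:
            if self._op_ctx.op_id != op_id:
                return False
            self._op_ctx.killed = True
            return True

    def _run(self) -> None:
        while True:
            self._fp.pause_while_enabled()
            with self._mutex:
                if self._op_ctx.killed:
                    self.interrupted = True  # stands in for throwing the interrupt error
                    return
                self._op_ctx = OpCtx(next(self._ids))  # reset after each round
            time.sleep(self._interval_s)


def run_kill_test() -> bool:
    fp = PauseFailPoint()
    flusher = ToyJournalFlusher(fp)
    flusher.start()
    fp.enable()  # stop the flusher before currentOp
    assert fp.wait_until_entered(), "flusher never reached the failpoint"
    op_id = flusher.current_op_id()  # currentOp
    assert flusher.kill_op(op_id)  # killOp: the opId cannot have gone stale
    fp.disable()  # release the flusher after killOp
    flusher.join(timeout=5.0)
    return flusher.interrupted


print(run_kill_test())  # True
```

Because the flusher cannot reset its opCtx while it is parked at the failpoint, the opId read via currentOp is still current when killOp runs, and the kill is observed as soon as the failpoint is released. The 10 ms interval only keeps the sketch fast; the test described above uses 500 ms.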


            People

            Assignee:
            dianna.hohensee Dianna Hohensee
            Reporter:
            dianna.hohensee Dianna Hohensee
            Participants:
            Votes:
            0
            Watchers:
            2

              Dates

              Created:
              Updated:
              Resolved: