Core Server / SERVER-32574

Repairing the local database can cause the WT oplog manager thread to permanently exit.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.4, 3.7.2
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6
    • Sprint:
      Storage 2018-01-29
    • Linked BF Score:
      0

      Description

      The purpose of the WiredTigerOplogManager is to advance oplog visibility. Without it, a node can stop responding to requests. The WiredTigerOplogManager is constructed only once, and its lifetime is bound to the WiredTigerKVEngine. However, the background thread that manages oplog visibility starts and stops as the oplog is created and destroyed. I'm not justifying these details, just stating them.

      There was an assumption that the oplog manager would never be restarted in production. That assumption was not true. Repairing the local database recreates the oplog record store. When the record store is destroyed, the background thread is halted. When the record store is recreated, the thread is restarted. However, the restart has a race: the new thread may start and read _shuttingDown as true before the spawning thread resets it to false. This causes the newly spawned thread to incorrectly shut down.
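
      The restart race described above can be illustrated with a minimal, self-contained sketch. This is not the server's code and not necessarily the fix that was committed: the class VisibilityManager, its methods start, halt and advance, and the helper restartSurvives are hypothetical names chosen for illustration. Only the _shuttingDown flag comes from the description. The sketch assumes one way to avoid the race, which is to reset the flag under the mutex before the new thread is spawned, so the thread can never observe a stale true value.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an oplog-visibility manager (illustration only).
class VisibilityManager {
public:
    ~VisibilityManager() { halt(); }

    // Racy ordering (the bug): spawn the thread, then clear _shuttingDown.
    // Ordering used here: clear the flag under the mutex, then spawn. The new
    // thread must take the same mutex before it reads the flag.
    void start() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_thread.joinable()) return;  // already running
        _shuttingDown = false;
        _thread = std::thread([this] { _loop(); });
    }

    void halt() {
        std::thread t;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (!_thread.joinable()) return;  // not running
            _shuttingDown = true;
            t = std::move(_thread);
        }
        _cv.notify_all();
        t.join();  // join outside the lock so the loop can wake up and exit
    }

    // Requests one unit of background work and waits for it to complete.
    // Returns false if no background thread is running.
    bool advance() {
        std::unique_lock<std::mutex> lk(_mutex);
        if (!_thread.joinable()) return false;
        const auto target = ++_requested;
        _cv.notify_all();
        _cv.wait(lk, [&] { return _completed >= target || _shuttingDown; });
        return _completed >= target;
    }

private:
    void _loop() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (true) {
            _cv.wait(lk, [&] { return _shuttingDown || _completed < _requested; });
            if (_shuttingDown) return;
            _completed = _requested;  // stands in for advancing visibility
            _cv.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::thread _thread;
    bool _shuttingDown = false;
    unsigned long long _requested = 0;
    unsigned long long _completed = 0;
};

// Stops and restarts the manager, as recreating the record store would.
bool restartSurvives() {
    VisibilityManager m;
    m.start();
    const bool first = m.advance();
    m.halt();
    assert(!m.advance());  // no background thread while halted
    m.start();
    const bool second = m.advance();  // would hang if the new thread had exited
    m.halt();
    return first && second;
}
```

      In this sketch the flag reset and the thread spawn happen in one critical section, so the ordering problem cannot occur. With the racy ordering, the second advance call could wait forever, which matches the symptom of a node that stops responding.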

            People

            Assignee:
            geert.bosch Geert Bosch
            Reporter:
            daniel.gottlieb Daniel Gottlieb
            Participants:
            Votes:
            0
            Watchers:
            9

              Dates

              Created:
              Updated:
              Resolved: