  1. Core Server
  2. SERVER-34211

A failed restartCatalog command can clear the cached repl oplog pointer without reestablishing it

Details

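    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-rc0
    • Component/s: Replication, Storage
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL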
    • 0

    Description

      Imagine this sequence of events:

      1. I run a background index build (or any background job, really) on the namespace "test.coll".
      2. Someone issues a restartCatalog command.
      3. We close all of the open databases via DBHolder::closeAll(). This simply loops through each database and attempts to close it. Suppose the order of databases is "local", then "test".
        1. Database "local" is closed. The cached oplog collection pointer is cleared.
        2. We attempt to close database "test" but then throw because a background operation is in progress.
      4. A later operation writes to the oplog and dereferences the cleared oplog pointer, because logOp() relies on the cached pointer and does not call acquireOplogCollectionForLogging() itself.
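
      The sequence above can be modeled with a small standalone sketch. Everything in it is a hypothetical stand-in rather than server code: Collection, Database, openDatabases, cachedOplog, and failedCloseLeavesPointerCleared() are invented for illustration, and closeAll() only imitates the behavior described for DBHolder::closeAll().

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these are the real server types.
struct Collection {};

struct Database {
    bool hasBackgroundOp;
};

static Collection oplogCollection;                  // lives in database "local"
static Collection* cachedOplog = &oplogCollection;  // models the cached repl oplog pointer

// std::map iterates in key order, so "local" is visited before "test",
// matching the order assumed in the scenario.
static std::map<std::string, Database> openDatabases = {
    {"local", Database{false}},
    {"test", Database{true}},  // background index build on test.coll
};

// Imitates the described closeAll(): closes databases one at a time and
// throws on the first one that still has a background operation.
void closeAll() {
    for (auto it = openDatabases.begin(); it != openDatabases.end();) {
        if (it->second.hasBackgroundOp) {
            throw std::runtime_error("background operation in progress on " + it->first);
        }
        if (it->first == "local") {
            cachedOplog = nullptr;  // closing "local" clears the cached pointer
        }
        it = openDatabases.erase(it);
    }
}

// Returns true if a failed close left the cached pointer cleared.
bool failedCloseLeavesPointerCleared() {
    assert(cachedOplog != nullptr);  // the pointer is valid before the restart
    try {
        closeAll();
    } catch (const std::runtime_error&) {
        // "test" could not be closed, but "local" already was.
    }
    return cachedOplog == nullptr;
}
```

      Calling failedCloseLeavesPointerCleared() once leaves "test" open and "local" closed, with the cached pointer cleared and nothing scheduled to restore it.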

      One solution would be to add a ScopeGuard to restartCatalog that calls repl::acquireOplogCollectionForLogging() if the call to catalog::closeCatalog() fails for any reason.
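
      A minimal sketch of that idea, under stated assumptions: the ScopeGuard class here is a simplified stand-in written for this example, not the server's own utility, and closeCatalog(), openCatalog(), reacquireOplogPointer(), and logOp() are hypothetical models of the functions named above. The sketch also assumes that reopening the catalog on the success path reestablishes the pointer.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

struct Collection {};
static Collection oplogCollection;
static Collection* cachedOplog = &oplogCollection;  // models the cached oplog pointer
static int oplogWrites = 0;

// Simplified scope guard: runs the callback on scope exit unless dismissed.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> onExit) : _onExit(std::move(onExit)) {}
    ~ScopeGuard() {
        if (_active) _onExit();
    }
    void dismiss() { _active = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _onExit;
    bool _active = true;
};

// Stands in for repl::acquireOplogCollectionForLogging().
void reacquireOplogPointer() { cachedOplog = &oplogCollection; }

// Stands in for catalog::closeCatalog(): clears the cache, then may fail.
void closeCatalog(bool failPartway) {
    cachedOplog = nullptr;
    if (failPartway) throw std::runtime_error("background operation in progress");
}

// Assumed success path: reopening the catalog reestablishes the pointer.
void openCatalog() { reacquireOplogPointer(); }

void restartCatalog(bool failPartway) {
    // If closeCatalog() throws, the guard restores the pointer during unwinding.
    ScopeGuard guard([] { reacquireOplogPointer(); });
    closeCatalog(failPartway);
    openCatalog();
    guard.dismiss();  // success: nothing left to repair
}

// Models logOp(): uses the cached pointer without reacquiring it.
void logOp() {
    assert(cachedOplog != nullptr);
    ++oplogWrites;
}
```

      With the guard in place, a caller that catches the exception from restartCatalog(true) can still call logOp() safely, because the pointer was restored while the stack unwound.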

          People

            Assignee: kyle.suarez@mongodb.com Kyle Suarez
            Reporter: kyle.suarez@mongodb.com Kyle Suarez
            Votes: 0
            Watchers: 6
