  1. Core Server
  2. SERVER-34211

A failed restartCatalog command can clear the cached repl oplog pointer without reestablishing it

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.0.0-rc0
    • Affects Version/s: None
    • Component/s: Replication, Storage
    • Labels: None
    • Fully Compatible
    • ALL
    • 0

      Imagine this sequence of events:

      1. I run a background index build (or any background job, really) on the namespace "test.coll".
      2. Someone issues a restartCatalog command.
      3. We close all of the open databases via DBHolder::closeAll(). This simply loops through each database and attempts to close it. Suppose the order of databases is "local", then "test".
        1. Database "local" is closed. The cached oplog collection pointer is cleared.
        2. We attempt to close database "test" but then throw because a background operation is in progress.
      4. A later operation writes to the oplog and dereferences the cleared oplog pointer, because logOp() does not call acquireOplogCollectionForLogging().
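
      The sequence above can be modeled with a small toy program. This is only a sketch: ToyCollection, ToyCatalog, and failedCloseLeavesNullOplog() are hypothetical stand-ins, not the server's real classes, and the assumption that closing "local" clears the cached pointer comes directly from step 3.1.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these are real MongoDB types.
struct ToyCollection {};

struct ToyCatalog {
    ToyCollection oplog;
    ToyCollection* cachedOplog = &oplog;  // models the cached repl oplog pointer

    // Database name -> whether a background operation is running on it.
    // std::map iterates in key order, so "local" is visited before "test".
    std::map<std::string, bool> bgOpInProgress{{"local", false}, {"test", true}};

    // Models a closeAll() that loops over databases and throws partway through.
    void closeAll() {
        for (const auto& [name, busy] : bgOpInProgress) {
            if (busy) {
                throw std::runtime_error("background operation in progress on " + name);
            }
            if (name == "local") {
                cachedOplog = nullptr;  // closing "local" clears the cached pointer
            }
        }
    }
};

// Returns true if a failed closeAll() leaves the cached pointer cleared.
bool failedCloseLeavesNullOplog() {
    ToyCatalog catalog;
    try {
        catalog.closeAll();
    } catch (const std::runtime_error&) {
        // The command fails here and nothing re-acquires the oplog pointer.
    }
    return catalog.cachedOplog == nullptr;
}
```

      In this model, any later code that dereferences cachedOplog without checking it would hit a null pointer, which mirrors step 4.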

      One solution would be to add a ScopeGuard to restartCatalog that calls repl::acquireOplogCollectionForLogging() if the call to catalog::closeCatalog() fails for any reason.
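
      A minimal sketch of that idea, assuming a hand-rolled guard rather than the server's own ScopeGuard utility (whose exact API is not shown in this ticket). SimpleGuard, ToyState, toyCloseCatalog(), toyAcquireOplog(), and toyRestartCatalog() are all hypothetical names; the last three only stand in for catalog::closeCatalog(), repl::acquireOplogCollectionForLogging(), and the restartCatalog command.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Minimal scope guard: runs onExit at scope exit unless dismiss() was called.
class SimpleGuard {
public:
    explicit SimpleGuard(std::function<void()> onExit) : _onExit(std::move(onExit)) {}
    ~SimpleGuard() {
        if (_active) {
            _onExit();
        }
    }
    SimpleGuard(const SimpleGuard&) = delete;
    SimpleGuard& operator=(const SimpleGuard&) = delete;

    void dismiss() {
        _active = false;
    }

private:
    std::function<void()> _onExit;
    bool _active = true;
};

// Hypothetical state: an "oplog" object and a cached pointer to it.
struct ToyState {
    int oplog = 0;
    int* cachedOplog = &oplog;
};

// Stand-in for catalog::closeCatalog(): clears the pointer, then may throw.
void toyCloseCatalog(ToyState& state, bool failMidway) {
    state.cachedOplog = nullptr;
    if (failMidway) {
        throw std::runtime_error("background operation in progress");
    }
}

// Stand-in for repl::acquireOplogCollectionForLogging().
void toyAcquireOplog(ToyState& state) {
    state.cachedOplog = &state.oplog;
}

// Stand-in for the restartCatalog command; returns false if closing failed.
bool toyRestartCatalog(ToyState& state, bool failMidway) {
    SimpleGuard restoreOplog([&state] { toyAcquireOplog(state); });
    try {
        toyCloseCatalog(state, failMidway);
    } catch (const std::runtime_error&) {
        return false;  // guard is still active, so it restores the pointer on exit
    }
    // Success path: assume reopening the catalog re-establishes the pointer.
    toyAcquireOplog(state);
    assert(state.cachedOplog != nullptr);
    restoreOplog.dismiss();
    return true;
}
```

      The guard is dismissed only on the success path, so every failure exit from the close step re-establishes the cached pointer before control returns to the caller.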

            Assignee: Kyle Suarez (kyle.suarez@mongodb.com)
            Reporter: Kyle Suarez (kyle.suarez@mongodb.com)