Core Server / SERVER-35671

DatabaseHolderImpl::closeAll can leave catalog in an incomplete state


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.1, 4.1.1
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0
    • Steps To Reproduce:

      Running a replica set with --setParameter enableTestCommands=true:

      Shell 1:

      // Helper: configure a failpoint. A truthy `times` takes precedence over `alwaysOn`;
      // if neither is set, the failpoint is turned off.
      function setFailpointBool(failpointName, alwaysOn, times) {
          if (times) {
              return db.adminCommand({configureFailPoint: failpointName, mode: {"times": times}});
          } else if (alwaysOn) {
              return db.adminCommand({configureFailPoint: failpointName, mode: "alwaysOn"});
          } else {
              return db.adminCommand({configureFailPoint: failpointName, mode: "off"});
          }
      }
      rs.initiate()
      setFailpointBool("hangAfterStartingIndexBuildUnlocked", true)
      // Blocks because of the failpoint, leaving a background operation running on database test.
      db.coll.ensureIndex({a: 1, b: 1}, {background: true})
      

      Shell 2:

      // The first call fails with the message:
      // "cannot perform operation: a background operation is currently running for database test"
      db.adminCommand({restartCatalog: 1})
      db.adminCommand({restartCatalog: 1})
      

    • Sprint:
      Storage NYC 2018-07-02, Storage NYC 2018-07-16
    • Linked BF Score:
      60

      Description

      As `closeAll` iterates over and closes each database, it checks that the database has no background operations. If that check fails, a uassert is thrown. Because the uassert can be thrown partway through the iteration, databases visited earlier have already been closed when it reaches the user, so the catalog is left partially closed. In particular, databases returned by `StorageEngine::listDatabases` may not be found by a call to `DatabaseHolder::get`.

      Suggested fix: Move the `BackgroundOperation::assertNoBgOpInProgForDb` call into the earlier scan that collects the names of the databases to close. That is, assert the complete precondition before taking any action.
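
The difference between the two orderings can be illustrated with a small toy model. This is only a sketch in plain JavaScript, not the server's C++ code: `makeHolder`, `assertNoBgOp`, `closeAllInterleaved` and `closeAllCheckedFirst` are hypothetical names, `openDbs` stands in for the set of databases held by the `DatabaseHolder`, and `bgOps` stands in for the databases that have a background operation in progress.

```javascript
// Toy model only; none of these names exist in the MongoDB source.
function makeHolder(names, bgOps) {
    return {openDbs: new Set(names), bgOps: new Set(bgOps)};
}

// Stand-in for the background-operation check: throws if `name` is busy.
function assertNoBgOp(holder, name) {
    if (holder.bgOps.has(name)) {
        throw new Error("a background operation is currently running for database " + name);
    }
}

// Check interleaved with the close: if the check throws partway through,
// databases visited earlier have already been closed.
function closeAllInterleaved(holder) {
    for (const name of [...holder.openDbs]) {
        assertNoBgOp(holder, name);
        holder.openDbs.delete(name);
    }
}

// Check the complete precondition during the name scan, then close.
// If any check throws, nothing has been closed yet.
function closeAllCheckedFirst(holder) {
    const names = [...holder.openDbs];
    for (const name of names) {
        assertNoBgOp(holder, name);
    }
    for (const name of names) {
        holder.openDbs.delete(name);
    }
}
```

With three databases where only the last one is busy, the interleaved version throws after closing the first two, while the checked-first version throws with all three still open.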
