  1. Core Server
  2. SERVER-27831

Deadlock when listing collections on "local" database with replication enabled for KVCatalog-based storage engines without document locking

    • Fully Compatible
    • ALL
    • v3.4

      Apply the following patch and run the create_database.js FSM workload against the ephemeralForTest storage engine.

      python buildscripts/resmoke.py --executor=concurrency_replication jstests/concurrency/fsm_all_replication.js --storageEngine=ephemeralForTest
      
      diff --git a/jstests/concurrency/fsm_libs/runner.js b/jstests/concurrency/fsm_libs/runner.js
      index ec62924..7e33d1b 100644
      --- a/jstests/concurrency/fsm_libs/runner.js
      +++ b/jstests/concurrency/fsm_libs/runner.js
      @@ -673,6 +673,7 @@ var runner = (function() {
                       bgThreadMgr.checkFailed(0);
      
                       var schedule = scheduleWorkloads(workloads, executionMode, executionOptions);
      +                schedule = [ [ "jstests/concurrency/fsm_workloads/create_database.js" ] ];
                       printWorkloadSchedule(schedule, bgWorkloads);
      
                       schedule.forEach(function(workloads) {
      
    • Storage 2017-03-27, Storage 2017-04-17
    • 0

      Storage engines that use the KVCatalog but do not support document-level concurrency (such as the ephemeralForTest storage engine) may deadlock because two operations acquire the resourceIdCatalogMetadata lock and the "local" database lock in opposite orders. This bug does not apply to the MMAPv1 or WiredTiger storage engines.

      db.getSiblingDB("local").runCommand({listCollections: 1}):
        HOLD: Global, 1, MODE_IS
        HOLD: Global, 2, MODE_IS
        HOLD: Database, local, MODE_S
        WAIT ON: Metadata, resourceIdCatalogMetadata, MODE_S
      
      db.getSiblingDB("CreateDatabase0").runCommand({create: "coll0"}):
        HOLD: Global, 1, MODE_IS
        HOLD: Global, 2, MODE_IX
        HOLD: Database, CreateDatabase0, MODE_X
        HOLD: Metadata, resourceIdCatalogMetadata, MODE_X
        WAIT ON: Database, local, MODE_IX
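
The two lock dumps above can be checked mechanically. The following is a minimal sketch, not server code: it transcribes the dumps into plain objects, applies the standard multi-granularity lock compatibility rules (IS, IX, S, X), and searches the resulting wait-for graph for a cycle. The names `COMPATIBLE`, `clients`, `buildWaitForGraph`, and `findCycle` are made up for this illustration.

```javascript
// For each requested mode, the set of modes another client may already hold
// on the same resource without conflicting (standard IS/IX/S/X matrix).
const COMPATIBLE = {
  MODE_IS: new Set(["MODE_IS", "MODE_IX", "MODE_S"]),
  MODE_IX: new Set(["MODE_IS", "MODE_IX"]),
  MODE_S: new Set(["MODE_IS", "MODE_S"]),
  MODE_X: new Set(),
};

// Lock state transcribed from the two dumps in this report.
const clients = {
  listCollections: {
    holds: [
      { resource: "Global/1", mode: "MODE_IS" },
      { resource: "Global/2", mode: "MODE_IS" },
      { resource: "Database/local", mode: "MODE_S" },
    ],
    waitsOn: { resource: "Metadata/resourceIdCatalogMetadata", mode: "MODE_S" },
  },
  create: {
    holds: [
      { resource: "Global/1", mode: "MODE_IS" },
      { resource: "Global/2", mode: "MODE_IX" },
      { resource: "Database/CreateDatabase0", mode: "MODE_X" },
      { resource: "Metadata/resourceIdCatalogMetadata", mode: "MODE_X" },
    ],
    waitsOn: { resource: "Database/local", mode: "MODE_IX" },
  },
};

// Build the wait-for graph: waiter -> clients holding a conflicting lock.
function buildWaitForGraph(allClients) {
  const graph = {};
  for (const [waiter, state] of Object.entries(allClients)) {
    graph[waiter] = [];
    if (!state.waitsOn) continue;
    for (const [holder, other] of Object.entries(allClients)) {
      if (holder === waiter) continue;
      const blocks = other.holds.some(
        (h) =>
          h.resource === state.waitsOn.resource &&
          !COMPATIBLE[state.waitsOn.mode].has(h.mode)
      );
      if (blocks) graph[waiter].push(holder);
    }
  }
  return graph;
}

// Depth-first search; returns one cycle as a list of client names, or null.
function findCycle(graph) {
  const visiting = new Set();
  const done = new Set();
  function visit(node, path) {
    if (visiting.has(node)) return path.slice(path.indexOf(node)).concat(node);
    if (done.has(node)) return null;
    visiting.add(node);
    for (const next of graph[node]) {
      const cycle = visit(next, path.concat(node));
      if (cycle) return cycle;
    }
    visiting.delete(node);
    done.add(node);
    return null;
  }
  for (const node of Object.keys(graph)) {
    const cycle = visit(node, []);
    if (cycle) return cycle;
  }
  return null;
}

const graph = buildWaitForGraph(clients);
const cycle = findCycle(graph);
console.log(graph, cycle);
```

Each client waits on a resource the other holds in a conflicting mode (MODE_S against MODE_X, and MODE_IX against MODE_S), so the graph contains a two-node cycle and neither operation can make progress.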
      

      The resourceIdCatalogMetadata lock is acquired when calling KVCatalog::newCollection() as part of Database::createCollection(). However, the lock isn't released immediately after that call returns: it was acquired in MODE_X inside a WriteUnitOfWork, which causes shouldDelayUnlock() to return true. As a result, the client creating a collection still holds the resourceIdCatalogMetadata lock while it attempts to acquire a lock on the "local" database in order to write the corresponding oplog entry. At the same time, the client listing collections on the "local" database holds a lock on the "local" database while it attempts to acquire the resourceIdCatalogMetadata lock to get the individual collection options.
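
The delayed-unlock behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's Locker implementation: `ToyLocker` and all of its methods are hypothetical, and its `shouldDelayUnlock()` only mirrors the rule stated in this report (a MODE_X lock taken inside a WriteUnitOfWork is not released until the unit of work ends).

```javascript
// Toy model only: a hypothetical class, not the server's Locker API.
class ToyLocker {
  constructor() {
    this.held = new Map(); // resource name -> lock mode
    this.inUnitOfWork = false;
    this.deferred = []; // resources whose unlock was postponed
  }

  beginWriteUnitOfWork() {
    this.inUnitOfWork = true;
  }

  lock(resource, mode) {
    this.held.set(resource, mode);
  }

  // Assumption taken from the report: MODE_X inside a unit of work is delayed.
  shouldDelayUnlock(resource) {
    return this.inUnitOfWork && this.held.get(resource) === "MODE_X";
  }

  // Returns true if the lock was actually released, false if it was deferred.
  unlock(resource) {
    if (this.shouldDelayUnlock(resource)) {
      this.deferred.push(resource);
      return false;
    }
    this.held.delete(resource);
    return true;
  }

  // Ending the unit of work releases every deferred lock.
  endWriteUnitOfWork() {
    this.inUnitOfWork = false;
    for (const resource of this.deferred) this.held.delete(resource);
    this.deferred = [];
  }
}

const locker = new ToyLocker();
locker.beginWriteUnitOfWork();
locker.lock("resourceIdCatalogMetadata", "MODE_X");

// The catalog call returns and tries to unlock, but the release is deferred.
const released = locker.unlock("resourceIdCatalogMetadata");

// The oplog write (which needs the "local" database lock) would happen here,
// while the metadata lock is still held.
const heldWhileWritingOplog = locker.held.has("resourceIdCatalogMetadata");

locker.endWriteUnitOfWork();
const heldAfterCommit = locker.held.has("resourceIdCatalogMetadata");

console.log({ released, heldWhileWritingOplog, heldAfterCommit });
```

In this model the metadata lock outlives the call that took it, which is the window in which the request for the "local" database lock can collide with a concurrent listCollections.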


      git version: ae04822985f2478c7da1e6821f5fc91b484b9555

            Assignee:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn