  1. Core Server
  2. SERVER-45060

Operations can use a Collection without having a storage snapshot where that collection is visible

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 4.3.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Using this failpoint:

      diff --git a/src/mongo/db/catalog/uncommitted_collections.cpp b/src/mongo/db/catalog/uncommitted_collections.cpp
      index db9a6e3c97..b48b88e9d9 100644
      --- a/src/mongo/db/catalog/uncommitted_collections.cpp
      +++ b/src/mongo/db/catalog/uncommitted_collections.cpp
      @@ -34,9 +34,14 @@
       #include "mongo/db/catalog/collection_catalog.h"
       #include "mongo/db/catalog/uncommitted_collections.h"
       #include "mongo/db/storage/durable_catalog.h"
      +#include "mongo/logv2/log.h"
       #include "mongo/util/assert_util.h"
      +#include "mongo/util/fail_point.h"
       
       namespace mongo {
      +
      +MONGO_FAIL_POINT_DEFINE(hangAfterRegisteringCollection);
      +
       namespace {
       const auto getUncommittedCollections =
           OperationContext::declareDecoration<UncommittedCollections>();
      @@ -69,9 +74,20 @@ void UncommittedCollections::addToTxn(OperationContext* opCtx,
       
       
           opCtx->recoveryUnit()->registerPreCommitHook(
      -        [collListUnowned, uuid, createTime](OperationContext* opCtx) {
      +        [collListUnowned, nss, uuid, createTime](OperationContext* opCtx) {
                   UncommittedCollections::commit(opCtx, uuid, createTime, collListUnowned.lock().get());
      +
      +            hangAfterRegisteringCollection.executeIf(
      +                [&](const BSONObj& data) {
      +                    LOGV2(46156012, "hanging after registering collection.", "nss"_attr = nss);
      +                    hangAfterRegisteringCollection.pauseWhileSet(opCtx);
      +                },
      +                [&](const BSONObj& data) {
      +                    auto collElem = data["collection"];
      +                    return !collElem || collElem.str() == nss.ns();
      +                });
               });
      +
           opCtx->recoveryUnit()->onCommit(
               [collListUnowned, collPtr, createTime](boost::optional<Timestamp> commitTs) {
                   // Verify that the collection was given a minVisibleTimestamp equal to the transactions
      

      This no_passthrough test fails:

      (function() {
      "use strict";
       
      load("jstests/libs/fail_point_util.js");
       
      const replSet = new ReplSetTest({nodes: 1});
      replSet.startSet();
      replSet.initiate();
       
      const primary = replSet.getPrimary();
      const primaryDB = primary.getDB("test");
      const primaryColl = primaryDB.getCollection("coll");
       
      // Set failpoint
      let failPoint =
          configureFailPoint(primaryDB, "hangAfterRegisteringCollection", {collection: "test.coll"});
       
      // Implicitly create collection. This will hang on the failpoint.
      let awaitCreate = startParallelShell(function() {
          assert.commandWorked(db.getMongo().getCollection('test.coll').insert({a: 1}));
      }, primary.port);
       
      // Wait for failpoint to hit.
      failPoint.wait();
       
      // Should fail
      primaryColl.createIndex({a: 1});
       
      failPoint.off();
       
      awaitCreate();
      replSet.stopSet();
      })();
      
    • Sprint:
      Execution Team 2020-01-13, Execution Team 2020-01-27, Execution Team 2020-02-10, Execution Team 2020-02-24, Execution Team 2020-03-09
    • Linked BF Score:
      20

      Description

      After SERVER-43859, collection creation only takes a MODE_IX lock. The registration of the Collection object in the CollectionCatalog is no longer atomic with the commit of the storage transaction, which was previously protected by a MODE_X lock.
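
The effect of moving from MODE_X to MODE_IX can be sketched with the standard intent-lock compatibility rules. This is a toy table rather than the server's lock manager, and `canGrant` is a hypothetical helper. The ticket does not say which resource the lock is taken on, so the sketch only compares modes.

```javascript
// Illustrative model of multi-granularity lock compatibility.
// This is NOT the server's lock manager; canGrant is a hypothetical helper.
// Outer key: mode already held. Inner key: mode being requested.
const COMPATIBLE = {
    MODE_IS: {MODE_IS: true, MODE_IX: true, MODE_S: true, MODE_X: false},
    MODE_IX: {MODE_IS: true, MODE_IX: true, MODE_S: false, MODE_X: false},
    MODE_S: {MODE_IS: true, MODE_IX: false, MODE_S: true, MODE_X: false},
    MODE_X: {MODE_IS: false, MODE_IX: false, MODE_S: false, MODE_X: false},
};

// Returns true if `requested` can be granted while `held` is held by another operation.
function canGrant(held, requested) {
    return COMPATIBLE[held][requested];
}

// Creator holds MODE_X: a concurrent intent-locking operation has to wait.
console.log(canGrant("MODE_X", "MODE_IX"));  // prints false

// Creator holds only MODE_IX: a concurrent intent-locking operation can proceed.
console.log(canGrant("MODE_IX", "MODE_IX"));  // prints true
```

With MODE_X, nothing else could observe the catalog between registration and storage commit. With MODE_IX, other intent-locking operations are no longer excluded during that interval.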

      We use a pre-commit hook to register collections in the CollectionCatalog before committing the WriteUnitOfWork (WUOW). The collection only becomes visible in the durable catalog once the WUOW commits.

      There is now a window of time where the collection can be registered in the CollectionCatalog, but not visible to any storage snapshots, even those reading without a timestamp (so minVisibleSnapshot does not help). This causes certain debug invariants to fail when confirming that the in-memory IndexCatalog is consistent with the DurableCatalog. See here for example.
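
The ordering described above can be modeled in a few lines. This is a minimal sketch, not server code: `ToyStorage`, `ToyWriteUnitOfWork`, `inMemoryCatalog`, and `observe` are hypothetical names that only mirror the terms used in this ticket. The commit is split into two explicit steps so that a reader can be interleaved between them deterministically.

```javascript
// Toy model of the visibility window. All names are hypothetical.
class ToyStorage {
    constructor() {
        this.committed = new Set();  // namespaces visible to newly opened snapshots
    }
    // A snapshot is a copy of whatever was committed when it was opened.
    snapshot() {
        return new Set(this.committed);
    }
}

class ToyWriteUnitOfWork {
    constructor(storage) {
        this.storage = storage;
        this.pendingWrites = [];
        this.preCommitHooks = [];
    }
    write(ns) {
        this.pendingWrites.push(ns);
    }
    registerPreCommitHook(fn) {
        this.preCommitHooks.push(fn);
    }
    // Step 1 of commit: hooks run before anything is visible in storage.
    runPreCommitHooks() {
        this.preCommitHooks.forEach((fn) => fn());
    }
    // Step 2 of commit: pending writes become visible to new snapshots.
    commitStorage() {
        this.pendingWrites.forEach((ns) => this.storage.committed.add(ns));
    }
}

const storage = new ToyStorage();
const inMemoryCatalog = new Set();  // stands in for the CollectionCatalog

const wuow = new ToyWriteUnitOfWork(storage);
wuow.write("test.coll");
wuow.registerPreCommitHook(() => inMemoryCatalog.add("test.coll"));

// A concurrent reader: looks up the in-memory catalog, then opens a fresh snapshot.
function observe(ns) {
    return {inCatalog: inMemoryCatalog.has(ns), inSnapshot: storage.snapshot().has(ns)};
}

wuow.runPreCommitHooks();
const duringWindow = observe("test.coll");  // inCatalog is true, inSnapshot is false
wuow.commitStorage();
const afterCommit = observe("test.coll");  // both are true

console.log(duringWindow, afterCommit);
```

In this model, the failpoint from the reproduction steps corresponds to pausing between `runPreCommitHooks()` and `commitStorage()`, which is where `duringWindow` is taken.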

              People

              Assignee:
              maria.vankeulen Maria van Keulen
              Reporter:
              louis.williams Louis Williams