[SERVER-31783] Secondaries may generate a UUID for a replicated collection on their own while upgrading to fCV=3.6 Created: 01/Nov/17  Updated: 30/Oct/23  Resolved: 08/Nov/17

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: 3.6.0-rc0
Fix Version/s: 3.6.0-rc4

Type: Bug Priority: Critical - P2
Reporter: Max Hirschhorn Assignee: Geert Bosch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
is caused by SERVER-31209 need to store upgrade/downgrade in pr... Closed
Related
is related to SERVER-31018 Secondaries generate a UUID for a rep... Closed
is related to SERVER-30745 Prohibit unsafe comparisons against f... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Apply the following patch that introduces a failpoint causing the call to repl::logOp() to get delayed (e.g. until after the update to the admin.system.version collection is written to the oplog) and run the following JavaScript test.

python buildscripts/resmoke.py --suites=no_server repro_server31783.js

repro_server31783.js

(function() {
    "use strict";
 
    const rst = new ReplSetTest({nodes: 2});
    rst.startSet();
 
    // Rig the election so that the first node is always primary and that modifying the
    // featureCompatibilityVersion document doesn't need to wait for data to replicate.
    var replSetConfig = rst.getReplSetConfig();
    for (let i = 1; i < rst.nodes.length; ++i) {
        replSetConfig.members[i].priority = 0;
        replSetConfig.members[i].votes = 0;
    }
    rst.initiate(replSetConfig);
 
    const primary = rst.getPrimary();
    const primaryDB = primary.getDB("test");
 
    assert.commandWorked(primaryDB.adminCommand({setFeatureCompatibilityVersion: "3.4"}));
 
    assert.commandWorked(primaryDB.adminCommand(
        {configureFailPoint: "hangBeforeLoggingCreateCollection", mode: "alwaysOn"}));
 
    let awaitCreateCollection;
    let awaitUpgradeFCV;
 
    try {
        awaitCreateCollection = startParallelShell(function() {
            assert.commandWorked(db.runCommand({create: "mycoll"}));
        }, primary.port);
 
        assert.soon(function() {
            return rawMongoProgramOutput().match("createCollection: test.mycoll");
        });
 
        awaitUpgradeFCV = startParallelShell(function() {
            assert.commandWorked(db.adminCommand({setFeatureCompatibilityVersion: "3.6"}));
        }, primary.port);
 
        {
            let res;
            assert.soon(
                function() {
                    res = assert.commandWorked(
                        primaryDB.adminCommand({getParameter: 1, featureCompatibilityVersion: 1}));
                    return res.featureCompatibilityVersion.version === "3.4" &&
                        res.featureCompatibilityVersion.targetVersion === "3.6";
                },
                function() {
                    return "targetVersion of featureCompatibilityVersion document wasn't updated" +
                        " on primary: " + tojson(res);
                });
        }
    } finally {
        assert.commandWorked(primaryDB.adminCommand(
            {configureFailPoint: "hangBeforeLoggingCreateCollection", mode: "off"}));
    }
 
    awaitCreateCollection();
    awaitUpgradeFCV();
 
    rst.checkReplicatedDataHashes();
    rst.stopSet();
})();

diff --git a/src/mongo/db/catalog/database_impl.cpp b/src/mongo/db/catalog/database_impl.cpp
index 0706f5a..52a521b 100644
--- a/src/mongo/db/catalog/database_impl.cpp
+++ b/src/mongo/db/catalog/database_impl.cpp
@@ -74,6 +74,7 @@
 #include "mongo/platform/random.h"
 #include "mongo/stdx/memory.h"
 #include "mongo/util/assert_util.h"
+#include "mongo/util/fail_point_service.h"
 #include "mongo/util/log.h"
 
 namespace mongo {
@@ -87,6 +88,7 @@ MONGO_INITIALIZER(InitializeDatabaseFactory)(InitializerContext* const) {
     });
     return Status::OK();
 }
+MONGO_FP_DECLARE(hangBeforeLoggingCreateCollection);
 }  // namespace
 
 using std::unique_ptr;
@@ -819,6 +821,8 @@ Collection* DatabaseImpl::createCollection(OperationContext* opCtx,
         }
     }
 
+    MONGO_FAIL_POINT_PAUSE_WHILE_SET(hangBeforeLoggingCreateCollection);
+
     opCtx->getServiceContext()->getOpObserver()->onCreateCollection(
         opCtx, collection, nss, optionsWithUUID, fullIdIndexSpec);

Sprint: Storage 2017-11-13
Participants:
Linked BF Score: 0

 Description   

Prior to the changes from e035926 as part of SERVER-31209, serverGlobalParams.featureCompatibility.isSchemaVersion36 was immediately being to set true when {setFeatureCompatibilityVersion: "3.6"} ran as part of the upgrade process, and serverGlobalParams.featureCompatibility.version was being set to FeatureCompatibility::Version::k36 only after assigning UUIDs to all existing collections. Whether we're in schemaVersion=3.6 is instead now derived from the (featureCompatibilityVersion, targetVersion) pair; however, the okayCreation variable incorrectly permits a secondary to generate its own UUIDs (i.e. cause data to diverge from the primary) when the featureCompatibilityVersion is FeatureCompatibility::Version::kUpgradingTo36 since that's a case where schemaVersion=3.6 but we not yet fully-upgraded to 3.6.

Any time a new collection is being created under schemaVersion=3.6—regardless of whether we've fully-upgraded to 3.6 or not—the collection should be assigned a UUID by the primary and that UUID should be included in the oplog entry for the "create" command. Since (1) DatabaseImpl::createCollection() and _updateDatabaseUUIDSchemaVersion() both require that the database lock in held in MODE_X and (2) FeatureCompatibilityVersion::setTargetUpgrade() is called before updateUUIDSchemaVersion(), we're guaranteed that either (a) creating a collection must observe the change to schemaVersion=3.6 by happening after the UUID upgrade, or (b) creating a collection happens before the UUID upgrade is performed.

CollectionOptions optionsWithUUID = options;
if (enableCollectionUUIDs && !optionsWithUUID.uuid &&
    serverGlobalParams.featureCompatibility.isSchemaVersion36()) {
    auto coordinator = repl::ReplicationCoordinator::get(opCtx);
    bool okayCreation =
        (coordinator->getReplicationMode() != repl::ReplicationCoordinator::modeReplSet ||
          (serverGlobalParams.featureCompatibility.getVersion() !=
          ServerGlobalParams::FeatureCompatibility::Version::kFullyUpgradedTo36) ||
          coordinator->canAcceptWritesForDatabase(opCtx, nss.db()) ||
          nss.isSystemDotProfile());  // system.profile is special as it's not replicated
    if (!okayCreation) {
        std::string msg = str::stream() << "Attempt to assign UUID to replicated collection: "
                                        << nss.ns();
        severe() << msg;
        uasserted(ErrorCodes::InvalidOptions, msg);
    }
    optionsWithUUID.uuid.emplace(CollectionUUID::gen());
}

https://github.com/mongodb/mongo/blob/0342b7bd64be6a8fec25a18ab633f2f9a27f0558/src/mongo/db/catalog/database_impl.cpp#L769-L786



 Comments   
Comment by Githook User [ 08/Nov/17 ]

Author:

{'name': 'Geert Bosch', 'username': 'GeertBosch', 'email': 'geert@mongodb.com'}

Message: SERVER-31783 Prevent secondaries from generating UUIDs during upgrade
Branch: master
https://github.com/mongodb/mongo/commit/45da08b12555bf4082bd05e0e5cf3f74e56de3d5

Comment by Max Hirschhorn [ 01/Nov/17 ]

As an attempt to describe the issue in a slightly different way: The following sequence of oplog entries are a valid history of operations performed by the primary because it is possible (1) for the write to the oplog that sets targetVersion=3.6 to commit before the write to the oplog that creates a collection, and for (2) creating the collection to observe serverGlobalParams.featureCompatibility in either targetVersion unset or targetVersion=3.6.

{  "ts" : Timestamp(1509509049, 154),  "t" : NumberLong(1),  "h" : NumberLong("-121227912217949421"),  "v" : 2,  "op" : "u",  "ns" : "admin.system.version",  "o2" : {  "_id" : "featureCompatibilityVersion" },  "o" : {  "_id" : "featureCompatibilityVersion",  "version" : "3.4",  "targetVersion" : "3.6" } }
{  "ts" : Timestamp(1509509049, 155),  "t" : NumberLong(1),  "h" : NumberLong("2059760424251381818"),  "v" : 2,  "op" : "c",  "ns" : "test.$cmd",  "o" : {  "create" : "readMajority",  "idIndex" : {  "v" : 2,  "key" : {  "_id" : 1 },  "name" : "_id_",  "ns" : "test.readMajority" } } }
{  "ts" : Timestamp(1509509049, 168),  "t" : NumberLong(1),  "h" : NumberLong("4964440004185536663"),  "v" : 2,  "op" : "c",  "ns" : "test.$cmd",  "ui" : UUID("4100b5da-8303-4109-b8de-ee48fd5f8202"),  "o2" : {  "collectionOptions_old" : {   } },  "o" : {  "collMod" : "readMajority" } }

The problem comes from how the secondary treats serverGlobalParams.featureCompatibility.getVersion() == kUpgradingTo36 as a case where it should generate its own UUID (like the primary would have in this circumstance). I suspect the appropriate change is to remove the serverGlobalParams.featureCompatibility.getVersion() != ServerGlobalParams::FeatureCompatibility::Version::kFullyUpgradedTo36 condition since serverGlobalParams.featureCompatibility.isSchemaVersion36() already covers both cases where UUIDs are permitted and/or expected.

Generated at Thu Feb 08 04:28:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.