[SERVER-31017] Race condition in setFeatureCompatibilityVersion command leads to fCV=3.6 and isSchemaVersion36=false
Created: 10/Sep/17  Updated: 27/Oct/23  Resolved: 09/Oct/17

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Esha Maharishi (Inactive)
Resolution: Gone away
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-30793 merge setFeatureCompatibilityVersion ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

python buildscripts/resmoke.py --suites=no_server repro_server31017.js

repro_server31017.js

(function() {
    "use strict";
 
    load("jstests/libs/parallelTester.js");  // for ScopedThread and CountDownLatch
 
    const rst = new ReplSetTest({nodes: 1});
    rst.startSet();
    rst.initiate();
 
    function setFCV(host, version, barrier) {
        const conn = new Mongo(host);
 
        barrier.countDown();
        barrier.await();
 
        try {
            assert.commandWorked(conn.adminCommand({setFeatureCompatibilityVersion: version}));
            return {ok: 1};
        } catch (e) {
            return {ok: 0, error: e.toString(), stack: e.stack};
        }
    }
 
    const primary = rst.getPrimary();
    const db = primary.getDB("test");
 
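    // Loop until the race manifests as an assertion failure below. The loop has no normal
    // exit, so rst.stopSet() after it is never reached.
    while (true) {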
        // We create two threads: one to set the server's featureCompatibilityVersion to "3.4" and
        // another to set the server's featureCompatibilityVersion to "3.6".
        {
            const barrier = new CountDownLatch(2);
            const thread34 = new ScopedThread(setFCV, primary.host, "3.4", barrier);
            const thread36 = new ScopedThread(setFCV, primary.host, "3.6", barrier);
 
            thread34.start();
            thread36.start();
 
            thread34.join();
            thread36.join();
 
            assert.commandWorked(thread34.returnData());
            assert.commandWorked(thread36.returnData());
        }
 
        // If the thread that sets the server's featureCompatibilityVersion to "3.4" did its update
        // to the featureCompatibilityVersion document last, then we reset the server's
        // featureCompatibilityVersion to "3.6" and try again.
        {
            const res = assert.commandWorked(
                db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1}));
 
            if (res.featureCompatibilityVersion === "3.4") {
                assert.commandWorked(db.adminCommand({setFeatureCompatibilityVersion: "3.6"}));
                continue;
            }
        }
 
        // Otherwise, we implicitly create a collection via an insert operation and verify that
        // collections are always created with UUIDs when the server's featureCompatibilityVersion
        // is "3.6".
        {
            db.mycoll.drop();
            assert.writeOK(db.mycoll.insert({}));
 
            const collectionInfos = db.getCollectionInfos({name: "mycoll"});
            assert.eq(1, collectionInfos.length, tojson(collectionInfos));
            assert(collectionInfos[0].info.hasOwnProperty("uuid"),
                   "Expected collection to have a UUID since featureCompatibilityVersion is 3.6: " +
                       tojson(collectionInfos));
        }
    }
 
    rst.stopSet();
})();

Sprint: Sharding 2017-10-23
Participants:

Description

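This issue is caused by allowing serverGlobalParams.featureCompatibility.isSchemaVersion36 to be set to false after the featureCompatibilityVersion document has been updated as part of FeatureCompatibilityVersion::set(). Consider the following sequence of events with two threads when the server starts out in featureCompatibilityVersion=3.6. The final write of isSchemaVersion36=false by the {setFCV: "3.4"} thread lands after the {setFCV: "3.6"} thread has finished, which leaves the server with fCV=3.6 and isSchemaVersion36=false.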

Thread in {setFCV: "3.4"}                 Thread in {setFCV: "3.6"}
          |                                         |
          |                                         |
FCV::set() called                                   |
          |                                         |
fCV document changed to "3.4"                       |
          |                                         |
fCV global set to "3.4" via OpObserver              |
          |                                         |
isSchemaVersion36 global set to false               |
via OpObserver                                      |
          |                                         |
          |                                         |
          |                               FCV::set() called
          |                                         |
          |                               fCV document changed to "3.6"
          |                                         |
          |                               fCV global set to "3.6" via OpObserver
          |                                         |
          |                               isSchemaVersion36 global set to true
          |                               via OpObserver
          |                                         |
          |                                         |
isSchemaVersion36 global set to false               |
via setFCV                                          |
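
The interleaving in the diagram can be replayed deterministically with a small in-memory model. This is only a sketch: the state object and the helpers applyDocumentUpdate and applyDirectFlagWrite are hypothetical stand-ins, not MongoDB source code, and the model covers only the steps shown in the diagram.

```javascript
// Hypothetical in-memory stand-in for the server state (not MongoDB source code).
// The server starts out in featureCompatibilityVersion=3.6.
function makeServerState() {
    return {fcvDocument: "3.6", fcvGlobal: "3.6", isSchemaVersion36: true};
}

// Models "fCV document changed" followed by the OpObserver updating both globals.
function applyDocumentUpdate(state, version) {
    state.fcvDocument = version;
    state.fcvGlobal = version;
    state.isSchemaVersion36 = (version === "3.6");
}

// Models the last step of the diagram: the command itself writes the flag directly.
function applyDirectFlagWrite(state, value) {
    state.isSchemaVersion36 = value;
}

// Replays the steps in the order drawn in the diagram.
function replayDiagram() {
    const state = makeServerState();
    applyDocumentUpdate(state, "3.4");   // thread in {setFCV: "3.4"}
    applyDocumentUpdate(state, "3.6");   // thread in {setFCV: "3.6"}
    applyDirectFlagWrite(state, false);  // thread in {setFCV: "3.4"}, arriving late
    return state;
}

const finalState = replayDiagram();
console.log(JSON.stringify(finalState));
// prints {"fcvDocument":"3.6","fcvGlobal":"3.6","isSchemaVersion36":false}
```

In this model the final state matches the title of this ticket: the globals report fCV=3.6 while isSchemaVersion36 is false.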



Comments
Comment by Esha Maharishi (Inactive) [ 09/Oct/17 ]

Confirmed that this issue was fixed by SERVER-31209 and that it continued not to reproduce after SERVER-30793.

Comment by Geert Bosch [ 05/Oct/17 ]

Per Max's recommendation, assigning to you for closing as a duplicate. No need for additional regression testing, as existing tests are sufficient. Just manually run Max's test case when SERVER-30793 is fixed.

Comment by Esha Maharishi (Inactive) [ 05/Oct/17 ]

> I think we can run the repro_server31017.js after the changes from SERVER-30793 go in and confirm that the issue has been resolved.

max.hirschhorn, sounds good. I'll link SERVER-30793 as related so we don't forget.

Comment by Max Hirschhorn [ 05/Oct/17 ]

Thanks esha.maharishi, I wasn't sure whether SERVER-30793 would only impact the config servers or whether it would be for all usages of the "setFeatureCompatibilityVersion" command. I think we can run the repro_server31017.js after the changes from SERVER-30793 go in and confirm that the issue has been resolved.

> It also refactors the setFCV command so that isSchemaVersion36 is not set directly by the command.

I suspect it will as a result of that change.

Comment by Esha Maharishi (Inactive) [ 05/Oct/17 ]

max.hirschhorn, if you agree, feel free to close this as a duplicate. Thanks, louis.williams, for finding this.

Comment by Esha Maharishi (Inactive) [ 05/Oct/17 ]

I believe this problem will go away once SERVER-30793 is in. For one, that patch makes setFCV take an exclusive lock, so only one instance of it can run at a time. It also refactors the setFCV command so that isSchemaVersion36 is not set directly by the command.
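
As a rough illustration of why either change would be expected to close the race, the sketch below enumerates step orderings in a simplified in-memory model. Everything here is an assumption made for illustration: the helpers (updateDocument, directFlagWrite, interleavings, countInconsistent) are hypothetical and do not reflect the actual SERVER-30793 patch. The exclusive lock is modeled as allowing only whole-command orderings.

```javascript
// Illustrative model only; these helpers are hypothetical, not MongoDB source code.
const isConsistent = (s) => s.isSchemaVersion36 === (s.fcvGlobal === "3.6");

// Document update plus the simulated OpObserver that keeps both globals in sync.
const updateDocument = (version) => (s) => {
    s.fcvDocument = version;
    s.fcvGlobal = version;
    s.isSchemaVersion36 = (version === "3.6");
};

// Direct write of the flag by the command itself.
const directFlagWrite = (value) => (s) => {
    s.isSchemaVersion36 = value;
};

// Returns every merge of the two step lists that preserves each list's own order.
function interleavings(a, b) {
    if (a.length === 0) return [b];
    if (b.length === 0) return [a];
    return [
        ...interleavings(a.slice(1), b).map((rest) => [a[0], ...rest]),
        ...interleavings(a, b.slice(1)).map((rest) => [b[0], ...rest]),
    ];
}

// Counts the schedules that end with the two globals disagreeing.
function countInconsistent(schedules) {
    return schedules.filter((steps) => {
        const s = {fcvDocument: "3.6", fcvGlobal: "3.6", isSchemaVersion36: true};
        steps.forEach((step) => step(s));
        return !isConsistent(s);
    }).length;
}

const buggy34 = [updateDocument("3.4"), directFlagWrite(false)];
const set36 = [updateDocument("3.6")];

// Original behavior: steps of the two commands may interleave freely.
const unlocked = countInconsistent(interleavings(buggy34, set36));
// Exclusive lock: each command runs to completion before the other starts.
const locked = countInconsistent([[...buggy34, ...set36], [...set36, ...buggy34]]);
// Refactor: the command no longer writes the flag directly.
const refactored = countInconsistent(interleavings([updateDocument("3.4")], set36));

console.log({unlocked, locked, refactored});
```

In this model, unlocked is 1 (the one bad interleaving out of three) while locked and refactored are both 0.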

Generated at Thu Feb 08 04:25:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.