[SERVER-32775] Investigate afterClusterTime error in causally consistent suite with 0 vote secondaries Created: 18/Jan/18  Updated: 22/Apr/18  Resolved: 20/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kevin Albertson Assignee: Misha Tyulenev
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File after_cluster_time_err_0_vote_secondaries.patch    
Issue Links:
Related
related to SERVER-32572 Run causally consistent resmoke suite... Closed
Sprint: Sharding 2018-04-23, Sharding 2018-05-07
Participants:

 Description   

gvim Running the causally_consistent_jscore_passthrough suite with non-voting secondaries runs into an afterClusterTime error, even though inspecting the logs it looks like the shell is sending the correct cluster times. Because of this, we cannot make secondaries non-voting in those suites.

To reproduce this apply the attached patch (courtesy of Max) and run:

python ./buildscripts/resmoke.py --suites=causally_consistent_jscore_passthrough repro.js

The repro just does an insert followed by a find with "secondary" read preference while logging the commands and received operationTime/clusterTime.

Here's the output of one run:

----
{
    "insert" : "mycoll",
    "documents" : [
        {
            "_id" : 0
        }
    ],
    "$clusterTime" : {
        "clusterTime" : Timestamp(1515253320, 13),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
----
----
After doing insert: {
    "operationTime" : Timestamp(1515253320, 31),
    "clusterTime" : Timestamp(1515253320, 31)
}
----
----
{
    "query" : {
        "find" : "mycoll",
        "limit" : 1,
        "singleBatch" : true,
        "$clusterTime" : {
            "clusterTime" : Timestamp(1515253320, 31),
            "signature" : {
                "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : NumberLong(0)
            }
        },
        "readConcern" : {
            "afterClusterTime" : Timestamp(1515253320, 31)
        }
    },
    "$readPreference" : {
        "mode" : "secondary"
    }
}
----
assert: command failed: {
    "ok" : 0,
    "errmsg" : "readConcern afterClusterTime must not be greater than clusterTime value",
    "code" : 72,
    "codeName" : "InvalidOptions",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1515253320, 31),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1515253320, 31)
} : undefined
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:16:14
assert.commandWorked@src/mongo/shell/assert.js:403:5



 Comments   
Comment by Misha Tyulenev [ 20/Apr/18 ]

Run the repro in a loop but still can not reproduce in master. Closing.

(function(original) {
    Mongo.prototype.runCommand = function traceRunCommand(dbName, commandObj, options) {
        jsTest.log(tojson(commandObj));
        return original.apply(this, arguments);
    };
})(Mongo.prototype.runCommand);
 
function logClusterTime(prefix, res) {
    jsTest.log(
        prefix + ": " +
        tojson({operationTime: res.operationTime, clusterTime: res.$clusterTime.clusterTime}));
}
 
(function main(numReps) {
    let rep = 1;
    while (rep <= numReps) {
        jsTest.log("============= Repetition: " + rep + " ==============");
        let res = assert.commandWorked(db.runCommand({insert: "mycoll", documents: [{_id: 0}]}));
        logClusterTime("After doing insert", res);
        assert.eq(1, res.n, "insert should have succeeded");
 
        res = assert.commandWorked(db.runCommand({
            query: {
                find: "mycoll",
                limit: 1,
                singleBatch: true,
            },
            $readPreference: {mode: "secondary"}
        }));
        logClusterTime("After doing find", res);
        assert.commandWorked(db.runCommand({delete: "mycoll", deletes: [{q: {}, limit: 0}]}))
        rep = rep + 1;
    }
})(1000)

Comment by Misha Tyulenev [ 13/Apr/18 ]

Thanks, cant reproduce on master. It fails intermittently with "Cannot accept sharding commands if sharding state has not been initialized with a shardIdentity document" which is similar to SERVER-29337

Comment by Max Hirschhorn [ 13/Apr/18 ]

misha.tyulenev, I believe you can get the contents of the repro.js file Kevin referenced from the attached after_cluster_time_err_0_vote_secondaries.patch file.

Comment by Misha Tyulenev [ 13/Apr/18 ]

kevin.albertson could you please attach a repro script as well?

Generated at Thu Feb 08 04:31:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.