[GODRIVER-3106] "hello" failpoint in CSOT command-execution UST is premature Created: 26/Jan/24  Updated: 29/Jan/24

Status: Backlog
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Preston Vasquez Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: greenerbuild
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Go Drivers
Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


 Description   

Detailed steps to reproduce the problem?

Running the following unified spec test in the Go Driver will eventually result in an error where the maxTimeMS on a command is 1 millisecond higher than expected:

go test -run TestUnifiedSpec/client-side-operations-timeout/command-execution.json/maxTimeMS_value_in_the_command_is_less_than_timeoutMS -v -count=100 -failfast

error:

    unified_spec_runner.go:146: events verification failed at index 0: event comparison failed at index 1: error comparing command documents: comparison error at key "maxTimeMS": error doing special matching assertion: expected numeric value 451 to be less than or equal 450; all events found for client:
 
        Started Events
 
        [localhost:27017[-74]] {"insert": "timeoutColl","ordered": true,"writeConcern": {"w": {"$numberInt":"1"}},"lsid": {"id": {"$binary":{"base64":"Gtya3oUDTuKmRdyc0bU+vQ==","subType":"04"}}},"txnNumber": {"$numberLong":"1"},"$clusterTime": {"clusterTime": {"$timestamp":{"t":1706248519,"i":6}},"signature": {"hash": {"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"00"}},"keyId": {"$numberLong":"0"}}},"maxTimeMS": {"$numberLong":"99842"},"$db": "test","documents": [{"_id": {"$numberInt":"1"}}]}
        [localhost:27017[-74]] {"insert": "timeoutColl","ordered": true,"writeConcern": {"w": {"$numberInt":"1"}},"lsid": {"id": {"$binary":{"base64":"Gtya3oUDTuKmRdyc0bU+vQ==","subType":"04"}}},"txnNumber": {"$numberLong":"2"},"$clusterTime": {"clusterTime": {"$timestamp":{"t":1706248521,"i":1}},"signature": {"hash": {"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"00"}},"keyId": {"$numberLong":"0"}}},"maxTimeMS": {"$numberLong":"451"},"$db": "test","documents": [{"_id": {"$numberInt":"2"}}]}
 
        Succeeded Events
 
 
        Failed Events
 
 
        Pool Events
 
 
--- FAIL: TestUnifiedSpec (2.36s)
    --- FAIL: TestUnifiedSpec/client-side-operations-timeout (2.35s)
        --- FAIL: TestUnifiedSpec/client-side-operations-timeout/command-execution.json (2.35s)
            --- FAIL: TestUnifiedSpec/client-side-operations-timeout/command-execution.json/maxTimeMS_value_in_the_command_is_less_than_timeoutMS (2.35s)
FAIL
exit status 1
FAIL    go.mongodb.org/mongo-driver/internal/integration/unified        10.369s

This occurs because the way we calculate the maxTimeMS in the Go Driver (as per the specifications) is dependent on the min RTT sample:

int64((remainingTimeout - rttMin + time.Millisecond - 1) / time.Millisecond)

And for some reason, this value (rttMin) is less than 50 despite the fact that the test is specifically designed to block for 50 milliseconds: https://github.com/mongodb/specifications/blob/master/source/client-side-operations-timeout/tests/command-execution.yml#L45

The value for the hello RTT sample is calculated here. If you slightly alter this code to the following and run the above test, you will see that occasionally the sample is less than 50:

c.helloRTT = time.Since(handshakeStartTime)
 
if c.helloRTT >= 49*time.Millisecond && c.helloRTT < 50*time.Millisecond {
    fmt.Println("rtt: ", c.helloRTT)
}

This is pretty inexplicable. It is not likely that there is a server bug. You can test this in mongosh using the following:

// Set up the failpoint configuration
db.adminCommand({
    configureFailPoint: 'failCommand',
    mode: 'alwaysOn',
    data: {
        failCommands: ['hello', 'isMaster'],
        blockConnection: true,
        blockTimeMS: 50 // Reduced to 50 milliseconds
    }
});
 
// Function to measure the time for db.runCommand({ hello: 1 })
function measureTimeForHello() {
    // Record the start time
    const startTime = new Date();
 
    // Execute db.runCommand
    db.runCommand({ hello: 1 });
 
    // Record the end time
    const endTime = new Date();
 
    // Calculate the time difference
    const timeTaken = endTime - startTime;
 
    // Return the time in milliseconds
    return timeTaken;
}
 
// Example: Iterate over the function and print only if timeResult is < 55 ms (10,000 iterations)
for (let i = 0; i < 10000; i++) {
    const timeResult = measureTimeForHello();
 
    if (timeResult < 50) {
        print(`Iteration ${i + 1}: Time taken: ${timeResult} milliseconds`);
    }
}

Furthermore you can test that the Go duration logic is sound in this playground: https://go.dev/play/p/NN3LuHPVeSR

Definition of done: what must be done to consider the task complete?

Ensure that the UST in question passes.

The exact Go version used, with patch level:

go version go1.21.4 darwin/arm64

The exact version of the Go driver used:

tested and confirmed on 5.0.23 & latest

Describe how MongoDB is set up. Local vs Hosted, version, topology, load balanced, etc.

Using mongo orchestration:

export MONGO_ORCHESTRATION_HOME=/Users/preston.vasquez/Developer/mongo-orchestration-home 
export STPATH=/Users/preston.vasquez/Developer/mongo-go-driver
 
mongo-orchestration stop --pidfile=/Users/preston.vasquez/Developer/mongo-orchestration-home/server.pid
 
${DRIVERS_TOOLS}/.evergreen/start-orchestration.sh /Users/preston.vasquez/Developer/mongo-orchestration-home
 
echo "sending security token configuraiton to the server..."
 
curl -X POST http://localhost:8889/v1/replica_sets \
	-H 'Content-Type: application/json' \
	-d @${DRIVERS_TOOLS}/.evergreen/orchestration/configs/replica_sets/basic.json

{"releases":{"default":"/Users/preston.vasquez/.local/m/versions/5.0.23/bin"}}

The operating system and version (e.g. Windows 7, OSX 10.8, ...)

macOS 14.2.1 (23C71)

Security Vulnerabilities

NA


Generated at Thu Feb 08 08:40:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.