- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: 4.4.0
- Component/s: None
- Service Arch
- ALL
- Repl 2020-10-05
Mongos can return the wrong topologyVersion in state change error responses to the client. Instead of returning its own topologyVersion, mongos can return the topologyVersion of a shard member in some error responses.
I've attached a reproduction script that uses failCommand (reproMongosWrongTopologyVersion.js). The same bug can also be triggered by a real shutdown or replSetStepDown event. To reproduce:
- Start a cluster. Mine has 1 mongos and a 1-member shard.
- Note the topologyVersion reported by the mongos.
- Insert into test.test.
- Run an operation on mongos. My example uses findAndModify, but I imagine other commands will also reproduce.
- At the same time, cause the mongos-mongod operation to fail. This can be done with failCommand, shutdown:1, etc.
- See that the operation fails and mongos returns an error with the incorrect topologyVersion.
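For the failCommand variant of the "cause the mongos-mongod operation to fail" step, the fail point document could look roughly like the sketch below. The contents of the attached script are not shown in this ticket, so this is an assumption about its shape, not a copy of it. `buildFailCommand` and `shardPrimary` are hypothetical names. The error code 11602 (InterruptedDueToReplStateChange) is taken from the output in this ticket.

```javascript
// Hypothetical helper: builds a configureFailPoint document for the
// failCommand fail point. Not taken from the attached script.
function buildFailCommand(commandName, errorCode) {
  return {
    configureFailPoint: "failCommand",
    mode: { times: 1 },            // fail only the next matching command
    data: {
      failCommands: [commandName], // commands the fail point applies to
      errorCode: errorCode,        // error code the failed command returns
    },
  };
}

// 11602 is InterruptedDueToReplStateChange, a state change error.
const failFindAndModify = buildFailCommand("findAndModify", 11602);
console.log(JSON.stringify(failFindAndModify));

// Assumed usage in the mongo shell, against the shard's mongod rather than
// mongos ("shardPrimary" is a hypothetical connection to that mongod):
//   shardPrimary.adminCommand(failFindAndModify);
```

The fail point is set on the shard member so that the error originates on the mongos-mongod hop, which is where the wrong topologyVersion comes from.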
The attached repro script's output (notice that the two topologyVersion values differ, even though both were reported by mongos):
mongos topologyVersion: { "processId" : ObjectId("5f501dd34f1464b2cf98116a"), "counter" : NumberLong(0) }
...
uncaught exception: Error: findAndModifyFailed failed: {
    "topologyVersion" : {
        "processId" : ObjectId("5f501dcf33e67a0de9b4ab21"),
        "counter" : NumberLong(8)
    },
    "ok" : 0,
    "errmsg" : "Failing command due to 'failCommand' failpoint",
    "code" : 11602,
    "codeName" : "InterruptedDueToReplStateChange",
    "operationTime" : Timestamp(1599086037, 28),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1599086037, 28),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCollection.prototype.findAndModify@src/mongo/shell/collection.js:730:15
DBCollection.prototype.findOneAndReplace@src/mongo/shell/crud_api.js:833:12
@reproMongosWrongTopologyVersion.js:27:11
failed to load: reproMongosWrongTopologyVersion.js
exiting with code -3
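The mismatch can be checked mechanically by comparing the processId in the error response with the one mongos reported for itself. The sketch below uses the two values from the output in this ticket, with plain hex strings standing in for the shell's ObjectId type. `cameFromSameProcess` is a hypothetical helper, not part of the shell or the attached script.

```javascript
// Hypothetical helper: a topologyVersion belongs to the same server process
// only if the processId matches. The counter is deliberately not compared,
// since it can legitimately advance between the two observations.
function cameFromSameProcess(expected, actual) {
  return expected.processId === actual.processId;
}

// Values copied from the repro output; hex strings stand in for ObjectId.
const mongosTopologyVersion = { processId: "5f501dd34f1464b2cf98116a", counter: 0 };
const errorTopologyVersion = { processId: "5f501dcf33e67a0de9b4ab21", counter: 8 };

// Prints false: the error response carries another process's topologyVersion.
console.log(cameFromSameProcess(mongosTopologyVersion, errorTopologyVersion));
```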
It's possible this could be fixed by SERVER-50549, but I wanted to call out this bug separately.
- is related to: SERVER-50549 Transform connection-related error codes in proxied commands (Closed)