Core Server / SERVER-50735

Mongos 4.4.0 can return the topologyVersion of a shard in state change errors

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.4.0
    • Component/s: None
    • Service Arch
    • ALL
    • Repl 2020-10-05

      Mongos can return the wrong topologyVersion in state change error responses to the client. Instead of returning its own topologyVersion, mongos can return the topologyVersion of a shard member in some error responses.

      I've attached a reproduction script that uses failCommand (reproMongosWrongTopologyVersion.js). It's also possible to trigger the same bug with a real shutdown or replSetStepDown event. To reproduce:

      1. Start a cluster. Mine has 1 mongos and a 1 member shard.
      2. Note the topologyVersion reported by the mongos.
      3. Insert into test.test
      4. Run an operation on mongos. My example uses findAndModify, but I imagine other commands will also reproduce.
      5. At the same time, cause the mongos-mongod operation to fail. This can be done with failCommand, shutdown:1, etc.
      6. Observe that the operation fails and mongos returns an error with the incorrect topologyVersion.
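
      As a rough illustration of step 5, the failpoint document could be built as below. This is only a sketch and not the attached script: buildFailCommand is a hypothetical helper name, and setting failInternalCommands to true assumes the mongos-to-mongod request is treated as an internal command. The error code 11602 (InterruptedDueToReplStateChange) is the one shown in this report.

```javascript
// Hypothetical helper (not from the attached repro script): builds a
// failCommand failpoint document that fails the next matching command
// once with the given error code.
function buildFailCommand(commandName, errorCode) {
  return {
    configureFailPoint: "failCommand",
    mode: { times: 1 }, // trigger a single time, then turn off
    data: {
      failCommands: [commandName],
      errorCode: errorCode,
      // Assumption: the request arrives from mongos, an internal client,
      // so internal commands must be included for the failpoint to fire.
      failInternalCommands: true,
    },
  };
}

// 11602 is InterruptedDueToReplStateChange, the code seen in this report.
const failPoint = buildFailCommand("findAndModify", 11602);

// In the mongo shell this would be sent to the shard's mongod, not to mongos,
// for example: shardPrimaryConn.adminCommand(failPoint);
// (shardPrimaryConn is a placeholder for a connection to the shard member.)
```

      The document is sent to the shard member so that the failure originates on the mongos-mongod hop, which is the path where the wrong topologyVersion is propagated.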

      The attached repro script's output (notice the different topologyVersion fields, both reported by mongos):

      mongos topologyVersion:  {
      	"processId" : ObjectId("5f501dd34f1464b2cf98116a"),
      	"counter" : NumberLong(0)
      }
      uncaught exception: Error: findAndModifyFailed failed: {
      	"topologyVersion" : {
      		"processId" : ObjectId("5f501dcf33e67a0de9b4ab21"),
      		"counter" : NumberLong(8)
      	},
      	"ok" : 0,
      	"errmsg" : "Failing command due to 'failCommand' failpoint",
      	"code" : 11602,
      	"codeName" : "InterruptedDueToReplStateChange",
      	"operationTime" : Timestamp(1599086037, 28),
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1599086037, 28),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	}
      } :
      failed to load: reproMongosWrongTopologyVersion.js
      exiting with code -3
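
      The mismatch in the output above can also be checked mechanically. The following is a minimal sketch, not part of the attached script: isFromSameProcess is a hypothetical helper, and processId values are represented as plain hex strings rather than shell ObjectId values. It treats two topologyVersion documents as coming from the same process only when their processId fields are equal.

```javascript
// Hypothetical helper: returns true only if the error response carries a
// topologyVersion whose processId equals the one reported by mongos itself.
function isFromSameProcess(mongosTopologyVersion, errorResponse) {
  const errorTv = errorResponse.topologyVersion;
  if (!errorTv) {
    return false; // no topologyVersion attached to the error at all
  }
  // String() keeps the comparison valid for hex strings and for objects
  // whose string form identifies the process.
  return String(errorTv.processId) === String(mongosTopologyVersion.processId);
}

// Values copied from the output in this report.
const mongosTopologyVersion = {
  processId: "5f501dd34f1464b2cf98116a",
  counter: 0,
};
const errorResponse = {
  topologyVersion: { processId: "5f501dcf33e67a0de9b4ab21", counter: 8 },
  ok: 0,
  code: 11602,
  codeName: "InterruptedDueToReplStateChange",
};

// False here: the processId in the error is not the one mongos reported.
const sameProcess = isFromSameProcess(mongosTopologyVersion, errorResponse);
```

      Only processId is compared because counter values from different processes are not comparable with each other.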

      It's possible this could be fixed by SERVER-50549 but I wanted to call out this bug separately.

            backlog-server-servicearch Backlog - Service Architecture
            shane.harvey@mongodb.com Shane Harvey