Core Server / SERVER-91331

RSM can mark the wrong replica set as ReplicaSetNoPrimary on InterruptedDueToReplStateChange error

    • Service Arch
    • Minor Change
    • ALL
    • v8.0, v7.0, v6.0
    • Networking & Obs 2024-06-10, Networking & Obs 2024-06-24
    • 200

      The following can happen:
      1. mongos sends a command to a config server or shard, s0.
      2. As part of processing the command, s0 runs a subcommand against a remote shard, s1.
      3. s1 steps down.
      4. The subcommand fails with InterruptedDueToReplStateChange, and the error propagates upstream along the path s1 -> s0 -> mongos.
      5. mongos receives InterruptedDueToReplStateChange from s0 and concludes that s0 is the one that failed over.
      6. The mongos RSM marks s0 as ReplicaSetNoPrimary.
      7. Because InterruptedDueToReplStateChange is a retriable error, mongos attempts to resend the command. To do so it tries to send a hello command to s0 to get an updated view of the topology, but finds that a hello request is already outstanding.
      8. The retry cannot proceed until the outstanding hello on s0 returns, which can take up to 10s (the timeout of a streamable hello command).
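
The sequence above can be sketched as a toy model. This is only an illustration of the misattribution, not the server's actual RSM code: `ToyMonitor`, `on_command_error`, `wait_before_retry`, and `run_scenario` are hypothetical names, and the only values taken from the ticket are the error name, the topology state, and the 10s streamable hello timeout.

```python
from dataclasses import dataclass, field

# Streamable hello timeout quoted in the ticket (worst-case wait).
HELLO_TIMEOUT_SECS = 10.0


@dataclass
class ToyMonitor:
    """Hypothetical stand-in for the mongos replica set monitor (RSM)."""

    # Replica set name -> topology type as currently seen by mongos.
    topology: dict = field(default_factory=dict)
    # Replica set name -> seconds until its outstanding hello returns.
    hello_remaining: dict = field(default_factory=dict)

    def on_command_error(self, responder: str, code_name: str) -> None:
        # The monitor only knows which host answered, not where the
        # error originated, so it blames the responder (steps 5-6).
        if code_name == "InterruptedDueToReplStateChange":
            self.topology[responder] = "ReplicaSetNoPrimary"

    def wait_before_retry(self, target: str) -> float:
        # A retry against a set with no known primary must wait for the
        # outstanding hello on that set to return (steps 7-8).
        if self.topology.get(target) != "ReplicaSetNoPrimary":
            return 0.0
        return self.hello_remaining.get(target, 0.0)


def run_scenario():
    rsm = ToyMonitor(
        topology={"s0": "ReplicaSetWithPrimary", "s1": "ReplicaSetWithPrimary"},
        hello_remaining={"s0": HELLO_TIMEOUT_SECS, "s1": HELLO_TIMEOUT_SECS},
    )
    # Steps 3-5: s1 steps down, but the error reaches mongos via s0.
    responder = "s0"
    rsm.on_command_error(responder, "InterruptedDueToReplStateChange")
    return rsm.topology, rsm.wait_before_retry("s0")


if __name__ == "__main__":
    topology, delay = run_scenario()
    print(topology, delay)
```

In this model s1, which actually stepped down, keeps its old state, while s0 is marked ReplicaSetNoPrimary and the retry against it stalls for the remainder of the outstanding hello.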

            Assignee:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Reporter:
            jason.chan@mongodb.com Jason Chan