Core Server / SERVER-47029

Fix race when streamable RSM updates the shard registry after topology change

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.0-rc0
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4
    • Sprint: Sharding 2020-03-23, Sharding 2020-04-06
    • 21

      From BF description:
      There is a race when mongos updates the ShardRegistry and config.shards. After node0 is removed from a replica set, mongos receives isMaster responses from two nodes in that set, node0 and node1. Node0 responds first and still lists itself, with type 'ghost'; node1's response does not include node0 at all. Mongos applies node0's response to the topology description, then node1's, and each update results in a call to onConfirmedSet on the ReplicaSetChangeNotifier. Each call causes the shard registry to update its information for the replica set and write it to config.shards. However, the notifier event triggered by node1's response reaches the shard registry before the event triggered by node0's response. As a result, mongos first writes that the replica set has only two nodes and then overwrites that entry with one that includes the removed node.
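
      The ordering problem can be illustrated with a minimal C++ sketch. It is not the server's code: `ConfirmedSetEvent`, `FakeShardRegistry`, `deliver`, and the third host `node2` are hypothetical names invented for the illustration, and the registry is assumed to simply keep the last connection string it is handed.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a notifier event carrying the replica set's
// connection string as seen after one isMaster response was applied.
struct ConfirmedSetEvent {
    std::string connectionString;
};

// Hypothetical last-writer-wins registry: each event overwrites the stored
// value, much like an unconditional write to config.shards would.
struct FakeShardRegistry {
    std::string stored;
    void onConfirmedSet(const ConfirmedSetEvent& event) {
        stored = event.connectionString;
    }
};

// Delivers the events in the given order and returns what ends up stored.
std::string deliver(const std::vector<ConfirmedSetEvent>& order) {
    FakeShardRegistry registry;
    for (const auto& event : order) {
        registry.onConfirmedSet(event);
    }
    return registry.stored;
}
```

      With an event from node0 carrying `rs0/node0,node1,node2` and an event from node1 carrying `rs0/node1,node2`, delivering them in generation order leaves the correct two-node string, while delivering node1's event first leaves the stale string that still contains the removed node.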

      A possible fix is to not include nodes that are not primaries/secondaries in the connection string passed to the shard registry.
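
      A minimal sketch of that idea, under stated assumptions: `ServerType`, `ServerDescription`, `isPrimaryOrSecondary`, and `makeConnectionString` are hypothetical, simplified names and do not reflect the actual classes or the committed fix. The sketch only shows how dropping every node that is not a primary or secondary makes both events from the scenario above produce the same connection string, so their delivery order no longer matters.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified server types; the real topology description
// distinguishes more states than these.
enum class ServerType { kPrimary, kSecondary, kGhost, kUnknown };

// Hypothetical per-node entry in a topology description.
struct ServerDescription {
    std::string hostAndPort;
    ServerType type;
};

// Only primaries and secondaries are kept in the connection string.
bool isPrimaryOrSecondary(ServerType type) {
    return type == ServerType::kPrimary || type == ServerType::kSecondary;
}

// Builds "setName/host1,host2,..." from the nodes that pass the filter.
std::string makeConnectionString(const std::string& setName,
                                 const std::vector<ServerDescription>& servers) {
    std::string result = setName + "/";
    bool first = true;
    for (const auto& server : servers) {
        if (!isPrimaryOrSecondary(server.type)) {
            continue;  // skip ghost and unknown nodes, e.g. a removed member
        }
        if (!first) {
            result += ",";
        }
        result += server.hostAndPort;
        first = false;
    }
    return result;
}
```

      In this sketch a topology that still lists node0 as a ghost and one that omits node0 entirely both yield the same string, so a late-arriving event from node0 would rewrite the same value rather than reintroduce the removed node.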

            Assignee: Lamont Nelson (lamont.nelson@mongodb.com)
            Reporter: Janna Golden (janna.golden@mongodb.com)
            Votes: 0
            Watchers: 3

              Created:
              Updated:
              Resolved: