ShardRegistry can be unable to refresh when all hosts of a shard have been changed


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.0.12, 8.2.0
    • Component/s: None
    • Catalog and Routing
    • ALL
    • 3
    • 🟩 Routing and Topology

      When all hosts in a shard are replaced, a node's ShardRegistry might be unable to learn the new connection string, and the node then cannot communicate with the shard.

      This can only happen when the complete replica set reconfig takes place while the node is unable to communicate with any member of the replica set.

      Details:
      The ShardRegistry keeps track of the shards in the cluster and their connection strings. Since SERVER-91121, the ShardRegistry only refreshes from the configsvr when it detects that the topologyTime has changed. When a replica set is reconfigured, the shard updates the `hosts` attribute of the corresponding `config.shards` entry (see SERVER-21185). This does not advance the `topologyTime`, so ShardRegistries do not pick up the change through their periodic refreshes. Typically they learn about the reconfig through a different mechanism: the ReplicaSetMonitor notifies the ShardRegistry when it learns about the reconfig from a replica set member it already knew.
      However, if the node is unable to communicate with any known member, and all of those members are then replaced, the node won't be able to learn about the new hosts until it restarts.
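
The interaction described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not the server's implementation: `ConfigServer`, `Node`, `periodic_refresh`, `monitor_scan` and the host names are all hypothetical, and the monitor is simplified to "a reachable host that the node already knows and that is still a member reports the current membership".

```python
from dataclasses import dataclass


@dataclass
class ConfigServer:
    """Toy stand-in for the config.shards entry of one shard (hypothetical)."""
    topology_time: int = 1
    hosts: frozenset = frozenset({"a1", "a2", "a3"})

    def reconfig(self, new_hosts):
        # As described in the ticket: `hosts` is updated, but the
        # topologyTime is NOT advanced by a replica set reconfig.
        self.hosts = frozenset(new_hosts)


@dataclass
class Node:
    """Toy stand-in for a node's ShardRegistry plus ReplicaSetMonitor."""
    known_hosts: frozenset
    seen_topology_time: int

    def periodic_refresh(self, config):
        # Refresh from the config server only when topologyTime changed.
        if config.topology_time != self.seen_topology_time:
            self.known_hosts = config.hosts
            self.seen_topology_time = config.topology_time

    def monitor_scan(self, config, reachable):
        # Simplification: the new membership is learned only from a host
        # that is already known, reachable, and still a member.
        if self.known_hosts & frozenset(reachable) & config.hosts:
            self.known_hosts = config.hosts


def simulate(new_hosts, reachable_during, reachable_after):
    """Run a reconfig during one connectivity state, then a later one."""
    config = ConfigServer()
    node = Node(config.hosts, config.topology_time)
    config.reconfig(new_hosts)
    for reachable in (reachable_during, reachable_after):
        node.monitor_scan(config, reachable)
        node.periodic_refresh(config)
    return node.known_hosts


def simulate_full_replacement():
    # Node is partitioned during the reconfig; afterwards every new host
    # is reachable, but none of them is known to the node.
    new = {"b1", "b2", "b3"}
    return simulate(new, reachable_during=set(), reachable_after=new)


def simulate_partial_replacement():
    # One original member ("a3") survives and is reachable afterwards.
    new = {"a3", "b1", "b2"}
    return simulate(new, reachable_during=set(), reachable_after=new)
```

In this model `simulate_full_replacement()` leaves the node holding the original host set, because neither path fires: the topologyTime is unchanged and no known host is left to report the new membership. `simulate_partial_replacement()` recovers through the surviving member.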

            Assignee:
            Unassigned
            Reporter:
            Jordi Serra Torrens
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: