The vectorClock may get corrupted during addShard if the added shard has more advanced timestamps


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2025-06-09
    • 0
    • None
    • 3
    • TBD
    • 🟩 Routing and Topology
    • None
    • None
    • None
    • None
    • None
    • None
    • 0

      During an addShard operation, the config server and the shard being added exchange requests. As a result, each side gossips its vector clock to the other, and each side advances its own vector clock to any higher values it receives.
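A minimal sketch of the gossip rule described above, assuming that every message carries the sender's vector clock and that the receiver advances each component to the maximum of its local and gossiped values. The `VectorClock` class, the `gossip_in` and `exchange` helpers, and the `(seconds, increment)` tuple encoding of timestamps are illustrative assumptions, not the server's actual types.

```python
from dataclasses import dataclass

# Hypothetical encoding: a timestamp is (seconds, increment), compared lexicographically.
Timestamp = tuple[int, int]


@dataclass
class VectorClock:
    cluster_time: Timestamp
    config_time: Timestamp
    topology_time: Timestamp

    def gossip_in(self, other: "VectorClock") -> None:
        """Advance every component to the max of the local and gossiped values.

        Components only move forward; this sketch performs no validation of
        where a gossiped value came from.
        """
        self.cluster_time = max(self.cluster_time, other.cluster_time)
        self.config_time = max(self.config_time, other.config_time)
        self.topology_time = max(self.topology_time, other.topology_time)


def exchange(a: VectorClock, b: VectorClock) -> None:
    """One request/response round trip: each side gossips its clock to the other."""
    b.gossip_in(a)  # the request from a carries a's clock
    a.gossip_in(b)  # the response from b carries b's (already merged) clock


# Hypothetical clocks: the shard being added is ahead of the config server.
config_server = VectorClock((100, 1), (100, 1), (90, 1))
new_shard = VectorClock((500, 3), (500, 3), (400, 1))
exchange(config_server, new_shard)
print(config_server.config_time)  # (500, 3)
```

After a single round trip the config server has adopted the shard's configTime, even though its own oplog never produced that value.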

      Consequently, if the shard being added has a more advanced vector clock than the cluster, the cluster's vector clock gets corrupted. This can make the entire cluster unavailable, because any read on the config server blocks waiting for the config server's oplog to reach the gossiped configTime.
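The blocking behaviour can be illustrated with a deliberately simplified model, assuming that a config server read which must observe the gossiped configTime is only servable once the config server's majority-committed oplog position has reached that value. The functions `can_serve_read` and `steps_until_servable`, and the "one oplog entry per simulated second" progression, are hypothetical and exist only for illustration; the model ignores every other mechanism that might move the oplog forward.

```python
Timestamp = tuple[int, int]  # hypothetical (seconds, increment) encoding


def can_serve_read(majority_committed: Timestamp, gossiped_config_time: Timestamp) -> bool:
    """A read that must see gossiped_config_time can proceed only once the
    config server's majority-committed optime has reached it."""
    return majority_committed >= gossiped_config_time


def steps_until_servable(start: Timestamp, target: Timestamp, max_steps: int):
    """Advance a simulated config server oplog and return how many steps the
    read waited, or None if it is still blocked after max_steps."""
    secs, inc = start
    for step in range(max_steps + 1):
        if can_serve_read((secs, inc), target):
            return step
        secs, inc = secs + 1, 1  # one new oplog entry per simulated second
    return None


# A configTime the config server itself produced: the read unblocks quickly.
print(steps_until_servable((100, 1), (102, 1), max_steps=10))  # 2
# A configTime gossiped from a shard whose clock is far ahead: still blocked.
print(steps_until_servable((100, 1), (500, 3), max_steps=10))  # None
```

In this model the wait lasts as long as the config server's oplog lags the gossiped value, which is why a configTime imported from elsewhere stalls reads.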

       

      We have identified two scenarios in which this can happen:

      A) An isolated replica set is started as a config server and is later added as a shard to a sharded cluster.

      B) An addShard attempt against one sharded cluster fails, but only after the shard has already received that cluster's vector clock through gossip. The same shard is then added to a different sharded cluster.
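Scenario B can be traced with the same component-wise max rule, again as a hypothetical model: cluster X's config server is ahead of cluster Y's, the failed addShard against X leaves X's configTime on the replica set, and the later addShard against Y carries that value into Y's config server. The cluster names, the timestamps, and the `merge`/`exchange` helpers are illustrative assumptions. Scenario A follows the same pattern, except that the replica set's configTime was advanced by running as an isolated config server rather than by a failed addShard.

```python
Timestamp = tuple[int, int]  # hypothetical (seconds, increment) encoding
Clock = dict[str, Timestamp]


def merge(local: Clock, gossiped: Clock) -> Clock:
    """Component-wise max: how a receiver advances its clock on gossip."""
    return {k: max(local[k], gossiped[k]) for k in local}


def exchange(a: Clock, b: Clock) -> tuple[Clock, Clock]:
    """One request/response round trip; both sides end with the merged clock."""
    merged = merge(a, b)
    return merged, dict(merged)


# Hypothetical starting clocks.
cluster_x_config = {"clusterTime": (900, 1), "configTime": (900, 1), "topologyTime": (800, 1)}
cluster_y_config = {"clusterTime": (200, 1), "configTime": (200, 1), "topologyTime": (150, 1)}
replica_set = {"clusterTime": (50, 1), "configTime": (0, 0), "topologyTime": (0, 0)}

# 1. addShard against cluster X fails, but messages were already exchanged.
cluster_x_config, replica_set = exchange(cluster_x_config, replica_set)

# 2. The same replica set is then added to cluster Y.
cluster_y_config, replica_set = exchange(cluster_y_config, replica_set)

# Cluster Y's config server now holds a configTime that came from cluster X.
y_config_oplog_last = (200, 1)  # hypothetical: Y's own last config oplog position
print(cluster_y_config["configTime"])                        # (900, 1)
print(cluster_y_config["configTime"] > y_config_oplog_last)  # True
```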

            Assignee:
            Wolfee Farkas
            Reporter:
            Silvia Surroca
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: