  1. Core Server
  2. SERVER-63742

Default topology time in shard can lead to infinite refresh in shard registry

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 6.0.0-rc0, 5.0.7, 5.3.0-rc3
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.3, v5.0
    • Sprint: Sharding EMEA 2022-02-21, Sharding EMEA 2022-03-07

      If a recently started shard has to write to config.vectorClock (for example, when it becomes the coordinator of a 2PC transaction), it tries to insert the value Timestamp(0, 0) into the collection. However, this value is replaced by 'now' before being inserted, and the resulting vector clock value can be gossiped back to the routers, causing the read-through cache of the ShardRegistry to advance its time in store to the gossiped value. If the topologyTime stored on the config server (in config.shards) is lower than the new time in store, the ShardRegistry stalls every operation that tries to get a shard: it keeps trying to refresh the cache, but a refresh can never return a time that catches up with the one already in store.

      Because of this stall in the ShardRegistry, any mongos operation that must contact a shard can hang as well.
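
      The refresh loop described above can be illustrated with a minimal Python sketch. This is a toy model, not the server's C++ implementation: ToyShardRegistry, ConfigServer, on_gossip, get_shard and max_refreshes are hypothetical names, plain integers stand in for Timestamp values, and the loop is bounded only so that the example terminates (the behaviour reported here is an unbounded retry).

```python
from dataclasses import dataclass


@dataclass
class ConfigServer:
    # Stand-in for the topologyTime persisted in config.shards.
    topology_time: int


@dataclass
class ToyShardRegistry:
    """Toy read-through cache; all names here are hypothetical."""

    config: ConfigServer
    time_in_store: int = 0  # newest topologyTime the cache has heard about
    cached_time: int = 0    # topologyTime of the data actually cached

    def on_gossip(self, gossiped_topology_time: int) -> None:
        # Gossip can only move the time in store forward.
        self.time_in_store = max(self.time_in_store, gossiped_topology_time)

    def get_shard(self, max_refreshes: int = 5) -> int:
        """Refresh until the cache catches up; return the refresh count.

        The bound is an assumption of this sketch so that it terminates.
        """
        refreshes = 0
        while self.cached_time < self.time_in_store:
            if refreshes == max_refreshes:
                raise TimeoutError("cache never caught up with time in store")
            # A refresh can only observe what the config server has stored.
            self.cached_time = self.config.topology_time
            refreshes += 1
        return refreshes


if __name__ == "__main__":
    registry = ToyShardRegistry(ConfigServer(topology_time=100))

    # Healthy case: the gossiped time matches config.shards.
    registry.on_gossip(100)
    print("healthy refreshes:", registry.get_shard())

    # Buggy case: a shard gossips 'now' (here 500) instead of a real
    # topologyTime, so no refresh can ever satisfy the cache.
    registry.on_gossip(500)
    try:
        registry.get_shard()
    except TimeoutError as exc:
        print("stalled:", exc)
```

      In the sketch, the cache only stops refreshing once the config server holds a topologyTime at least as high as the gossiped one, which is why a spurious 'now' value leaves it refreshing indefinitely.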

            antonio.fuschetto@mongodb.com Antonio Fuschetto
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez