  1. Core Server
  2. SERVER-63742

Default topology time in shard can lead to infinite refresh in shard registry

Details

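    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0.0-rc0, 5.0.7, 5.3.0-rc3
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.3, v5.0
    • Sprint: Sharding EMEA 2022-02-21, Sharding EMEA 2022-03-07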

    Description

      If a recently started shard has to write to config.vectorClock (for example, when it becomes the coordinator of a 2PC transaction), it tries to insert the value Timestamp(0, 0) into the collection. However, this value is replaced by 'now' before being inserted, and the resulting vector clock value can be gossiped back to the routers, causing the read-through cache of the ShardRegistry to advance its time in store to the gossiped value. If the topologyTime stored on the config server (in config.shards) is less than the new time in store, the ShardRegistry stalls every operation that tries to get a shard: it keeps trying to refresh the cache, but it never finds a time higher than the one already stored.

      As a result, any mongos operation that needs to contact a shard can stall.
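
      The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the server's code: ConfigServer, ShardRegistrySketch, insert_vector_clock and max_refreshes are hypothetical names, timestamps are modelled as (seconds, increment) tuples, and the refresh loop is bounded only so that the example terminates (the behaviour reported here has no such bound).

```python
from dataclasses import dataclass

# Hypothetical model: a timestamp is a (seconds, increment) tuple.
Timestamp = tuple


@dataclass
class ConfigServer:
    """Stand-in for the config server; holds the topologyTime from config.shards."""
    topology_time: Timestamp

    def fetch_topology_time(self) -> Timestamp:
        return self.topology_time


def insert_vector_clock(requested: Timestamp, now: Timestamp) -> Timestamp:
    """Models the replacement described in this ticket: Timestamp(0, 0) becomes 'now'."""
    return now if requested == (0, 0) else requested


class ShardRegistrySketch:
    """Toy read-through cache keyed on topologyTime (assumed structure)."""

    def __init__(self, config: ConfigServer, max_refreshes: int = 5):
        self.config = config
        self.cached_time = config.fetch_topology_time()
        self.time_in_store = self.cached_time
        # Assumption for the demo only: cap the refresh attempts.
        self.max_refreshes = max_refreshes

    def on_gossip(self, topology_time: Timestamp) -> None:
        # Gossiped values can only move the time in store forward.
        if topology_time > self.time_in_store:
            self.time_in_store = topology_time

    def get_shard(self):
        """Return the number of refreshes performed, or None if the cache never catches up."""
        refreshes = 0
        while self.cached_time < self.time_in_store:
            if refreshes == self.max_refreshes:
                return None  # would spin forever without the cap
            fetched = self.config.fetch_topology_time()
            self.cached_time = max(self.cached_time, fetched)
            refreshes += 1
        return refreshes


# Healthy start: cache and config server agree, so no refresh is needed.
config = ConfigServer(topology_time=(100, 1))
registry = ShardRegistrySketch(config)
healthy = registry.get_shard()

# A freshly started shard writes Timestamp(0, 0), which is stored as 'now'
# and gossiped back to the router.
gossiped = insert_vector_clock((0, 0), now=(500, 1))
registry.on_gossip(gossiped)

# config.shards still holds (100, 1), so every refresh comes back too old.
stalled = registry.get_shard()
```

      In this sketch, get_shard() returns None once the gossiped time is ahead of everything the config server can return, which corresponds to the stall seen from mongos. If the config server's topologyTime were at least as new as the gossiped value, a single refresh would let the cache catch up.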

          People

            antonio.fuschetto@mongodb.com Antonio Fuschetto
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez