Core Server / SERVER-64433

A new topology time could be gossiped without being majority committed



    • Bug
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • None
    • 6.0.0-rc6, 5.0.10, 6.1.0-rc0
    • Sharding
    • None
    • Fully Compatible
    • ALL
    • v6.0, v5.0
    • Sharding EMEA 2022-03-21, Sharding EMEA 2022-04-04, Sharding EMEA 2022-04-18, Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16
    • 23


      Every time a shard is added or removed, we create a new topologyTime (call it T0) that is inserted in config.shards. Afterwards, when this operation is locally committed (say, at time Tcommit), we store the value of T0 in an in-memory data structure. Finally, when the majority commit point advances to a time TmajorityPoint greater than or equal to T0, we tick the configTime and advance the vector clock's topologyTime to T0.

      The problem with this approach is that we advance the topologyTime of the vector clock when TmajorityPoint >= T0, but this doesn't guarantee that the time associated with the oplog entry (i.e. Tcommit) has been majority committed. Thus, we might end up gossiping a new topologyTime, but when a shard goes to the config server expecting to find an entry in config.shards with a topologyTime of T0, it might not find it.
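
      The gap can be illustrated with a small sketch. This is not the server code: the class, the function names and the integer timestamps are all hypothetical, and the "safe" check is only one possible way to close the gap, not necessarily the fix that shipped. It assumes Tcommit is later than T0, as the description above implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTopologyChange:
    """Hypothetical record of an add/remove shard operation."""
    topology_time: int  # T0: value written into config.shards
    commit_time: int    # Tcommit: time of the oplog entry (assumed > T0)


def can_advance_as_described(change: PendingTopologyChange, majority_point: int) -> bool:
    # Condition described in the ticket: compare against T0 only.
    return majority_point >= change.topology_time


def can_advance_after_commit(change: PendingTopologyChange, majority_point: int) -> bool:
    # Assumed alternative: wait until the oplog entry itself is majority committed.
    return majority_point >= change.commit_time


# Illustrative values only: T0 = 10, Tcommit = 12.
change = PendingTopologyChange(topology_time=10, commit_time=12)

for majority_point in (9, 10, 11, 12):
    print(
        majority_point,
        can_advance_as_described(change, majority_point),
        can_advance_after_commit(change, majority_point),
    )
```

      With these made-up values, a majority commit point of 10 or 11 satisfies the first check while the write to config.shards is not yet majority committed, which is the window in which a shard could fail to find the entry.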

      Note that the topologyTime is a time, but it doesn't provide any guarantee about what you will find in config.shards. It can be seen simply as a counter that is ticked every time we perform an add/remove shard operation.





              sergi.mateo-bellido@mongodb.com Sergi Mateo Bellido
              sergi.mateo-bellido@mongodb.com Sergi Mateo Bellido