[SERVER-64433] A new topology time could be gossiped without being majority committed Created: 11/Mar/22  Updated: 29/Oct/23  Resolved: 13/May/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.0.0-rc6, 5.0.10, 6.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Sergi Mateo Bellido
Assignee: Sergi Mateo Bellido
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-64408 VectorClock's topology time may be wr... Closed
Related
related to SERVER-64627 Need general method to handle in-memo... Closed
is related to SERVER-64408 VectorClock's topology time may be wr... Closed
is related to SERVER-64931 Reenable ReadThroughCache correctness... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0
Sprint: Sharding EMEA 2022-03-21, Sharding EMEA 2022-04-04, Sharding EMEA 2022-04-18, Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16
Participants:
Linked BF Score: 23

 Description   

Every time a shard is added or removed, we create a new topologyTime (let's call it T0) that is inserted in config.shards.  Afterwards, when this operation is locally committed (say at time Tcommit), we store the value of T0 in an in-memory data structure.  Finally, when the majority commit point advances to a time TmajorityPoint greater than or equal to T0, we tick the configTime and advance the vector clock topologyTime to T0.
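
To make the sequence above concrete, here is a minimal sketch of the flow as described, written as a toy Python model rather than the server's actual code. All names (ConfigServerModel, on_local_commit, on_majority_point_advanced) are hypothetical, times are plain integers, and "ticking the configTime" is modeled as advancing it to the majority point, which is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigServerModel:
    """Toy model of the described flow; every name here is hypothetical."""

    topology_time: int = 0  # vector clock topologyTime
    config_time: int = 0    # vector clock configTime
    # In-memory T0 values still waiting for the majority commit point.
    pending: list = field(default_factory=list)

    def on_local_commit(self, t0: int) -> None:
        # Invoked at Tcommit, once the config.shards write is locally committed.
        self.pending.append(t0)

    def on_majority_point_advanced(self, t_majority: int) -> None:
        # Condition as described in the ticket: TmajorityPoint >= T0.
        ready = [t0 for t0 in self.pending if t0 <= t_majority]
        self.pending = [t0 for t0 in self.pending if t0 > t_majority]
        if ready:
            # Assumption: "tick" is modeled as moving configTime forward.
            self.config_time = max(self.config_time, t_majority)
            self.topology_time = max(self.topology_time, max(ready))
```

Note that the model only remembers T0, not Tcommit, which mirrors the description: the in-memory structure stores the value of T0.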

The problem with this approach is that we advance the topologyTime of the vector clock when TmajorityPoint >= T0, but this doesn't guarantee that the time associated with the oplog entry (i.e. Tcommit) has been majority committed. Thus, we might end up gossiping a new topologyTime, but when a shard goes to the config server expecting to find an entry in config.shards with a topologyTime of T0, it might not find it.
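
The race can be illustrated with a small sketch. It assumes integer times with T0 < Tcommit (T0 is created before the write is locally committed), and it simplifies a majority read to "sees only writes whose oplog time is at or below the majority commit point". The function names are hypothetical, and the Tcommit-based condition is shown only as a contrast; it is not necessarily what the fix commits implement.

```python
def can_advance_by_t0(t_majority: int, t0: int) -> bool:
    # Condition described in this ticket: compares against T0.
    return t_majority >= t0


def can_advance_by_commit(t_majority: int, t_commit: int) -> bool:
    # Contrasting condition (an assumption): compares against Tcommit,
    # the time of the oplog entry that wrote to config.shards.
    return t_majority >= t_commit


def majority_read_sees_entry(t_majority: int, t_commit: int) -> bool:
    # Simplified model of a majority read on config.shards.
    return t_commit <= t_majority


# Example interleaving: T0 = 10, Tcommit = 12, majority point = 11.
t0, t_commit, t_majority = 10, 12, 11

gossiped_early = can_advance_by_t0(t_majority, t0)            # True
entry_visible = majority_read_sees_entry(t_majority, t_commit)  # False
gated_on_commit = can_advance_by_commit(t_majority, t_commit)   # False
```

In this interleaving the topologyTime would be gossiped while a majority read of config.shards still cannot see the entry, which is the situation the paragraph above describes.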

Note that the topologyTime is a time, but it doesn't provide any guarantee about what you will find in config.shards. It can be seen simply as a counter that is ticked every time we perform an add/remove shard operation.



 Comments   
Comment by Githook User [ 13/Jun/22 ]

Author:

{'name': 'Sergi Mateo Bellido', 'email': 'sergi.mateo-bellido@mongodb.com', 'username': 'smateo'}

Message: SERVER-64433 Recovering the topology tick points on startup/init sync

Adding a way to execute unit tests with the WiredTiger SE

(cherry picked from commit cc33146088335da2bc08edf4eeec7d6b9fd724f0)
Branch: v5.0
https://github.com/mongodb/mongo/commit/63a8db3b942ab424565516ce841dbf0e35d4b46a

Comment by Githook User [ 13/May/22 ]

Author:

{'name': 'Sergi Mateo Bellido', 'email': 'sergi.mateo-bellido@mongodb.com', 'username': 'smateo'}

Message: SERVER-64433 Recovering the topology tick points on startup/init sync

(cherry picked from commit cc33146088335da2bc08edf4eeec7d6b9fd724f0)
Branch: v6.0
https://github.com/mongodb/mongo/commit/f9bf6f179994afeaff255568a6ab0c28dc3f646c

Comment by Githook User [ 12/May/22 ]

Author:

{'name': 'Sergi Mateo Bellido', 'email': 'sergi.mateo-bellido@mongodb.com', 'username': 'smateo'}

Message: SERVER-64433 Recovering the topology tick points on startup/init sync
Branch: master
https://github.com/mongodb/mongo/commit/cc33146088335da2bc08edf4eeec7d6b9fd724f0
