[SERVER-64408] VectorClock's topology time may be wrongly advanced in case of rollback Created: 10/Mar/22  Updated: 12/May/22  Resolved: 12/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Sergi Mateo Bellido
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-64433 A new topology time could be gossiped... Closed
Related
related to SERVER-64433 A new topology time could be gossiped... Closed
is related to SERVER-64931 Reenable ReadThroughCache correctness... Closed
Backport Requested:
v6.0
Sprint: Sharding EMEA 2022-04-04, Sharding EMEA 2022-04-18, Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16
Participants:

 Description   

The CSRS registers a topology time tick point (timestamp X) when it observes local inserts and applyOps entries that modify config.shards. On majority commit, if the timestamp of the majority-committed oplog entry is greater than or equal to X, the topology time is advanced to X.

The current implementation does not take rollbacks into account and may end up wrongly advancing the topology time. Example:

  • A shard entry is added to config.shards at time T
  • A tick point is registered with timestamp T
  • A rollback happens and the oplog is truncated to the operation registered at T-2; the tick point at T is not discarded
  • A new CSRS primary steps up, the shard is added at time T-1, and then an unrelated write happens at time T
  • On majority commit, the old primary ticks the topology time to T instead of T-1
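
The sequence above can be replayed with a small toy model. This is only a sketch of the bookkeeping as described in this ticket, not the server's VectorClock code: the class `TopologyTimeTicker`, its methods, and the helper `replay_rollback_scenario` are hypothetical names, and timestamps are plain integers. It also assumes the old primary registers a second tick point when it applies the new primary's entry at T-1.

```python
class TopologyTimeTicker:
    """Toy model of the tick-point bookkeeping described above (not server code)."""

    def __init__(self):
        self.tick_points = []   # timestamps X registered on config.shards writes
        self.topology_time = 0

    def on_config_shards_write(self, ts):
        # Register a tick point for an insert/applyOps touching config.shards.
        self.tick_points.append(ts)

    def on_majority_commit(self, committed_ts):
        # Advance to the newest tick point X with committed_ts >= X.
        reached = [x for x in self.tick_points if x <= committed_ts]
        if reached:
            self.topology_time = max(self.topology_time, max(reached))
        self.tick_points = [x for x in self.tick_points if x > committed_ts]


def replay_rollback_scenario(t):
    node = TopologyTimeTicker()
    node.on_config_shards_write(t)       # shard added at T, tick point T registered
    # Rollback truncates the oplog to T-2; the stale tick point at T survives.
    node.on_config_shards_write(t - 1)   # new primary's addShard at T-1 is applied
    node.on_majority_commit(t)           # an unrelated write at T is majority committed
    return node.topology_time


print(replay_rollback_scenario(10))  # prints 10, although the shard was added at 9
```

In this model the stale tick point at T is satisfied by the unrelated write at T, so the topology time lands on T rather than T-1.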


 Comments   
Comment by Sergi Mateo Bellido [ 12/May/22 ]

SERVER-64433 fixes this problem and a few others, so I am closing this ticket.

Comment by Pierlauro Sciarelli [ 10/Mar/22 ]

Considering that shards are added and removed infrequently, a possible solution would be to not locally commit addShard and removeShard operations. Instead, we could synchronously wait for majority and then advance the topology time with the newly committed timestamp.

This solution also works if an improbable shutdown happens exactly after the entry is majority committed and before the topology time is advanced, because on step-up the new primary CSRS node will recover the correct topology time from disk.
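
A minimal sketch of this proposal, under stated assumptions: `CsrsNode`, `add_shard`, `step_up`, and the `crash_before_advance` flag are hypothetical names used for illustration, timestamps are plain integers, and recovery on step-up is modeled as taking the newest timestamp among the durable shard entries, which is an assumption about how the value would be read back from disk.

```python
class CsrsNode:
    """Toy model: advance the topology time only after majority commit."""

    def __init__(self, durable_shards=None):
        # shard name -> timestamp of its majority-committed config.shards entry
        self.durable_shards = {} if durable_shards is None else durable_shards
        self.topology_time = 0

    def add_shard(self, name, commit_ts, crash_before_advance=False):
        # Stand-in for "write the entry and synchronously wait for majority":
        # once we get here, the entry is durable and cannot be rolled back.
        self.durable_shards[name] = commit_ts
        if crash_before_advance:
            return  # simulated shutdown between commit and advance
        self.topology_time = max(self.topology_time, commit_ts)

    def step_up(self):
        # Assumed recovery path: rebuild the topology time from durable state.
        self.topology_time = max(self.durable_shards.values(), default=0)


old_primary = CsrsNode()
old_primary.add_shard("shard0", 9, crash_before_advance=True)

new_primary = CsrsNode(old_primary.durable_shards)
new_primary.step_up()
print(new_primary.topology_time)  # prints 9
```

Because nothing is registered before the majority commit, there is no pending tick point for a rollback to leave behind.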

It would be great to throw away the unnecessarily complex ticking semantics.

CC sergi.mateo-bellido
