- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.2.0, 5.0.5, 5.1.1
- Component/s: Sharding
- Fully Compatible
- ALL
- v5.3, v5.0
- Sharding EMEA 2022-02-21, Sharding EMEA 2022-03-07
- 5
The current implementation of the vector clock is resilient to step-downs and rolling restarts of the config server because at least one of its nodes stays alive and retains the correct value of each component, which is then gossiped out.
If the whole config server goes down, or is simply restarted in a non-rolling fashion, the vector clock is reinitialized on the new CSRS primary with:
- Cluster time = time of the last committed operation on the oplog.
- Config time = 0 (for a very brief moment, as one majority committed write will tick it to the cluster time).
- Topology time = 0 (until a shard is successfully added or removed).
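The difference between the two restart modes can be sketched with a toy model. This is only an illustration and not the server's implementation: the `VectorClock` class, the integer timestamps, and the helper names are all hypothetical, and gossip is assumed to keep the component-wise maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorClock:
    """Toy model of the three components; times are plain integers here."""
    cluster_time: int
    config_time: int
    topology_time: int

    def merge(self, other: "VectorClock") -> "VectorClock":
        # Assumption: gossip only advances a component (component-wise max).
        return VectorClock(
            max(self.cluster_time, other.cluster_time),
            max(self.config_time, other.config_time),
            max(self.topology_time, other.topology_time),
        )


def reinit_on_new_primary(last_committed_oplog_time: int) -> VectorClock:
    # Reinitialization as described above: config and topology times start at 0.
    return VectorClock(last_committed_oplog_time, 0, 0)


def after_majority_write(clock: VectorClock) -> VectorClock:
    # Simplification: one majority committed write ticks config time to cluster time.
    return VectorClock(clock.cluster_time, clock.cluster_time, clock.topology_time)


# State before any restart (hypothetical values).
before = VectorClock(cluster_time=100, config_time=100, topology_time=90)

# Rolling restart: a surviving node still gossips the old values.
rolling = reinit_on_new_primary(100).merge(before)

# Non-rolling restart: nobody remembers the old values.
cold = after_majority_write(reinit_on_new_primary(100))
```

In this model `rolling` ends up identical to `before`, while `cold` recovers the config time after one write but keeps a topology time of 0.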
Causal consistency is very likely to be broken in such a scenario because:
- The cluster and config times may move backwards if the system time is incorrect.
- The topology time may be incorrect for a long time.
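Both failure modes can be illustrated from the point of view of a node that still remembers the previously gossiped values. The sketch below is hypothetical: `regressed_components`, the dictionary layout, and the integer timestamps are illustrative assumptions, not server code.

```python
from typing import Dict, List

Clock = Dict[str, int]


def regressed_components(known: Clock, advertised: Clock) -> List[str]:
    """Return the components whose advertised value is older than the known one."""
    return [name for name in known if advertised.get(name, 0) < known[name]]


# Values a router had received before the non-rolling restart (hypothetical).
known_by_router = {"clusterTime": 100, "configTime": 100, "topologyTime": 90}

# New CSRS primary with a correct system clock: only topology time is stale.
correct_clock = {"clusterTime": 100, "configTime": 100, "topologyTime": 0}

# New CSRS primary whose system clock is behind: every component is stale.
skewed_clock = {"clusterTime": 80, "configTime": 80, "topologyTime": 0}

only_topology = regressed_components(known_by_router, correct_clock)
everything = regressed_components(known_by_router, skewed_clock)
```

Under these assumptions, `only_topology` contains just the topology time, which stays stale until a shard is added or removed, while `everything` lists all three components.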