  Core Server / SERVER-62907

Vector clock components must survive CSRS non-rolling restart

    • Fully Compatible
    • ALL
    • v5.3, v5.0
    • Steps to reproduce:
      mlaunch init --dir data --sharded 2 --replicaset 1 --csrs 1 --nodes 1 --verbose --port 20000
      # grep for topologyTime in the logs and check that it is greater than 0
      mlaunch stop
      mlaunch start
      # grep for topologyTime in the logs and check that it is equal to 0
    • Sharding EMEA 2022-02-21, Sharding EMEA 2022-03-07
    • 5

      The current implementation of the vector clock is resilient to step-downs and rolling restarts of the config server because at least one of its nodes stays alive and retains the correct value of each component, which is then gossiped out.

      If the whole config server goes down, or is simply restarted in a non-rolling fashion, the vector clock is reinitialized on the new CSRS primary with:

      • Cluster time = time of the last committed operation in the oplog.
      • Config time = 0 (for a very brief moment, as one majority-committed write will tick it to the cluster time).
      • Topology time = 0 (until a shard is successfully added or removed).
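The reinitialization described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the server's implementation: the `VectorClock` type, the function names, the plain integer times, and the "take the maximum per component" gossip rule are all hypothetical simplifications.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VectorClock:
    """Toy model of the three components named in this ticket (hypothetical type)."""
    cluster_time: int
    config_time: int
    topology_time: int

    def merge(self, other: "VectorClock") -> "VectorClock":
        # Assumed gossip rule: each component advances to the largest value seen.
        return VectorClock(
            max(self.cluster_time, other.cluster_time),
            max(self.config_time, other.config_time),
            max(self.topology_time, other.topology_time),
        )


def reinit_after_full_restart(last_oplog_time: int) -> VectorClock:
    # No CSRS node kept the clock in memory, so only the oplog time is recovered.
    return VectorClock(cluster_time=last_oplog_time, config_time=0, topology_time=0)


def after_majority_write(clock: VectorClock) -> VectorClock:
    # One majority-committed write ticks the config time to the cluster time.
    return replace(clock, config_time=clock.cluster_time)


before = VectorClock(cluster_time=100, config_time=100, topology_time=90)

# Non-rolling restart: the topology time stays at 0 until a shard is added or removed.
restarted = after_majority_write(reinit_after_full_restart(last_oplog_time=100))
print(restarted)

# Rolling restart: a surviving node still holds `before` and gossips it back.
rolling = restarted.merge(before)
print(rolling)
```

In this model the rolling case recovers the topology time of 90 from the surviving node, while the non-rolling case has no node left to gossip it.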

      Causal consistency is likely to be broken in such a scenario because:

      • Cluster/config times may go backwards if the system time is incorrect.
      • The topology time may be incorrect for a long time.
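The two failure modes above can be illustrated by comparing what a client has already observed with what the restarted config server advertises. This is a minimal sketch with made-up integer times; the `regressed_components` helper and the dictionary keys are hypothetical and do not correspond to any server API.

```python
def regressed_components(observed: dict, advertised: dict) -> list:
    """Return the components whose advertised value is older than the observed one."""
    return sorted(
        name for name, seen in observed.items()
        if advertised.get(name, 0) < seen
    )


# Times a client observed before the non-rolling restart (illustrative values).
observed = {"clusterTime": 100, "configTime": 100, "topologyTime": 90}

# Correct system time: only the topology time is stale after the restart.
correct_clock = {"clusterTime": 100, "configTime": 100, "topologyTime": 0}
print(regressed_components(observed, correct_clock))

# Incorrect system time (assumed here to yield older values): cluster and config times regress too.
skewed_clock = {"clusterTime": 95, "configTime": 95, "topologyTime": 0}
print(regressed_components(observed, skewed_clock))
```

Any non-empty result means a component moved backwards relative to something a client has already seen, which is the condition that puts causal consistency at risk.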

            Assignee:
            antonio.fuschetto@mongodb.com Antonio Fuschetto
            Reporter:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
            Votes:
            0
            Watchers:
            8
