Core Server / SERVER-38504

On the primary, to verify that lastAppliedOpTime lags behind TopologyCoordinator::_lastCommittedOpTime, we should compare both the term and the timestamp for pv1 optimes.

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.6.9
    • Component/s: None
    • Labels: None
    • Replication
    • ALL

      Consider the following upgrade -> downgrade -> upgrade (pv1 -> pv0 -> pv1) sequence:

      1) Start a replica set in pv1.

      2) Insert some documents in pv1 (term = 1).

      3) Downgrade to pv0 while the secondaries are still replicating the documents written under pv1 (term = 1).

      4) Upgrade to pv1 again before the secondaries have downgraded to pv0, while their lastDurableOpTime values are still in term 1.

      5) When the current primary writes in its current term (i.e., term 0), it does two things:

             - Advances its lastAppliedOpTime.

             - Sets its stableOpTime to the new lastAppliedOpTime, but only if the new lastAppliedOpTime <= lastCommittedOpTime.
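Step 5 can be modeled in a few lines. The following is a simplified sketch, not the server code: `OpTime`, `PrimaryState` and `on_write` are hypothetical names, and timestamps are plain integers. `OpTime` is ordered by term first and by timestamp only when the terms tie, which is the comparison the description attributes to the current code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class OpTime:
    # Hypothetical optime. With order=True, dataclasses compare fields in
    # declaration order, so this compares term first and timestamp only on a tie.
    term: int
    timestamp: int


@dataclass
class PrimaryState:
    # Hypothetical stand-in for the primary's replication state.
    last_applied: OpTime
    last_committed: OpTime
    stable: Optional[OpTime] = None

    def on_write(self, new_op: OpTime) -> None:
        # Step 5, first action: advance lastAppliedOpTime.
        self.last_applied = new_op
        # Step 5, second action: promote lastAppliedOpTime to the stable
        # optime only if it does not appear to be ahead of the commit point.
        if self.last_applied <= self.last_committed:
            self.stable = self.last_applied


# Commit point (ts1 = 100, term 1); the primary then writes at ts2 = 200 in term 0.
state = PrimaryState(last_applied=OpTime(term=1, timestamp=100),
                     last_committed=OpTime(term=1, timestamp=100))
state.on_write(OpTime(term=0, timestamp=200))
print(state.stable)  # OpTime(term=0, timestamp=200)
```

Because term 0 < term 1, the write at timestamp 200 is treated as "not ahead" of a commit point at timestamp 100, so the stable optime moves past the commit point.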

      Currently, if both lastAppliedOpTime and lastCommittedOpTime are pv1 optimes, we first compare their terms and compare their timestamps only if the terms are equal. In the sequence above, suppose lastCommittedOpTime is (ts1, t:1) and lastAppliedOpTime is (ts2, t:0), where ts1 < ts2. Because term 0 < term 1, the term comparison alone decides that lastAppliedOpTime <= lastCommittedOpTime, even though ts2 > ts1. We therefore update our stableTimestamp to ts2, which is later than the commit point's timestamp ts1. This leads to the invariant failure at line 3398.
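The difference between the current check and the one suggested by the issue title can be shown side by side. This is a hedged sketch: `OpTime`, `term_first_le` and `lags_commit_point` are hypothetical names, and requiring both the term and the timestamp to be no greater is only one reading of "compare both its term and timestamp". The issue was resolved as Won't Fix, so this is not a shipped fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpTime:
    # Hypothetical optime with an integer timestamp and a term.
    timestamp: int
    term: int


def term_first_le(a: OpTime, b: OpTime) -> bool:
    # Current behaviour as described: the term decides unless the terms tie.
    if a.term != b.term:
        return a.term < b.term
    return a.timestamp <= b.timestamp


def lags_commit_point(applied: OpTime, committed: OpTime) -> bool:
    # Assumed stricter check: applied lags the commit point only if
    # neither its term nor its timestamp is ahead.
    return applied.term <= committed.term and applied.timestamp <= committed.timestamp


committed = OpTime(timestamp=100, term=1)  # (ts1, t:1)
applied = OpTime(timestamp=200, term=0)    # (ts2, t:0), ts1 < ts2
print(term_first_le(applied, committed))      # True
print(lags_commit_point(applied, committed))  # False
```

Under the stricter check the mixed-term write from the scenario is no longer treated as lagging the commit point, so the stable timestamp would not be moved past ts1.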

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: Suganthi Mani (suganthi.mani@mongodb.com)
            Votes: 0
            Watchers: 3
