Race condition: step down takes place during automatic reconfig.


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • Fully Compatible
    • ALL
    • Repl 2025-07-07, Repl 2025-07-21
    • 200
    • None
    • 3
    • TBD

      The situation that causes the invariant failure is as follows:
      A node is elected as the new primary in the following term (term 2).
      The old primary steps down, but the stepdown is very slow because of environment issues (probably storage), so its term has not been updated yet. As part of the topology change, the old primary receives a new config with the higher term 2 from the node that is stepping up and installs it (via heartbeat). Because it still considers itself primary, it then schedules an automatic reconfig to remove the "newlyAdded" fields. This trips the invariant asserting that the topology coordinator's term matches the term of the primary's config, since the topology coordinator's term is still the old one. To avoid the invariant failure we need a stricter state check: in addition to requiring that the node is primary, we should require that no stepdown is in progress.
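
The race and the proposed check can be illustrated with a small toy model. This is only a sketch and not the server code: `NodeState`, `old_check`, `new_check`, and `automatic_reconfig` are hypothetical names introduced for illustration, and the old primary's term is assumed to be 1 because the new primary is elected in term 2.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Toy model of the replication state involved in the race (hypothetical)."""
    is_primary: bool            # member state still reports PRIMARY
    stepdown_in_progress: bool  # a stepdown has started but not finished
    coordinator_term: int       # term known to the topology coordinator
    config_term: int            # term of the installed replica set config
    has_newly_added: bool       # config still contains "newlyAdded" fields


def old_check(node: NodeState) -> bool:
    """Original condition: being primary is enough to schedule the reconfig."""
    return node.is_primary and node.has_newly_added


def new_check(node: NodeState) -> bool:
    """Proposed condition: also require that no stepdown is in progress."""
    return old_check(node) and not node.stepdown_in_progress


def automatic_reconfig(node: NodeState) -> None:
    """Stand-in for the reconfig path that carries the term invariant."""
    assert node.coordinator_term == node.config_term, "term mismatch invariant"


# Old primary (assumed term 1), mid-stepdown, after installing the term-2
# config received via heartbeat from the newly elected primary.
racing = NodeState(True, True, coordinator_term=1, config_term=2,
                   has_newly_added=True)

print(old_check(racing))  # True: the reconfig is scheduled and hits the invariant
print(new_check(racing))  # False: the reconfig is skipped while stepping down
```

With the stricter check, a node that is still nominally primary but already stepping down never reaches the reconfig path, so the term comparison is not evaluated against a stale topology coordinator term.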

       

              Assignee:
              Solomon Lifshits
              Reporter:
              Solomon Lifshits
              Votes:
              0
              Watchers:
              8

                Created:
                Updated:
                Resolved: