resolve possible deadlock between stepdown and index build

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • ALL
    • 200

      Why doesn't the stepdown finish?

      The stepdown gets trapped in the while (!_topCoord->tryToStartStepDown(...)) retry loop at step_up_step_down.cpp:233. tryToStartStepDown requires that enough secondaries have caught up to n1's last applied optime.

      Here's the twist: n0 and n2 can't fully replicate because they're both blocked waiting for the index build commit quorum (which needs n1's vote). And n1 can't cast that vote because BackgroundSync is stuck waiting for the stepdown's RSTG. This is a circular dependency:

      {{stepdown holds RSTG
      → BackgroundSync stuck, can't apply startIndexBuild
      → index build hangs waiting for n1 vote
      → n0/n2 oplog application stalls on index build
      → secondaries not caught up to n1
      → tryToStartStepDown() fails
      → while loop: releases RSTL, rstg = boost::none (line 245)
      → BackgroundSync briefly unblocks, but immediately...
      → rstg re-emplaced at line 293 (new kill for RSTL reacquire)
      → BackgroundSync blocked again}}
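
      The chain above can be read as a waits-for graph. The following is only a minimal sketch of that reading, not server code: the node labels, the `WaitsFor` alias, and the functions `buildStepdownWaitGraph` and `findWaitCycle` are hypothetical names introduced for illustration. It assumes each participant waits on exactly one other participant, which is a simplification of the real system.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each key is blocked until the value it maps to makes progress.
using WaitsFor = std::map<std::string, std::string>;

// Edges transcribed from the chain in this ticket. The labels are illustrative,
// not identifiers from the server source.
WaitsFor buildStepdownWaitGraph() {
    return {
        {"stepdown", "secondaries caught up"},           // tryToStartStepDown() precondition
        {"secondaries caught up", "index build commit"},  // n0/n2 oplog application stalls
        {"index build commit", "n1 vote"},               // commit quorum needs n1
        {"n1 vote", "BackgroundSync"},                   // n1 must first apply startIndexBuild
        {"BackgroundSync", "stepdown"},                  // blocked by the stepdown's RSTG
    };
}

// Follows waits-for edges from `start`. Returns the cycle as a path whose first
// and last elements are the same node, or an empty vector if the chain ends.
std::vector<std::string> findWaitCycle(const WaitsFor& graph, const std::string& start) {
    std::vector<std::string> path;
    std::set<std::string> seen;
    std::string node = start;
    while (seen.insert(node).second) {
        path.push_back(node);
        auto it = graph.find(node);
        if (it == graph.end()) {
            return {};  // someone in the chain is not waiting: no deadlock
        }
        node = it->second;
    }
    // `node` is the first repeated participant; drop any lead-in before it.
    auto first = std::find(path.begin(), path.end(), node);
    std::vector<std::string> cycle(first, path.end());
    cycle.push_back(node);
    return cycle;
}
```

      Calling `findWaitCycle(buildStepdownWaitGraph(), "stepdown")` in this model returns a path that starts and ends at `stepdown`. Under the same model, removing any single edge leaves no cycle, so a fix would need to break at least one of these dependencies.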

            Assignee:
            Unassigned
            Reporter:
            Indy Prentice
            Votes:
            0
            Watchers:
            3
