Catchup takeover cannot be scheduled if primary has caught up but stuck before being writeable primary

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Replication

      Secondaries schedule a catchup takeover when they see that the primary's last applied opTime is behind their own. However, if the primary is already caught up (either because it caught up through the normal catchup mechanism on stepup, or because it was already ahead by the time it was elected), no secondary can schedule a catchup takeover. As a result, if the primary gets stuck before it becomes a writable primary (e.g. stuck while bumping the config term), the whole replica set freezes because no catchup takeover is ever scheduled. It seems that we should relax the criterion for scheduling a catchup takeover so that it depends on whether the primary has written a new entry in the new term (which indicates that it has become writable), without requiring that the primary be behind.
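      To make the difference between the two criteria concrete, here is a minimal sketch under simplifying assumptions. It is not MongoDB's implementation: OpTime is modeled as a (term, timestamp) pair compared lexicographically, and the names current_rule and proposed_rule are hypothetical. The scenario is a primary elected in term 5 that is already caught up at (4, 100) but has not yet written anything in term 5. Under the current rule no secondary schedules a takeover; under the proposed rule one is scheduled until the primary writes its first entry in the new term.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    # Hypothetical model: order=True compares (term, timestamp) lexicographically.
    term: int
    timestamp: int


def current_rule(primary_last_applied: OpTime, my_last_applied: OpTime) -> bool:
    # Schedule a catchup takeover only if the primary is behind this secondary.
    return primary_last_applied < my_last_applied


def proposed_rule(primary_last_applied: OpTime, primary_term: int) -> bool:
    # Schedule a catchup takeover while the primary has not written any entry
    # in its own term, i.e. it has not visibly become writable yet.
    # (Hypothetical relaxation; whether this node could win the election is
    # not modeled here.)
    return primary_last_applied.term < primary_term


if __name__ == "__main__":
    new_term = 5
    primary = OpTime(4, 100)  # caught up, but stuck before writing in term 5
    me = OpTime(4, 100)
    print(current_rule(primary, me))         # False: no takeover, set is stuck
    print(proposed_rule(primary, new_term))  # True: takeover can be scheduled
    # Once the primary writes its first entry in the new term, the relaxed rule stops firing.
    print(proposed_rule(OpTime(5, 101), new_term))  # False
```

      Note that the ticket was resolved as Won't Fix, so this only illustrates the proposal as described, not shipped behavior.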

            Assignee:
            [DO NOT USE] Backlog - Replication Team
            Reporter:
            Wenbin Zhu
            Votes:
            0
            Watchers:
            5