  1. Core Server
  2. SERVER-46558

Bgsync stops all index builds even before transitioning to rollback state and causes secondary replication to hang

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4.0-rc0, 4.7.0
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4
    • Sprint:
      Execution Team 2020-03-23
    • Linked BF Score:
      24

      Description

      Because bgsync aborts index builds before the node transitions to the rollback state, the node is still eligible to run for election and become primary, which has serious side effects. Consider a 3-node replica set (node A is the primary, node B is secondary 1, and node C is secondary 2) where the index build thread pool size is 1.

      1) Node A (primary for term 10) starts index build 'x_1' using the IndexBuildsCoordinator thread pool and generates a startIndexBuild oplog entry, which replicates to both secondaries.
      2) Node B and node C, on receiving the startIndexBuild oplog entry, each start the index build (also using the IndexBuildsCoordinator thread pool).
      3) Node A hits a network partition and is disconnected from node B and node C.
      4) Node A receives some writes W1 at term 10, then sees that it has lost its majority and steps down.
      5) Node C gets elected and becomes primary for term 11. Node A then rejoins the network and sees that its sync source, say node C (the new primary), has diverged from its own oplog. So it enters this code path and starts aborting the index build. Because node A has not yet transitioned to rollback, it is free to run for election; assume it wins the election after receiving a vote from node B.

      As a result of step 5, node A never runs the actual rollback. When node A becomes primary, it stops the oplog fetcher service, so this check or [this|https://github.com/mongodb/mongo/blob/17984db6c531594c00bf226804d9ab7ed6225643/src/mongo/db/repl/rollback_impl.cpp#L190] check might fail, and the node does not roll back any oplog entries.

      Problems:
      1) The index builds on the secondaries become orphaned.
      2) Because the index build on node A was aborted, node A is free to start a new index build, say 'y_1'. When the secondaries receive the startIndexBuild oplog entry for 'y_1', they wait for an IndexBuildsCoordinator thread to become available, which blocks secondary replication.
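
The blocking in problem 2 can be reproduced with a toy model. The sketch below is not MongoDB code: it assumes a single-thread pool stands in for the index build thread pool on a secondary, and the function and event names are hypothetical. The orphaned 'x_1' build waits for a commit or abort signal that never arrives, so 'y_1' cannot start.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate_blocked_secondary(wait_seconds=0.2):
    """Toy model of a secondary whose index build pool has exactly one thread."""
    pool = ThreadPoolExecutor(max_workers=1)
    finish_x1 = threading.Event()   # commit/abort signal for 'x_1'; never sent in the bug scenario
    y1_started = threading.Event()  # set once 'y_1' gets a pool thread

    def build_x1():
        # The orphaned build holds the only pool thread until it is signalled.
        finish_x1.wait()

    def build_y1():
        y1_started.set()

    pool.submit(build_x1)
    pool.submit(build_y1)  # queued behind 'x_1'

    # Stand-in for the oplog applier waiting for 'y_1' to start.
    started_while_orphaned = y1_started.wait(timeout=wait_seconds)

    # Demo clean-up only: release 'x_1' so the pool can drain and the process can exit.
    finish_x1.set()
    pool.shutdown(wait=True)
    return started_while_orphaned, y1_started.is_set()
```

In this model the first returned value is False for any timeout, because 'y_1' only gets a thread once 'x_1' is released; on a real secondary there is no such release, so the applier waits indefinitely.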

      Solution: We should abort index builds only after the node has transitioned to the rollback state and we are sure that the entries are going to be rolled back. This applies to both rollback via recoverToStableTimestamp and rollback via refetch.
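
A minimal sketch of the proposed ordering, assuming a simplified node model. The `MemberState`, `Node`, and `abort_index_builds_for_rollback` names are hypothetical and do not correspond to the server's actual classes or to the fix that was committed; the point is only that the abort is a no-op unless the node is already in the rollback state.

```python
from enum import Enum


class MemberState(Enum):
    SECONDARY = "SECONDARY"
    PRIMARY = "PRIMARY"
    ROLLBACK = "ROLLBACK"


class Node:
    """Hypothetical stand-in for a replica set member with one active index build."""

    def __init__(self):
        self.state = MemberState.SECONDARY
        self.active_builds = {"x_1"}

    def abort_index_builds_for_rollback(self):
        # Guard: do nothing unless the node has already transitioned to ROLLBACK.
        # Before that point the node can still win an election and skip rollback.
        if self.state is not MemberState.ROLLBACK:
            return set()
        aborted = set(self.active_builds)
        self.active_builds.clear()
        return aborted
```

With this guard, a node that detects oplog divergence but then wins an election keeps 'x_1' running, so the secondaries' builds are not orphaned.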

      P.S.: I noticed this failure frequently in my patch builds. Because index builds are currently generating a high volume of timeout errors, the BF for this issue gets lost among them.

              People

              Assignee:
              louis.williams Louis Williams
              Reporter:
              suganthi.mani Suganthi Mani