  1. Core Server
  2. SERVER-53026

Secondary cannot restart replication

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.22, 3.6.22, 4.2.12, 4.4.4
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4, v4.2, v4.0
    • Sprint:
      Repl 2020-11-30, Repl 2020-12-14
    • Linked BF Score:
      0

      Description

      After SERVER-33747, initial_sync_document_validation.js times out on 3.6. The problem is that, with my SERVER-33747 change, ReplicationCoordinatorImpl::_startDataReplication exits early if it has ever been called before. The sequence _startDataReplication -> _stopDataReplication -> _startDataReplication therefore no longer actually restarts data replication.
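      The failure mode can be illustrated with a minimal, self-contained sketch. This is not the server code: ToyCoordinator, its members, and restartLeavesReplicationRunning are hypothetical names invented for illustration, and the real ReplicationCoordinatorImpl is far more involved. The sketch only assumes what the description states, namely that the start method returns early if it has ever been called before.

```cpp
// Hypothetical stand-in for the replication coordinator (illustration only).
class ToyCoordinator {
public:
    // Models the buggy behavior: exits early if start was EVER called,
    // regardless of whether replication has been stopped since.
    void startDataReplication() {
        if (_startedOnce) {
            return;  // a later restart silently does nothing
        }
        _startedOnce = true;
        _running = true;
    }

    // Stops replication but does not reset the "ever started" flag.
    void stopDataReplication() {
        _running = false;
    }

    bool isRunning() const {
        return _running;
    }

private:
    bool _startedOnce = false;  // set on first start, never cleared
    bool _running = false;      // whether data replication is active
};

// Runs the start -> stop -> start sequence and reports whether
// replication is running afterwards.
bool restartLeavesReplicationRunning() {
    ToyCoordinator coord;
    coord.startDataReplication();
    coord.stopDataReplication();
    coord.startDataReplication();  // hits the early exit
    return coord.isRunning();
}
```

      In this toy model the second start call is a no-op, so replication stays stopped after the sequence, which matches the symptom described above.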

      This shows up in initial_sync_document_validation.js because this test calls the "resync" command. The "resync" command was removed after 3.6 (SERVER-31239), leaving ReplicationCoordinatorImpl::resyncData with one caller, replSetSyncFrom. The method was removed completely after 4.4 (SERVER-46831).

      Does the replSetSyncFrom command suffer the same deadlock? In all versions?

              People

              Assignee:
              A. Jesse Jiryu Davis
              Reporter:
              A. Jesse Jiryu Davis
              Votes:
              0
              Watchers:
              4
