Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-9934

Slow failovers after step down due to sleeping rsBackgroundSync

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 2.4.6, 2.5.1
    • Affects Version/s: 2.4.4
    • Component/s: Replication
    • None
    • Environment:
      Tested on Linux amd64, Ubuntu (10.04 Lucid and 13.04 Raring)
    • Fully Compatible
    • ALL
    • Hide

      Run this test script with -vv: https://gist.github.com/nelhage/5777797

      Observe the failovers are taking 10-12s, even though the cluster is healthy, unloaded, and running across the local interface.

      Look at the logs for a node that has become primary. Observe that after the election completes, the server logs "replSet waiting for replication to finish before becoming primary", and does not actually step up for up to 10 seconds later.

      Show
      Run this test script with -vv: https://gist.github.com/nelhage/5777797 Observe the failovers are taking 10-12s, even though the cluster is healthy, unloaded, and running across the local interface. Look at the logs for a node that has become primary. Observe that after the election completes, the server logs "replSet waiting for replication to finish before becoming primary", and does not actually step up for up to 10 seconds later.

      With a healthy replset, failovers via rs.stepDown() take 10+s,
      even when the election completes in 1-2s.

      After an rs.steDown(), a healthy moderately-loaded replset on a
      good network will normally complete an election for an new
      primary normally within 1-2s. However, before the new primary can
      step up, it has to stop the background sync thread.

      Suppose (as is usually/often the case) that the new primary was
      syncing from the stepped-down node. When that node stepped down,
      it closed all connections, including the connection the new
      primary was using to read the oplog. This triggered a DBException
      inside of _producerThread, triggering this code:

      sethbmsg(str::stream() << "db exception in producer: " << e.toString());
      sleepsecs(10);

      When the new primary goes to stop the rsBackgroundSync thread,
      the thread is caught in the middle of that sleep, and so we end
      up literally waiting on a sleep(10) before the cluster has a new
      primary.

            Assignee:
            milkie@mongodb.com Eric Milkie
            Reporter:
            nelhage Nelson Elhage
            Votes:
            1 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: