SERVER-21744 (Core Server)

Clients may fail to discover new primaries when clock skew between nodes is greater than electionTimeout

    Details

    • Backwards Compatibility:
      Minor Change
    • Operating System:
      ALL
    • Backport Completed:
    • Sprint:
      Repl D (12/11/15), Repl E (01/08/16), Repl F (01/29/16)

      Description

      Assume a replica set contains two nodes, A and B, and that node A's clock is X seconds ahead of node B's clock. If node A is elected and node B is then elected within X seconds of node A's election, node B's electionId will be less than node A's electionId, because by the nodes' own clocks it happened "earlier." Clients that see the smaller electionId may treat node B as a stale primary and fail to discover it.
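      The scenario above can be sketched as a small simulation. This is only an illustration under stated assumptions: the electionId is modeled as a node's local wall-clock reading in seconds, and the `Node` and `Client` classes and the "keep the largest electionId seen" rule are hypothetical stand-ins, not the server's or any driver's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    clock_offset: int  # seconds this node's clock runs ahead of true time

    def election_id(self, true_time: int) -> int:
        # Assumption: the electionId is simply the node's local clock reading.
        return true_time + self.clock_offset


class Client:
    """Hypothetical client that trusts only the largest electionId seen so far."""

    def __init__(self) -> None:
        self.max_election_id: Optional[int] = None
        self.primary: Optional[str] = None

    def observe(self, node_name: str, election_id: int) -> bool:
        if self.max_election_id is not None and election_id < self.max_election_id:
            return False  # smaller id: treated as a stale primary and ignored
        self.max_election_id = election_id
        self.primary = node_name
        return True


skew = 10  # X: node A's clock is 10 seconds ahead of node B's
node_a = Node("A", clock_offset=skew)
node_b = Node("B", clock_offset=0)

client = Client()
id_a = node_a.election_id(true_time=100)  # A is elected first
id_b = node_b.election_id(true_time=105)  # B is elected 5 seconds later (< X)

accepted_a = client.observe("A", id_a)
accepted_b = client.observe("B", id_b)
print(id_a, id_b, accepted_a, accepted_b, client.primary)
```

      With these numbers, node B's id (105) is smaller than node A's (110) even though B was elected later, so the hypothetical client keeps node A as its primary.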

          Activity

          milkie Eric Milkie added a comment -

          Jeff, I believe you are correct; that's indeed a flaw.
          If the new primary waited until its first logged entry was committed before setting and broadcasting the electionId, that would solve this. It would increase failover time significantly, but only for clients that were not doing w:majority writes. For w:majority writes, the failover time would be increased by a small amount (the time it takes to do the write on the primary, without waiting for replication).

          matt.dannenberg Matt Dannenberg (Inactive) added a comment -

          David Golden, a potential flaw with your proposed solution:

          • Node A becomes PRIMARY with setVersion 2, which has protocolVersion 1 and receives an 0xFFFF electionId.
          • Reconfig changes setVersion to 3 and protocolVersion to 0.
          • Driver comes online and sees the 0xFFFF electionId with setVersion 3.
          • Node B is elected with setVersion 3 and new electionId based on the time.
          • Driver never acknowledges node B as primary.
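
          The sequence above can be traced with a short sketch. It assumes the driver orders primaries by the tuple (setVersion, electionId) and ignores any report that compares lower than the largest tuple seen; that rule, the `Driver` class, the 32-bit width of the sentinel, and the time-based value are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Stand-in for the "0xFFFF" electionId in the comment; the width is an assumption.
SENTINEL_ELECTION_ID = 0xFFFFFFFF


class Driver:
    """Hypothetical driver that tracks the largest (setVersion, electionId) seen."""

    def __init__(self) -> None:
        self.max_seen: Optional[Tuple[int, int]] = None
        self.primary: Optional[str] = None

    def on_primary_report(self, name: str, set_version: int, election_id: int) -> bool:
        candidate = (set_version, election_id)
        if self.max_seen is not None and candidate < self.max_seen:
            return False  # looks older than what was already seen: ignored
        self.max_seen = candidate
        self.primary = name
        return True


driver = Driver()

# Node A was elected under setVersion 2 (protocolVersion 1) and kept its
# sentinel electionId after the reconfig to setVersion 3 (protocolVersion 0).
# The driver comes online only now, so it pairs the sentinel with setVersion 3.
saw_a = driver.on_primary_report("A", set_version=3, election_id=SENTINEL_ELECTION_ID)

# Node B is elected under setVersion 3 with a time-based electionId
# (hypothetical value, smaller than the sentinel).
time_based_id = 1_450_000_000
saw_b = driver.on_primary_report("B", set_version=3, election_id=time_based_id)

print(saw_a, saw_b, driver.primary)
```

          Because both reports carry setVersion 3, the comparison falls through to the electionId, and the sentinel wins over any time-based value in this model.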
          milkie Eric Milkie added a comment -

          A refinement to my idea: we can update the electionId twice, to avoid the increase in failover time. Immediately after being elected, a node can set the electionId time to be the time of the last committed op it currently has. Then, when it succeeds in committing its first op written, it can update the electionId time again.
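
          A minimal sketch of this two-step update, assuming op times can be represented as plain integers; the `NewPrimary` class and its method names are hypothetical and are not taken from the server code.

```python
class NewPrimary:
    """Hypothetical model of a node that updates its electionId time twice."""

    def __init__(self, last_committed_op_time: int) -> None:
        self.last_committed_op_time = last_committed_op_time
        self.election_id_time = None

    def on_elected(self) -> int:
        # Step 1: right after the election, use the time of the last
        # committed op this node already has, so failover is not delayed.
        self.election_id_time = self.last_committed_op_time
        return self.election_id_time

    def on_first_write_committed(self, op_time: int) -> int:
        # Step 2: once the first op written by this primary is committed,
        # update the electionId time again.
        self.last_committed_op_time = op_time
        self.election_id_time = op_time
        return self.election_id_time


primary = NewPrimary(last_committed_op_time=1000)
first_id_time = primary.on_elected()
second_id_time = primary.on_first_write_committed(op_time=1007)
print(first_id_time, second_id_time)
```

          In this model the node can advertise an electionId immediately after the election and only later moves to one derived from its own first committed op.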

          xgen-internal-githook Githook User added a comment -

          Author: Siyuan Zhou (visualzhou), siyuan.zhou@mongodb.com

          Message: SERVER-21744 ElectionID always increases under PV0 and PV1.

          Reset election id on PV upgrade and downgrade.
          Branch: master
          https://github.com/mongodb/mongo/commit/1c28e37982441275cc127853985b30f2c6e74ff5

          xgen-internal-githook Githook User added a comment -

          Author: Siyuan Zhou (visualzhou), siyuan.zhou@mongodb.com

          Message: SERVER-21744 ElectionID always increases under PV0 and PV1.

          Reset election id on PV upgrade and downgrade.

          (cherry picked from commit 1c28e37982441275cc127853985b30f2c6e74ff5)
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/21a507148d36d9adabcf105ac87a34f4f2007821

