[SERVER-25658] Changing the system time forward for a replica set member causes elections
Created: 17/Aug/16  Updated: 06/Dec/22  Resolved: 13/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.8
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Linda Qin
Assignee: Backlog - Replication Team
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams: Replication
Operating System: ALL
Sprint: Repl 2016-08-29, Repl 2016-09-19
Participants:

 Description   

If the system time is changed forward (in my tests, by 1 minute), it causes an election:

  • If the node is a secondary, it starts an election as soon as the time has been changed, and it receives votes.

    2016-08-17T01:02:10.200-0400 D REPL     [rsBackgroundSync-0] Scheduling election timeout callback at 2016-08-17T01:02:21.615-0400
    2016-08-17T01:02:10.200-0400 D REPL     [rsBackgroundSync-0] fetcher read 0 operations from remote oplog
    2016-08-17T01:03:00.001-0400 D REPL     [SyncSourceFeedback] Sending slave oplog progress to upstream updater: { replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp 1471410135000|2, t: 1 }, appliedOpTime: { ts: Timestamp 1471410135000|2, t: 1 }, memberId: 0, cfgver: 5 }, { durableOpTime: { ts: Timestamp 1471410073000|2, t: 0 }, appliedOpTime: { ts: Timestamp 1471410073000|2, t: 0 }, memberId: 1, cfgver: 5 }, { durableOpTime: { ts: Timestamp 1471410135000|2, t: 1 }, appliedOpTime: { ts: Timestamp 1471410135000|2, t: 1 }, memberId: 2, cfgver: 5 } ] }
    2016-08-17T01:03:00.001-0400 D REPL     [ReplicationExecutor] Canceling election timeout callback at 2016-08-17T01:02:21.615-0400
    2016-08-17T01:03:00.001-0400 D REPL     [ReplicationExecutor] Scheduling election timeout callback at 2016-08-17T01:03:11.347-0400
    2016-08-17T01:03:00.001-0400 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
    2016-08-17T01:03:00.001-0400 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
    2016-08-17T01:03:00.003-0400 D REPL     [ReplicationExecutor] VoteRequester: Got yes vote from <host1:port>, resp:{ term: 1, voteGranted: true, reason: "", ok: 1.0 }
    2016-08-17T01:03:00.003-0400 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
    ...
    2016-08-17T01:03:00.009-0400 D REPL     [ReplicationExecutor] VoteRequester: Got yes vote from <host2:port>, resp:{ term: 2, voteGranted: true, reason: "", ok: 1.0 }
    2016-08-17T01:03:00.009-0400 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 2
    2016-08-17T01:03:00.009-0400 I REPL     [ReplicationExecutor] transition to PRIMARY
    

  • If the node is a primary, it steps down as soon as the system time has been changed. About 10 seconds later, another member of the replica set starts an election.

    2016-08-17T01:01:05.540-0400 D REPL     [ReplicationExecutor] slaveinfo lastupdate is: 2016-08-17T01:01:05.540-0400
    2016-08-17T01:01:05.540-0400 D REPL     [ReplicationExecutor] slaveinfo lastupdate is: 2016-08-17T01:01:05.540-0400
    2016-08-17T01:01:05.540-0400 D REPL     [ReplicationExecutor] earliest member 0 date: 2016-08-17T01:01:05.540-0400
    2016-08-17T01:01:05.540-0400 D REPL     [ReplicationExecutor] scheduling next check at 2016-08-17T01:01:15.540-0400
    2016-08-17T01:02:00.000-0400 I REPL     [ReplicationExecutor] can't see a majority of the set, relinquishing primary
    2016-08-17T01:02:00.000-0400 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
    2016-08-17T01:02:00.000-0400 D REPL     [ReplicationExecutor] earliest member -1 date: Date(9223372036854775807)
    2016-08-17T01:02:00.000-0400 I REPL     [replExecDBWorker-1] transition to SECONDARY
    2016-08-17T01:02:11.792-0400 I REPL     [ReplicationExecutor] Member <host1:port> is now in state PRIMARY
    

I've also tested protocol version 0. With that protocol version, changing the system time forward doesn't cause elections.
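
The timestamps in the two log excerpts above can be checked with a little arithmetic. The sketch below is only an illustration: it parses timestamps copied from those excerpts and computes how far the clock jumped past the scheduled election timeout on the secondary, and how stale the liveness information looked on the primary. The helper name `parse_log_ts` is hypothetical, and reading the "slaveinfo lastupdate" lines as "the last time the primary heard from a member" is an assumption, not something the log states.

```python
from datetime import datetime

# Timestamp layout used in the log excerpts, e.g. 2016-08-17T01:02:10.200-0400
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_log_ts(text):
    """Parse one log timestamp into a timezone-aware datetime (hypothetical helper)."""
    return datetime.strptime(text, LOG_TS_FORMAT)


# Secondary: values copied from the first log excerpt.
scheduled_at = parse_log_ts("2016-08-17T01:02:10.200-0400")  # callback scheduled
deadline = parse_log_ts("2016-08-17T01:02:21.615-0400")      # election timeout deadline
after_jump = parse_log_ts("2016-08-17T01:03:00.001-0400")    # first entry after the change

timeout_s = (deadline - scheduled_at).total_seconds()
overshoot_s = (after_jump - deadline).total_seconds()

# Primary: values copied from the second log excerpt.
# Assumption: "slaveinfo lastupdate" is when the primary last heard from a member.
last_update = parse_log_ts("2016-08-17T01:01:05.540-0400")
step_down = parse_log_ts("2016-08-17T01:02:00.000-0400")

apparent_silence_s = (step_down - last_update).total_seconds()

print(f"election timeout was scheduled {timeout_s:.3f} s ahead")
print(f"clock landed {overshoot_s:.3f} s past the deadline")
print(f"members looked silent for {apparent_silence_s:.3f} s")
```

On both nodes the apparent gap is well beyond the 10000 ms quoted in the "Starting an election" message, which is consistent with the node treating the jump as elapsed time.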



 Comments   
Comment by Eric Milkie [ 13/Sep/16 ]

A stable clock source is required for proper replication behavior. If the clock advances at a rate significantly different from the passage of real time, spurious elections (or slow failover) may occur.
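
As a rough illustration of why a stepped clock can look like a missed timeout, the simulation below compares a deadline measured against wall-clock time with one measured against a monotonic clock. It is not MongoDB's code: the `FakeClocks` class, the `timeout_expired` function, and the 10-second constant are all hypothetical stand-ins chosen for the example.

```python
class FakeClocks:
    """Hypothetical simulated clocks: wall time can be stepped, monotonic time cannot."""

    def __init__(self):
        self.wall = 0.0
        self.monotonic = 0.0

    def advance(self, seconds):
        """Real time passes, so both clocks move forward together."""
        self.wall += seconds
        self.monotonic += seconds

    def step_wall(self, seconds):
        """The system time is changed by hand, so only the wall clock moves."""
        self.wall += seconds


def timeout_expired(now, deadline):
    """A timeout fires once the clock reading reaches the deadline."""
    return now >= deadline


ELECTION_TIMEOUT_S = 10.0  # assumed value, matching the 10000ms in the log message

clocks = FakeClocks()
wall_deadline = clocks.wall + ELECTION_TIMEOUT_S
mono_deadline = clocks.monotonic + ELECTION_TIMEOUT_S

clocks.advance(1.0)     # one second of real time passes
clocks.step_wall(60.0)  # system time is set one minute forward

wall_fired = timeout_expired(clocks.wall, wall_deadline)
mono_fired = timeout_expired(clocks.monotonic, mono_deadline)

print("wall-clock deadline fired:", wall_fired)
print("monotonic deadline fired:", mono_fired)
```

In this simulation only the wall-clock deadline fires, even though just one second of real time has passed.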

Comment by Linda Qin [ 17/Aug/16 ]

By changing the system time forward, I mean:

  • If the current system time is T, change it to T+X.

The new time doesn't need to be ahead of the other replica set members.
