[SERVER-20016] If after dry-run election, a primary is known, do not continue election proceedings Created: 18/Aug/15  Updated: 22/Feb/18  Resolved: 18/Sep/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Eric Milkie Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File Screen Shot 2018-02-21 at 6.11.13 PM.jpg     JPEG File Screen Shot 2018-02-21 at 6.12.16 PM.jpg    
Sprint: RPL 9 (09/18/15)
Participants:

 Description   

Currently, this can happen:

2015-08-18T16:09:01.437-0400 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2015-08-18T16:09:01.438-0400 I REPL     [ReplicationExecutor] Member lazarus:27017 is now in state PRIMARY
2015-08-18T16:09:01.439-0400 I REPL     [ReplicationExecutor] dry election run succeeded, running for election

The current effect is that the other primary will be immediately usurped by this member's election attempt.
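
The check this ticket asks for can be sketched roughly as follows. This is a minimal illustrative model in Python, not the server's actual implementation: the `Candidate` class, its methods, and the returned strings are all hypothetical names invented for this sketch. It only shows the idea that a node should re-check for a known primary after a successful dry run and before starting the real election.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemberState(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"


@dataclass
class Candidate:
    """Hypothetical model of a node deciding whether to run for election."""
    name: str
    term: int
    # Latest state this node has heard for each other member (assumed to
    # arrive via heartbeats or similar messages).
    member_states: dict = field(default_factory=dict)

    def on_heartbeat(self, member: str, state: MemberState) -> None:
        """Record the most recently reported state of another member."""
        self.member_states[member] = state

    def known_primary(self):
        """Return the name of a member believed to be PRIMARY, or None."""
        for member, state in self.member_states.items():
            if state is MemberState.PRIMARY:
                return member
        return None

    def after_dry_run(self, dry_run_succeeded: bool) -> str:
        """Decide what to do once the dry-run election has finished."""
        if not dry_run_succeeded:
            return "abort: dry run failed"
        # The guard proposed in this ticket: do not proceed to the real
        # election if a primary became known while the dry run was running.
        primary = self.known_primary()
        if primary is not None:
            return f"abort: {primary} is already PRIMARY"
        self.term += 1
        return f"run: requesting votes in term {self.term}"
```

Replaying the sequence from the log above against this sketch, a node in term 7 that learns lazarus:27017 is PRIMARY during its dry run would stop instead of starting a term 8 election:

```python
node = Candidate("lazarus:27019", term=7)
node.on_heartbeat("lazarus:27017", MemberState.PRIMARY)
print(node.after_dry_run(dry_run_succeeded=True))
```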



 Comments   
Comment by Kelsey Schubert [ 22/Feb/18 ]

Hi neav16@student.bth.se,

Thanks for your report. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group.

Kind regards,
Kelsey

Comment by Neeraj Reddy Avutu [ 22/Feb/18 ]

I am asking because I am failing to form a cluster.

Comment by Neeraj Reddy Avutu [ 22/Feb/18 ]

This issue recurs for me: mongos cannot reach the replica set. Can you suggest where the error lies? I have also enabled the firewall blocks. Please check the screenshots.

Comment by Eric Milkie [ 18/Sep/15 ]

I think it's all working now.

Comment by Matt Dannenberg [ 19/Aug/15 ]

I believe this is a result of upstream liveness not being implemented yet. If upstream liveness were working properly, the nodes would see the new primary as well as the new term, which would prevent them from running for election. If upstream liveness turns out to be too slow once it is implemented, we can re-enable declareElectionWinner to solve this problem.

Comment by Eric Milkie [ 18/Aug/15 ]

27019:

2015-08-18T16:08:58.250-0400 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:33647 #7 (4 connections now open)
2015-08-18T16:08:58.257-0400 I REPL     [ReplicationExecutor] stepping down from primary, because a new term has begun
2015-08-18T16:08:58.258-0400 I REPL     [replExecDBWorker-2] transition to SECONDARY
2015-08-18T16:08:58.258-0400 I NETWORK  [conn4] end connection 127.0.0.1:33633 (3 connections now open)
2015-08-18T16:08:59.644-0400 I COMMAND  [conn3] command local.oplog.rs command: getMore { getMore: 13649632922, collection: "oplog.rs", maxTimeMS: 2000 } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:1 reslen:86 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_command 2000ms
2015-08-18T16:08:59.645-0400 I NETWORK  [conn3] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:33632] 
2015-08-18T16:09:01.437-0400 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2015-08-18T16:09:01.438-0400 I REPL     [ReplicationExecutor] Member lazarus:27017 is now in state PRIMARY
2015-08-18T16:09:01.439-0400 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2015-08-18T16:09:01.442-0400 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from lazarus:27017 because: candidate's data is staler than mine
2015-08-18T16:09:01.443-0400 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 8
2015-08-18T16:09:01.443-0400 I REPL     [ReplicationExecutor] transition to PRIMARY
2015-08-18T16:09:02.438-0400 I REPL     [rsSync] transition to primary complete; database writes are now permitted

27017:

2015-08-18T16:08:58.249-0400 I REPL     [ReplicationExecutor] This node is lazarus:27017 in the config
2015-08-18T16:08:58.249-0400 I REPL     [ReplicationExecutor] transition to STARTUP2
2015-08-18T16:08:58.249-0400 I REPL     [ReplicationExecutor] Starting replication applier threads
2015-08-18T16:08:58.250-0400 I REPL     [ReplicationExecutor] transition to RECOVERING
2015-08-18T16:08:58.252-0400 I REPL     [ReplicationExecutor] transition to SECONDARY
2015-08-18T16:08:58.253-0400 I REPL     [ReplicationExecutor] Member lazarus:27018 is now in state SECONDARY
2015-08-18T16:08:58.253-0400 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2015-08-18T16:08:58.253-0400 I REPL     [ReplicationExecutor] Member lazarus:27019 is now in state PRIMARY
2015-08-18T16:08:58.255-0400 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2015-08-18T16:08:58.259-0400 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 7
2015-08-18T16:08:58.259-0400 I REPL     [ReplicationExecutor] transition to PRIMARY
2015-08-18T16:08:59.254-0400 I REPL     [rsSync] transition to primary complete; database writes are now permitted
2015-08-18T16:09:01.435-0400 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:41095 #1 (1 connection now open)
2015-08-18T16:09:01.437-0400 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:41096 #2 (2 connections now open)
2015-08-18T16:09:01.441-0400 I REPL     [ReplicationExecutor] stepping down from primary, because a new term has begun
2015-08-18T16:09:01.442-0400 I REPL     [replExecDBWorker-2] transition to SECONDARY
2015-08-18T16:09:01.443-0400 I NETWORK  [conn2] end connection 127.0.0.1:41096 (1 connection now open)
2015-08-18T16:09:07.586-0400 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:41097 #3 (2 connections now open)
2015-08-18T16:09:16.259-0400 I REPL     [ReplicationExecutor] waiting for 2 pings from other members before syncing
2015-08-18T16:09:18.260-0400 I REPL     [ReplicationExecutor] syncing from: lazarus:27018
2015-08-18T16:09:18.266-0400 I REPL     [rsBackgroundSync] starting rollback: OplogStartMissing our last op time fetched: (term: 7, timestamp: Aug 18 16:08:59:1). source's GTE: (term: 8, timestamp: Aug 18 16:09:02:1)

Generated at Thu Feb 08 03:52:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.