[SERVER-8205] able to create a split primary situation with asym network delays Created: 17/Jan/13  Updated: 06/Dec/22  Resolved: 20/Apr/16

Status: Closed
Project: Core Server
Component/s: Networking, Replication
Affects Version/s: 2.3.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Greg Studer Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: elections
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File currentTest.txt     Text File currentTest_dual_primary.txt     File sync_change_source.js    
Issue Links:
Duplicate
is duplicated by SERVER-8145 Two primaries for the same replica set Closed
Related
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Found this while doing other, related testing. The setup (I think) is:

1) Start a replica set with 3 (regular) nodes
2) Introduce a 40s (asymmetrical) network delay between one secondary and the primary, causing the secondary to time out when trying to reach the primary. The primary can still see the secondary.
3) Start a number of inserts into the primary
4) Eventually it seems the delayed secondary attempts to elect itself primary and succeeds because the other secondary votes for it. This does not trigger a stepdown of the original primary.
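
To make the steps above concrete, here is a minimal toy model of the scenario in Python. It is not MongoDB's election code: the member names `A`, `B`, `C`, the helper functions, and all three rules (a node stands for election when it cannot reach the primary, every reachable voter says yes, and the primary steps down only when it cannot reach a majority) are assumptions made for illustration. It only shows how a one-directional failure can leave two nodes believing they are primary.

```python
from itertools import permutations

MEMBERS = ["A", "B", "C"]  # hypothetical names: A starts as primary, B and C are secondaries


def majority(n):
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


def build_links(blocked):
    """Directed reachability map: links[(src, dst)] is True if src reaches dst in time."""
    return {(s, d): (s, d) not in blocked for s, d in permutations(MEMBERS, 2)}


def reachable(links, src):
    """Members that src can contact, plus src itself."""
    return {src} | {d for d in MEMBERS if d != src and links[(src, d)]}


def simulate(links, primary="A"):
    """Return the set of nodes that consider themselves primary afterwards."""
    primaries = {primary}
    for node in MEMBERS:
        if node == primary:
            continue
        seen = reachable(links, node)
        # Toy rule: only a node that cannot reach the primary stands for election.
        if primary in seen:
            continue
        # Toy rule (assumption): every voter the candidate can reach votes yes.
        if len(seen) >= majority(len(MEMBERS)):
            primaries.add(node)
            break  # model a single election
    # Toy rule: the old primary steps down only if it cannot reach a majority.
    if len(reachable(links, primary)) < majority(len(MEMBERS)):
        primaries.discard(primary)
    return primaries


# Only the C -> A direction times out; A can still reach C.
links = build_links(blocked={("C", "A")})
print(sorted(simulate(links)))
```

In this model C collects two of three votes (its own and B's), while A still reaches every member and therefore has no reason to step down.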

Two logs and the scripts to reproduce are attached. All testing was done on localhost, using minor modifications of the test framework to assign each host a different local IP.



 Comments   
Comment by Eric Milkie [ 20/Apr/16 ]

Protocol version 1 fixes this issue.
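
As a rough illustration of why a term-based protocol closes this gap, the sketch below models one rule only: a node that sees a higher election term adopts it and, if it was primary, steps down. The `Node` class and `observe_term` function are hypothetical names for this example and are not taken from the server source; the real protocol involves considerably more than this.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    term: int
    is_primary: bool = False


def observe_term(node, seen_term):
    """Apply a term seen in any message; return True if the node stepped down."""
    if seen_term <= node.term:
        return False  # nothing newer than what the node already knows
    node.term = seen_term
    stepped_down = node.is_primary
    node.is_primary = False  # a primary from an older term must give up the role
    return stepped_down


# Hypothetical state after the delayed secondary wins an election in term 2.
old_primary = Node("A", term=1, is_primary=True)
new_primary = Node("C", term=2, is_primary=True)

# The old primary learns of term 2, for example from a heartbeat response.
observe_term(old_primary, new_primary.term)
print(old_primary)
```

Under this rule the original primary does not need to lose sight of a majority before it steps down; learning of the newer term from any member is enough.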

Generated at Thu Feb 08 03:16:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.