[SERVER-27360] Replicaset primary flapping when using high priority on primary and network split happens Created: 09/Dec/16  Updated: 27/Oct/23  Resolved: 12/Dec/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Christian Amor Kvalheim Assignee: Eric Milkie
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OSX 10.11.4


Attachments: Text File 31000.log     Text File 31000.log     Text File 31001.log     Text File 31001.log     Text File 31002.log     Text File 31002.log    
Operating System: ALL
Steps To Reproduce:

Toxiproxy reproduction

  • Install toxiproxy on your machine (brew or whatever package manager you have)
  • Open a terminal and run

    toxiproxy-server &
    toxiproxy-cli create mongod_primary -l localhost:12345 -u localhost:31000
    

  • Create a replica set with one primary and two secondaries on ports 31000, 31001, and 31002, where the primary has priority 100 and is addressed through the proxy port 12345.

    {
        "_id" : "rs",
        "version" : 1,
        "members" : [
            {
                "_id" : 0,
                "host" : "localhost:12345",
                "priority" : 100
            },
            {
                "_id" : 1,
                "host" : "localhost:31001",
                "priority" : 1
            },
            {
                "_id" : 2,
                "host" : "localhost:31002",
                "priority" : 1
            }
        ]
    }
    

  • Wait for election to finish and a primary to be selected.
  • Now throw the primary in the black hole.

    toxiproxy-cli toxic add mongod_primary -t timeout -a timeout=0 --upstream
    toxiproxy-cli toxic add mongod_primary -t timeout -a timeout=0 --downstream
    

  • Watch as a new primary is elected and then the flapping starts, with new elections happening every X seconds.

You can turn off the toxics by running the following commands:

    toxiproxy-cli toxic remove mongod_primary -n timeout_downstream
    toxiproxy-cli toxic remove mongod_primary -n timeout_upstream
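
The replica set creation and "watch" steps above are described only in prose, so here is a minimal sketch of how they might be scripted. It is not the reporter's actual setup. The scratch directory, the log file locations, the helper function names, and the 2-second polling interval are all assumptions made for illustration. The config values come from the ticket. The sketch assumes mongod and the mongo shell are on the PATH and that toxiproxy-server is already running with the mongod_primary proxy created.

```shell
#!/bin/sh
# Sketch only: BASE_DIR, the helper names, and the polling interval are
# assumptions for illustration, not part of the original report.

BASE_DIR="${BASE_DIR:-/tmp/rs-flap}"   # hypothetical scratch directory

# Print the replica set config from the ticket as a mongo shell expression.
rs_config() {
  cat <<'EOF'
{
  _id: "rs",
  version: 1,
  members: [
    { _id: 0, host: "localhost:12345", priority: 100 },
    { _id: 1, host: "localhost:31001", priority: 1 },
    { _id: 2, host: "localhost:31002", priority: 1 }
  ]
}
EOF
}

# Start one mongod per port, each with its own data directory and log file.
start_members() {
  for port in 31000 31001 31002; do
    mkdir -p "$BASE_DIR/$port"
    mongod --replSet rs --port "$port" \
           --dbpath "$BASE_DIR/$port" \
           --logpath "$BASE_DIR/$port.log" --fork
  done
}

# Initiate the set on the node that sits behind the proxy (port 31000).
initiate_set() {
  mongo --port 31000 --quiet --eval "printjson(rs.initiate($(rs_config)))"
}

# Poll the two secondaries and print which node each one reports as primary.
watch_primary() {
  while true; do
    for port in 31001 31002; do
      printf '%s sees primary: ' "$port"
      mongo --port "$port" --quiet --eval 'print(db.isMaster().primary)'
    done
    sleep 2
  done
}
```

After sourcing the file, one would run start_members and initiate_set, wait for a primary to be elected, add the toxics, and then run watch_primary to see the reported primary change over time.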

Sprint: Storage 2017-01-23
Participants:

 Description   

In a replica set with one primary and two secondaries, where the primary has priority 100, a network split that isolates the primary causes election flapping.



 Comments   
Comment by Eric Milkie [ 12/Dec/16 ]

The proxy wasn't configured to isolate the primary from both incoming and outgoing connections.

Comment by Christian Amor Kvalheim [ 09/Dec/16 ]

Confirmed the proxy does not block outgoing connections and in fact cannot be used to test this scenario. So you can go ahead and close this ticket.

Comment by Eric Milkie [ 09/Dec/16 ]

It looks like, for this test, incoming connections to the primary are indeed blocked, but outgoing connections are still allowed. The flapping behavior is unfortunate but expected for such a network configuration.

Comment by Christian Amor Kvalheim [ 09/Dec/16 ]

Server logs during flapping

Comment by Christian Amor Kvalheim [ 09/Dec/16 ]

The primary runs on port 31000 behind a proxy listening on port 12345. The replica set config lists the primary at 12345. The proxy is used to simulate the split.

Comment by Eric Milkie [ 09/Dec/16 ]

Also, I'm a little fuzzy on how this works. The primary node is listening on port 12345, or port 31000? In the replica set config, it says 12345.

Comment by Eric Milkie [ 09/Dec/16 ]

Yes, logs for each server would be great! Thanks.

Comment by Christian Amor Kvalheim [ 09/Dec/16 ]

Do you want logs for each server?

Flapping is happening between 31001 and 31002 and is continuous. I'll re-run and add logs.

Comment by Eric Milkie [ 09/Dec/16 ]

Election flapping between which nodes?
Can you attach the logs from your reproducer as well, if available?
