SERVER-26717

PSA flapping during netsplit when using PV1

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.2.10
    • Component/s: Replication
    • Labels: None
    • Operating System: ALL
    • Steps To Reproduce:

      1. Create a 3-member PSA replset spread across 3 VMs, which represent our 3 DCs. DC1 and DC2 contain the data-bearing nodes and DC3 contains the arbiter. (A sketch of the replset config is shown after this list.)
      2. Create a netsplit between DC1 and DC2, e.g. `sudo iptables -I INPUT -s OTHER_VM_IP -j DROP` on each of DC1 and DC2.
      3. View `mongod.log` on DC3 to watch the primary flap between DC1 and DC2 every 10 seconds. This can also be seen via `rs.status()`.
      4. Perform writes to the replset for long enough that some writes go to DC1 and others go to DC2 (e.g. >10 seconds).
      5. Resolve the netsplit, e.g. `sudo iptables -R INPUT 1` on DC1 and DC2.
      6. Either DC1 or DC2 will go into a ROLLBACK state and its writes will be dumped to disk.
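
      For reference, a minimal sketch of the replset configuration from step 1, run from the mongo shell against one of the data-bearing nodes. The replset name, hostnames, and ports are placeholders rather than values from the original repro; each `mongod` is assumed to have been started with the matching `--replSet` option.

      ```javascript
      // Minimal PSA config sketch: two data-bearing members plus an arbiter.
      // Replset name, hostnames, and ports are illustrative placeholders.
      rs.initiate({
        _id: "psaTest",
        protocolVersion: 1,  // PV1, the protocol version the flapping is observed with
        members: [
          { _id: 0, host: "dc1-vm:27017" },                    // data-bearing node in DC1
          { _id: 1, host: "dc2-vm:27017" },                    // data-bearing node in DC2
          { _id: 2, host: "dc3-vm:27017", arbiterOnly: true }  // arbiter in DC3
        ]
      });
      ```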

      Under PV1 when using a PSA (or PSSSA) replset spread across three data centres, the primary node flaps between DC1 and DC2 every 10 seconds during a netsplit between DC1 and DC2. Each data centre receives roughly half the writes (assuming roughly constant write traffic). When the netsplit is resolved, the data in the non-primary data centre is rolled back.
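
      As a hedged illustration of the writes in step 4, they could be driven from a mongo shell started in replica-set mode (e.g. `mongo --host psaTest/dc1-vm:27017,dc2-vm:27017`) so it follows whichever node is currently primary; the collection name, loop count, and pacing below are placeholders, not values from the original repro.

      ```javascript
      // Hypothetical write loop for step 4: default (w:1) writes are acknowledged by
      // whichever node is currently primary, so the side that later loses the
      // election ends up with writes that get rolled back.
      for (var i = 0; i < 1000; i++) {
        try {
          db.flapTest.insert({ seq: i, ts: new Date() });
        } catch (e) {
          print("write " + i + " failed during a failover: " + e);
        }
        sleep(50);  // ~50 ms between writes so the run spans several elections
      }
      ```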

      When the netsplit occurs, the following sequence of events happens:
      1. The secondary in DC2 is unable to contact a primary for 10 seconds and calls for an election in a new term (see the sketch after this list).
      2. The DC3 arbiter announces the new term to DC1.
      3. The DC1 primary steps down.
      4. Client connections are dropped.
      5. The node in DC2 is elected primary.
      6. Clients reconnect and find DC2 is now primary. DC2 starts accepting writes.
      7. 10 seconds later, DC1 hasn’t been able to contact a primary and the process repeats itself.
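
      The 10-second interval in steps 1 and 7 lines up with the default `settings.electionTimeoutMillis` of 10000 ms under PV1. A quick mongo-shell sketch for confirming the value on a given deployment (not taken from the original ticket):

      ```javascript
      // Inspect the election timeout: under PV1 a secondary that has not seen a
      // primary for this long calls for an election (default 10000 ms).
      var cfg = rs.conf();
      printjson(cfg.settings);                    // full settings sub-document
      print(cfg.settings.electionTimeoutMillis);  // 10000 by default
      ```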

      Here is a snippet of logs from the arbiter demonstrating the flapping behaviour:
      2016-10-19T22:49:47.655+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27018 is now in state SECONDARY
      2016-10-19T22:49:47.669+0000 I REPL [ReplicationExecutor] Member 10.0.0.101:27017 is now in state PRIMARY
      2016-10-19T22:49:57.672+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27017 is now in state PRIMARY
      2016-10-19T22:50:02.672+0000 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to 10.0.0.101:27017
      2016-10-19T22:50:02.672+0000 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
      2016-10-19T22:50:02.673+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 10.0.0.101:27017
      2016-10-19T22:50:02.674+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 10.0.0.101:27017
      2016-10-19T22:50:02.675+0000 I REPL [ReplicationExecutor] Member 10.0.0.101:27017 is now in state SECONDARY
      2016-10-19T22:50:12.676+0000 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to 10.0.0.102:27017
      2016-10-19T22:50:12.676+0000 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
      2016-10-19T22:50:12.676+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 10.0.0.102:27017
      2016-10-19T22:50:12.677+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 10.0.0.102:27017
      2016-10-19T22:50:12.677+0000 I REPL [ReplicationExecutor] Member 10.0.0.101:27018 is now in state PRIMARY
      2016-10-19T22:50:12.678+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27017 is now in state SECONDARY
      2016-10-19T22:50:22.665+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27018 is now in state PRIMARY

      N.B. Flapping does not occur with PSS/PV1 or PSA/PV0.
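
      Since the behaviour differs between PV0 and PV1, a minimal sketch for checking which protocol version a replset is actually running; the field lives in the replset config document.

      ```javascript
      // Confirm the replication protocol version in use.
      // protocolVersion 1 (PV1) is where the flapping is observed; PV0 is reported unaffected.
      var cfg = rs.conf();
      print("protocolVersion: " + cfg.protocolVersion);
      ```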

            Assignee: Backlog - Replication Team (backlog-server-repl)
            Reporter: James Kovacs
            Votes: 0
            Watchers: 14
