[SERVER-26717] PSA flapping during netsplit when using PV1 Created: 20/Oct/16  Updated: 06/Dec/22  Resolved: 18/Jan/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: James Kovacs Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-27125 Arbiters in pv1 should vote no in ele... Closed
Related
related to SERVER-14539 Full consensus arbiter (i.e. uses an ... Backlog
related to SERVER-26728 Add jstest that primary doesn't flap ... Closed
related to SERVER-26725 Automatically reconfig pv1 replica se... Closed
Assigned Teams:
Replication
Operating System: ALL
Steps To Reproduce:

1. Create a 3-member PSA replset spread across 3 VMs, which will represent our 3 DCs. DC1 and DC2 contain data-bearing nodes and DC3 contains the arbiter. (A sketch of the corresponding `rs.initiate()` call follows this list.)
2. Create a netsplit between DC1 and DC2. e.g. `sudo iptables -I INPUT -s OTHER_VM_IP -j DROP` on each of DC1 and DC2
3. View `mongod.log` on DC3 to watch primary flap between DC1 and DC2 every 10 seconds. This can also be seen via `rs.status()`.
4. Perform writes to the replset long enough so that some writes go to DC1 and others go to DC2. (e.g. >10 seconds)
5. Resolve netsplit. e.g. `sudo iptables -R INPUT 1` on DC1 and DC2
6. Either DC1 or DC2 will go into a ROLLBACK state and its writes will be dumped to disk.
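
For reference, a minimal sketch of the replica set configuration described in step 1, as it could be run from a mongo shell connected to the node in DC1. The host names and replica set name are placeholders, not taken from the original report:

```
// Hypothetical hosts: one mongod per DC, with the DC3 member as the arbiter.
rs.initiate({
    _id: "psaSet",          // placeholder replica set name
    protocolVersion: 1,     // PV1, the configuration this report is about
    members: [
        { _id: 0, host: "dc1.example.net:27017" },                   // data-bearing, DC1
        { _id: 1, host: "dc2.example.net:27017" },                   // data-bearing, DC2
        { _id: 2, host: "dc3.example.net:27017", arbiterOnly: true } // arbiter, DC3
    ]
})

// Confirm membership and the current primary from any member:
rs.status()
```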


Description

Under PV1 when using a PSA (or PSSSA) replset spread across three data centres, the primary node flaps between DC1 and DC2 every 10 seconds during a netsplit between DC1 and DC2. Each data centre receives roughly half the writes (assuming roughly constant write traffic). When the netsplit is resolved, the data in the non-primary data centre is rolled back.

When the netsplit occurs, the following sequence of events happens:
1. The secondary in DC2 is unable to contact a primary for 10 seconds and calls an election for a new term.
2. The DC3 arbiter announces the new term to DC1.
3. The DC1 primary steps down.
4. Client connections are dropped.
5. The node in DC2 is elected primary.
6. Clients reconnect and find DC2 is now primary. DC2 starts accepting writes.
7. 10 seconds later, DC1 hasn’t been able to contact a primary, and the process repeats itself. (This 10-second interval matches the election timeout; see the note after this list.)
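
The 10-second cadence in the sequence above matches the default PV1 election timeout (`settings.electionTimeoutMillis`, 10000 ms in 3.2). A small sketch of how to confirm the value from the mongo shell; this only explains the interval, and raising the timeout would presumably slow the flapping rather than prevent it:

```
// Inspect the election timeout that drives the ~10 second flapping interval.
var cfg = rs.conf()
printjson(cfg.settings.electionTimeoutMillis)   // defaults to 10000 ms under PV1
```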

Here is a snippet of logs from the arbiter demonstrating the flapping behaviour:
2016-10-19T22:49:47.655+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27018 is now in state SECONDARY
2016-10-19T22:49:47.669+0000 I REPL [ReplicationExecutor] Member 10.0.0.101:27017 is now in state PRIMARY
2016-10-19T22:49:57.672+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27017 is now in state PRIMARY
2016-10-19T22:50:02.672+0000 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to 10.0.0.101:27017
2016-10-19T22:50:02.672+0000 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-10-19T22:50:02.673+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 10.0.0.101:27017
2016-10-19T22:50:02.674+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 10.0.0.101:27017
2016-10-19T22:50:02.675+0000 I REPL [ReplicationExecutor] Member 10.0.0.101:27017 is now in state SECONDARY
2016-10-19T22:50:12.676+0000 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to 10.0.0.102:27017
2016-10-19T22:50:12.676+0000 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-10-19T22:50:12.676+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 10.0.0.102:27017
2016-10-19T22:50:12.677+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 10.0.0.102:27017
2016-10-19T22:50:12.677+0000 I REPL [ReplicationExecutor] Member 10.0.0.101:27018 is now in state PRIMARY
2016-10-19T22:50:12.678+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27017 is now in state SECONDARY
2016-10-19T22:50:22.665+0000 I REPL [ReplicationExecutor] Member 10.0.0.102:27018 is now in state PRIMARY

N.B. Flapping does not occur with PSS/PV1 or PSA/PV0.
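
For anyone comparing the PV1 and PV0 cases mentioned above, a hedged sketch of how to check and change the protocol version from the mongo shell; reconfiguring to PV0 is shown only to reproduce the PSA/PV0 comparison, not as a recommended workaround:

```
// Check which election protocol the set is running (1 = PV1, 0 = the legacy protocol).
rs.conf().protocolVersion

// To repeat the netsplit test under PV0 (the PSA/PV0 case above),
// reconfigure with protocolVersion 0 and re-run the steps to reproduce.
var cfg = rs.conf()
cfg.protocolVersion = 0
rs.reconfig(cfg)
```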



Comments
Comment by Spencer Brody (Inactive) [ 18/Jan/17 ]

The plan to address this is no longer to implement SERVER-14539. Instead we are going to do the smaller change of SERVER-27125, which is much less invasive and can be completed much sooner. I'm closing this ticket as a duplicate of SERVER-27125.

Comment by Spencer Brody (Inactive) [ 07/Nov/16 ]

Since we didn't wind up doing SERVER-26725, this issue still exists with the use of arbiters in pv1. SERVER-14539 remains the long term plan to improve this behavior.

Comment by Spencer Brody (Inactive) [ 21/Oct/16 ]

Assuming current primary is in DC1...

This works in PV1 with PSS because in that case the secondary in DC2 would vote no to the node in DC3 becoming primary, because it would be ahead of that node in replication from the writes that DC1 took.
This works in PV0 with PSA because in that case the arbiter in DC2 would vote no to the node in DC3 becoming primary, because it can still see a healthy primary.

This will be fixed in the short term by SERVER-26725 and in the longer term by SERVER-14539.
