Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.7.0
Affects Version/s: None
Component/s: Replication
Labels: None

Backwards Compatibility: Fully Compatible
Sprint: Repl 2020-09-07
Linked BF Score: 14
Confidence Status: None
Work Order: 3

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

catchup_takeover_one_high_priority.js will run into the following scenario when running on a slow machine:

Start a 3-node replica set (node0 and node1 have the default priority; node2 has priority 2).
Wait for node2 to become primary, then isolate it (so it can't do a priority takeover).
Step up node0.
Stop replication on node1 and write something on node0 so that node0 is ahead of node1.
Step up node1 (which is lagged); it will transition to primary but can't accept writes.

Here's where things get weird:

The test expects node0 to do a catchup takeover because it's ahead.
node0 wins its dry-run election and runs for a real election.
node0 increments the term, so node1 steps down.
Due to slow machine issues, node0 does not send out a vote request within the election timeout.
node1 steps up again because of the default election timeout.
At this point the test's assert.soon fails because node0 isn't primary as we expect.
node0 eventually does another catchup takeover, which succeeds this time, but it's too late because the test has already failed.

Since this test just needs node0 to eventually become primary, we should increase the timeout of its waitForState call. I would suggest 10 minutes instead of 1 minute. If this call times out after 10 minutes, that would be more indicative of a hang than of a slow machine issue.

We should consider making the same change for catchup_takeover_two_nodes_ahead.js, which is also brittle when dealing with slow machines.
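
To illustrate why a longer timeout helps, here is a minimal simulation sketch in plain JavaScript. It is not the real test code and does not use the actual ReplSetTest API: makeFakeClock, makeSlowNode, waitForStateSketch and tryWait are hypothetical helpers, time is simulated rather than real, and the 90-second figure for the second catchup takeover is an assumed value chosen only for illustration.

```javascript
// Illustrative simulation only: none of these helpers are part of the real
// jstests library. Time is faked so the example runs instantly.

const ONE_MINUTE_MS = 60 * 1000;
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Fake clock: sleep() advances simulated time instead of blocking.
function makeFakeClock() {
    let t = 0;
    return {
        now: () => t,
        sleep: (ms) => { t += ms; },
    };
}

// Hypothetical node that reports PRIMARY once simulated time reaches
// becomesPrimaryAtMs (for example, after a second catchup takeover).
function makeSlowNode(clock, becomesPrimaryAtMs) {
    return {
        getState: () => (clock.now() >= becomesPrimaryAtMs ? "PRIMARY" : "SECONDARY"),
    };
}

// Poll until the node reports the expected state or the timeout elapses.
function waitForStateSketch(node, expectedState, timeoutMs, clock, intervalMs = 200) {
    const deadline = clock.now() + timeoutMs;
    while (true) {
        if (node.getState() === expectedState) {
            return clock.now();  // simulated time at which the state was seen
        }
        if (clock.now() >= deadline) {
            throw new Error(`timed out after ${timeoutMs} ms waiting for ${expectedState}`);
        }
        clock.sleep(intervalMs);
    }
}

// Run one simulated wait and report whether it succeeded.
function tryWait(becomesPrimaryAtMs, timeoutMs) {
    const clock = makeFakeClock();
    const node = makeSlowNode(clock, becomesPrimaryAtMs);
    try {
        waitForStateSketch(node, "PRIMARY", timeoutMs, clock);
        return true;
    } catch (e) {
        return false;
    }
}

// Assumed scenario: node0 needs 90 simulated seconds to become primary.
console.log(tryWait(90 * 1000, ONE_MINUTE_MS));   // false: the 1-minute wait gives up too early
console.log(tryWait(90 * 1000, TEN_MINUTES_MS));  // true: the 10-minute wait tolerates the delay
```

In this sketch a node that never becomes primary still fails the 10-minute wait, which matches the reasoning that a timeout at that length points to a hang rather than a slow machine.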

Assignee: Pavithra Vetriselvan
Reporter: Pavithra Vetriselvan
Participants: Githook User, Pavithra Vetriselvan
Votes: 0
Watchers: 2

Created: Aug 28 2020 07:23:38 PM UTC
Updated: Oct 29 2023 10:03:50 PM UTC
Resolved: Sep 04 2020 01:13:20 AM UTC
Confidence Status Last Update: 03/Sep/20 1:09 PM
