Type: Bug
Resolution: Gone away
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.6.5
Component/s: Replication
Labels:
None

Operating System:
ALL
Linked BF Score:
7
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

In this test, we start a batch insert on a background thread and block the batch with a failpoint, then use mongobridge to temporarily partition off the primary. We expect the primary to notice the partition and step down, causing the insert to fail. Originally the failure was a network error, because the primary closed connections during stepdown; after SERVER-38516 it is an InterruptedDueToStepdown error.

Meanwhile, we wait for a new node to be elected, then partition it off so that it steps down in turn, and finally unpartition the old primary and wait for it to become primary again.

At this point we join the background thread, which should have received the expected error by now.

There is a race condition, however: we never wait to confirm that the original primary steps down. We could partition it off, wait for a new primary to be elected while the old one still believes it is primary (split brain), then unpartition the old primary quickly enough that it never steps down at all. In that case the insert is never interrupted, so the insert thread fails because it doesn't get the error it expects.

Adding a "waitForState" call that waits for the original primary to step down before it is unpartitioned should fix the rare failure.
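
The race and the proposed fix can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual test or any MongoDB API: `ToyNode`, `wait_for_state`, `run_scenario`, and the 0.1 second step-down delay are all hypothetical names and values chosen for illustration. The toy node steps down only if it is still partitioned when its timer fires, which mimics a primary that needs some time to notice it has lost its majority.

```python
import threading
import time


def wait_for_state(get_state, expected, timeout=5.0, interval=0.01):
    """Poll get_state() until it returns `expected`; raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_state() == expected:
            return
        time.sleep(interval)
    raise TimeoutError(f"state never became {expected!r}")


class ToyNode:
    """Hypothetical stand-in for the old primary; not real MongoDB code."""

    def __init__(self, stepdown_delay):
        self.state = "PRIMARY"
        self.partitioned = False
        self.stepped_down = False
        self.stepdown_delay = stepdown_delay

    def partition(self):
        self.partitioned = True
        # Step down only if the partition is still in place when the timer fires.
        threading.Timer(self.stepdown_delay, self._maybe_step_down).start()

    def unpartition(self):
        self.partitioned = False

    def _maybe_step_down(self):
        if self.partitioned:
            self.state = "SECONDARY"
            self.stepped_down = True


def run_scenario(wait_for_stepdown):
    """Return True if the old primary stepped down during the partition."""
    node = ToyNode(stepdown_delay=0.1)
    node.partition()
    if wait_for_stepdown:
        # The proposed fix: block until the old primary has really stepped down.
        wait_for_state(lambda: node.state, "SECONDARY")
    node.unpartition()  # without the wait, the partition heals too early
    time.sleep(0.2)     # give the step-down timer time to fire either way
    return node.stepped_down
```

In this model, `run_scenario(False)` heals the partition before the node notices it, so the node never steps down and a blocked insert would never be interrupted. `run_scenario(True)` waits for the state change first, so the step-down is guaranteed to have happened before the partition is removed.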

Assignee: A. Jesse Jiryu Davis

Reporter: A. Jesse Jiryu Davis

Participants: A. Jesse Jiryu Davis

Votes: 0

Watchers: 2

Created: Feb 22 2019 12:46:52 PM UTC

Updated: Oct 27 2023 08:42:57 PM UTC

Resolved: Feb 22 2019 05:30:45 PM UTC

Confidence Status Last Update: 22/Feb/19 12:47 PM
