[SERVER-39755] Race in interrupted_batch_insert.js Created: 22/Feb/19 Updated: 27/Oct/23 Resolved: 22/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.6.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 7 | ||||
| Description |
|
In this test, we start a batch insert on a background thread, block the batch with a failpoint, then use mongobridge to temporarily partition off the primary. We expect the primary to notice the partition and step down, causing the insert to fail with (originally) a network error when the primary closed connections during stepdown, or (post- Meanwhile, we wait for a new node to be elected, then partition it off so that it steps down in turn, and finally unpartition the old primary and wait for it to become primary again. Only at this point do we join the background thread, which should have received the expected error by now.

There is a race condition, however: we never wait to make sure the original primary actually steps down. We could partition it off, wait for a new primary to be elected while the old one still considers itself primary (split brain), then unpartition the old primary quickly enough that it never steps down at all. The insert thread then fails because it does not get the error it expects.

A "waitForState" that ensures the original primary steps down should fix the rare failure. |
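The race hypothesized in the description can be illustrated with a toy model. This is only a sketch: it does not use the real jstest helpers, and every name in it (`runScenario`, `ticksUntilStepDown`, `ticksBeforeHeal`, `waitForStepDown`) is hypothetical. It assumes an isolated primary steps down after a fixed number of simulated ticks, and that the blocked insert sees its expected error only if that stepdown actually happens.

```javascript
// Toy model of the suspected race. All names are hypothetical; this is not
// the mongo shell or ReplSetTest API.
function runScenario({ ticksUntilStepDown, ticksBeforeHeal, waitForStepDown }) {
  const oldPrimary = { state: "PRIMARY", partitioned: true, ticksIsolated: 0 };

  // One tick of simulated time: an isolated primary eventually steps down.
  const tick = () => {
    if (oldPrimary.partitioned && oldPrimary.state === "PRIMARY") {
      oldPrimary.ticksIsolated += 1;
      if (oldPrimary.ticksIsolated >= ticksUntilStepDown) {
        oldPrimary.state = "SECONDARY";
      }
    }
  };

  // Time spent electing a new primary and partitioning it off again.
  for (let i = 0; i < ticksBeforeHeal; i++) {
    tick();
  }

  // The proposed fix: do not heal the partition until the old primary
  // has actually stepped down.
  if (waitForStepDown) {
    const maxTicks = 1000; // arbitrary bound so the sketch cannot spin forever
    for (let i = 0; oldPrimary.state !== "SECONDARY"; i++) {
      if (i >= maxTicks) {
        throw new Error("timed out waiting for the old primary to step down");
      }
      tick();
    }
  }

  const steppedDown = oldPrimary.state === "SECONDARY";
  oldPrimary.partitioned = false; // unpartition the old primary

  // Assumption: the blocked insert fails as expected only after a stepdown.
  return { steppedDown, insertSawExpectedError: steppedDown };
}

// Healing too early: the old primary never steps down, so the insert
// thread never gets the error it expects.
const racy = runScenario({ ticksUntilStepDown: 10, ticksBeforeHeal: 3, waitForStepDown: false });

// Same timing, but waiting for the stepdown before healing.
const fixed = runScenario({ ticksUntilStepDown: 10, ticksBeforeHeal: 3, waitForStepDown: true });
```

In the actual test, the waiting step would presumably be a "waitForState" call on the replica set fixture rather than a loop like this one; the model only shows why skipping that wait can leave the insert thread without its expected error.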
| Comments |
| Comment by A. Jesse Jiryu Davis [ 22/Feb/19 ] |
|
I think my original diagnosis was wrong, and in fact this was fixed in |
| Comment by A. Jesse Jiryu Davis [ 22/Feb/19 ] |
|
Test introduced in
|