[SERVER-27839] Allow for step downs during reconfig in ReplSetTest initiate Created: 27/Jan/17 Updated: 02/Jul/20 Resolved: 27/Feb/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.3, 3.5.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Judah Schvimer |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4 |
| Sprint: | Repl 2017-03-06 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
This occurs because replSetReconfig can sometimes cause the primary to step down and the reconfig in ReplSetTest initiate does not account for this. The reconfig helper is a good example of how to handle reconfigs safely. The one concern here is that post- |
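One way to tolerate a step down during a reconfig is to re-discover the primary and retry. The following is a minimal Python simulation of that idea, not the ReplSetTest code or the reconfig helper itself. `NotPrimaryError`, `reconfig_with_retry`, `find_primary`, and `send_reconfig` are hypothetical names introduced here, and the sketch assumes a step down surfaces as an error on the reconfig call.

```python
class NotPrimaryError(Exception):
    """Hypothetical error: the node that received the reconfig is no longer primary."""


def reconfig_with_retry(find_primary, send_reconfig, config, max_attempts=5):
    """Send `config` to the current primary, retrying if that primary steps down.

    `find_primary` and `send_reconfig` are caller-supplied callables (assumptions
    of this sketch), so the retry logic is independent of any real driver API.
    """
    last_error = None
    for _ in range(max_attempts):
        primary = find_primary()  # re-discover the primary on every attempt
        try:
            send_reconfig(primary, config)
            return primary  # the node that accepted the reconfig
        except NotPrimaryError as err:
            last_error = err  # the primary stepped down; look it up again
    raise RuntimeError("reconfig did not succeed") from last_error


# Tiny simulation: the first primary steps down while handling the reconfig.
attempts = []


def fake_find_primary():
    # Before any attempt node0 is primary; afterwards node1 has taken over.
    return "node0" if not attempts else "node1"


def fake_send_reconfig(primary, config):
    attempts.append(primary)
    if len(attempts) == 1:
        raise NotPrimaryError("node0 stepped down during reconfig")


accepted_by = reconfig_with_retry(fake_find_primary, fake_send_reconfig, {"version": 2})
```

The sketch does not check whether the first primary applied the config before stepping down; a real test helper would have to handle that case as well.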
| Comments |
| Comment by Githook User [ 06/Mar/17 ] |
|
Author: Judah Schvimer (judahschvimer, judah@mongodb.com) Message: (cherry picked from commit a4e1443629b733c7c0fd44dddcd78e884da848bd) |
| Comment by Githook User [ 27/Feb/17 ] |
|
Author: Judah Schvimer (judahschvimer, judah@mongodb.com) Message: |
| Comment by Judah Schvimer [ 01/Feb/17 ] |
|
This only happens on PV0 because this block exists in PV0 to check for a majority on every heartbeat. There is no real reason to change this behavior, so we should just make the testing changes described in this ticket. |
| Comment by Judah Schvimer [ 01/Feb/17 ] |
|
One concern is that, in the above failure, very little time passes between receiving the reconfig and stepping down. It seems worth looking into whether that behavior can and should be improved. |
| Comment by Benety Goh [ 01/Feb/17 ] |
|
The new ReplSetTest.initiate() procedure increases the config from 1 to N members in a single command. If the primary does not hear from the new members soon enough, it will step down. |
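The arithmetic behind this can be sketched as follows. The sketch assumes every member is a voting member and that the primary counts itself as visible; `majority` and `primary_sees_majority` are illustrative names, not server functions.

```python
def majority(n_voting_members):
    """Smallest number of voters that is more than half of the set."""
    return n_voting_members // 2 + 1


def primary_sees_majority(n_voting_members, n_others_heard_from):
    """True if the primary plus the members it has heard from form a majority."""
    return 1 + n_others_heard_from >= majority(n_voting_members)


# A one-node set: the primary alone is a majority.
print(primary_sees_majority(1, 0))
# Right after growing to three nodes, before any new member has responded.
print(primary_sees_majority(3, 0))
# Once one of the new members has responded.
print(primary_sees_majority(3, 1))
```

With one member the primary is its own majority, so growing the config to N members in one step means the primary depends on hearing from the new members before its next majority check.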
| Comment by Judah Schvimer [ 01/Feb/17 ] |
|
This also appears to be mostly an issue for PV0, so it's possible that there's a timing bug there and that we're not rescheduling some liveness checks correctly in PV0. |
| Comment by Judah Schvimer [ 01/Feb/17 ] |
|
Reconfig will actually fail if the new config makes the current primary unelectable. Reconfig can lead to a step down if nodes are added with poor timing and the primary suddenly can't see a majority of the set when it checks. |
| Comment by Spencer Brody (Inactive) [ 30/Jan/17 ] |
|
My understanding was that reconfig only causes a stepdown if the new config makes the current primary unelectable. For the reconfig done during initiation per |