[SERVER-17400] ReplSet primary's state sometimes gets stuck following reconfig Created: 26/Feb/15 Updated: 05/Jan/18 Resolved: 26/Feb/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Andy Schwerin |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
In some scenarios, a reconfig operation can cause the primary to get "stuck" in an unelectable state. The node recovers following a restart. The characteristic log lines look like the following:
The hostnames in the final two lines are the same. |
| Comments |
| Comment by Davis Ford [ 09/Dec/16 ] |
|
Andy, I'm currently trying to add a Mongo 3.0 node to an older production 2.6.5 replica set. I'm seeing a lot of these errors as the new 3.0 node spins up and syncs off one of the current secondaries:
[rsMgr] replSet error p != rs->self in checkNewState
The error is being spewed from the secondary that is being read from for the new 3.0 node (which is also a secondary).
What does this mean and what should I do, if anything? If it is benign, that would be great to hear, but this is a prod system that I'm trying to upgrade and it makes me pretty nervous – googling the errors leads either here or directly to the source. Any words of comfort? |
| Comment by Andy Schwerin [ 26/Feb/15 ] |
|
This bug does not exist on the 3.0 and master branches, due to extensive refactoring. The code in the 2.6 branch is too racy to make a fix practical. Since this can only happen in a race during reconfig, the workaround of restarting the stuck node will suffice. |
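Since the workaround is to restart the stuck node, it may help to spot the condition in the logs first. The following is a minimal sketch, not an official tool: it scans mongod log lines for the message quoted in the comment above. Treating that message as a sign of the stuck state is an assumption, and the names STUCK_MARKER and find_stuck_lines are hypothetical.

```python
from typing import Iterable, List

# Message text quoted in the comment above. Using it as an indicator of the
# stuck state is an assumption, not something this ticket confirms.
STUCK_MARKER = "replSet error p != rs->self in checkNewState"


def find_stuck_lines(log_lines: Iterable[str]) -> List[str]:
    """Return every log line containing the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in log_lines if STUCK_MARKER in line]
```

One way to use it is to open the node's log file, pass the file object to find_stuck_lines, and consider restarting the node if the list keeps growing after a reconfig.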