[SERVER-15655] Secondary can't join if w:0 is in getLastErrorDefaults Created: 14/Oct/14 Updated: 11/Jul/16 Resolved: 31/Oct/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 2.8.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | 28qa | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Steps To Reproduce: | Start a 2.6.4 single-member replica set configured with {getLastErrorDefaults: {w:0}}:
Now start a member running the latest code from master (I have 4bea2a90) with the option "-vvvv". Try to add it from the shell connected to the 2.6.4 primary:
The primary thinks the member was added. However:
The member we tried to add logs "User Assertion: 17505:replSet illegal config: getLastErrorDefaults w:0":
|
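The repro hinges on a single config field. Below is a minimal Python sketch, assuming the replica set config is available as a plain dict (shaped like the document rs.conf() returns), of the kind of check the 2.8 member appears to apply. The function name `validate_gle_defaults`, the set name `rs0`, and the second host are hypothetical; only the message text comes from the log line quoted in the steps. Numbers typed in the mongo shell are stored as doubles, so the sketch treats 0.0 the same as 0.

```python
# Hypothetical sketch, not MongoDB's actual validation code.

def validate_gle_defaults(config):
    """Return an error message if getLastErrorDefaults asks for w:0, else None."""
    defaults = config.get("settings", {}).get("getLastErrorDefaults", {})
    w = defaults.get("w")
    # Only a numeric w of 0 (or 0.0) is rejected; w:1, w:"majority", etc. pass.
    if isinstance(w, (int, float)) and not isinstance(w, bool) and w == 0:
        return "replSet illegal config: getLastErrorDefaults w:0"
    return None


# A config like the one the 2.6.4 primary holds after the add.
# "rs0" and "localhost:27018" are placeholders.
config = {
    "_id": "rs0",
    "version": 2,
    "members": [
        {"_id": 0, "host": "localhost:27017"},
        {"_id": 1, "host": "localhost:27018"},
    ],
    "settings": {"getLastErrorDefaults": {"w": 0}},
}

print(validate_gle_defaults(config))
# prints: replSet illegal config: getLastErrorDefaults w:0
```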
| Description |
|
Two bugs:
1. If you try to add a member running 2.8 to a replica set configured with {getLastErrorDefaults: {w:0}}, it stays in state UNKNOWN. It should be REMOVED.
2. The member doesn't log the problem at the default log level. We shouldn't have to increase verbosity to see what the problem is. |
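A minimal Python sketch of the behavior the description asks for, assuming a simplified model where a member maps an incoming config to a single state string. `config_is_valid` and `state_after_receiving` are hypothetical names, and Python's `logging` levels stand in for mongod's verbosity levels: WARNING is printed even when logging is unconfigured, which plays the role of "visible at the default log level".

```python
import logging

# Hypothetical sketch; names and simplified states are illustrative only.
logger = logging.getLogger("replset")


def config_is_valid(config):
    """Stand-in check: reject a numeric w of 0 in getLastErrorDefaults."""
    w = config.get("settings", {}).get("getLastErrorDefaults", {}).get("w")
    return not (isinstance(w, (int, float)) and not isinstance(w, bool) and w == 0)


def state_after_receiving(config):
    """Return the state a newly added member would report for this config."""
    if config_is_valid(config):
        return "STARTUP2"  # a new member would go on to initial sync
    # Bug #2: log the reason at WARNING, not at a debug level that needs -vvvv.
    logger.warning("rejecting config version %s: getLastErrorDefaults w:0",
                   config.get("version"))
    # Bug #1: report REMOVED instead of staying in UNKNOWN.
    return "REMOVED"


bad = {"version": 2, "settings": {"getLastErrorDefaults": {"w": 0}}}
print(state_after_receiving(bad))  # prints REMOVED; the warning goes to stderr
```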
| Comments |
| Comment by Githook User [ 03/Nov/14 ] |

Author: Spencer T Brody (stbrody, spencer@mongodb.com)
Message: |
| Comment by Githook User [ 31/Oct/14 ] |

Author: Spencer T Brody (stbrody, spencer@mongodb.com)
Message: |
| Comment by Spencer Brody (Inactive) [ 29/Oct/14 ] |

Also, after shutting down that node and restarting it, I see:
It seems like we shouldn't persist an invalid configuration to local.system.replset, since doing so just causes the node to crash on restart and makes recovery more difficult. |
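A minimal Python sketch of "validate before persisting", assuming a plain dict named `storage` stands in for the node's local database. The key reuses the collection name from the comment, but `validate`, `install_config`, and the dict-based storage are hypothetical, not MongoDB's storage API.

```python
# Hypothetical sketch: only write a config to local.system.replset once it
# has passed validation, so a restart never loads a config already rejected.

def validate(config):
    """Raise ValueError for the w:0 case; accept everything else."""
    w = config.get("settings", {}).get("getLastErrorDefaults", {}).get("w")
    if isinstance(w, (int, float)) and not isinstance(w, bool) and w == 0:
        raise ValueError("replSet illegal config: getLastErrorDefaults w:0")


def install_config(storage, new_config):
    """Store new_config only if it validates; return True if it was stored."""
    try:
        validate(new_config)
    except ValueError as err:
        # Leave any previously stored config in place.
        print(f"not persisting config version {new_config.get('version')}: {err}")
        return False
    storage["local.system.replset"] = new_config
    return True


storage = {}
bad = {"version": 2, "settings": {"getLastErrorDefaults": {"w": 0}}}
good = {"version": 3, "settings": {"getLastErrorDefaults": {"w": 1}}}
print(install_config(storage, bad), install_config(storage, good))  # False True
```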
| Comment by Spencer Brody (Inactive) [ 29/Oct/14 ] |
|
When I do this on master now I see:
and rs.status() reports REMOVED for the secondary. milkie, your previous comment said we wanted this to cause the secondary to be UNKNOWN, not REMOVED, so is the current behavior a bug? |
| Comment by Eric Milkie [ 29/Oct/14 ] |

We should put the config version of every node in replSetGetStatus, at least. And we can log when we reject configs. |
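A minimal Python sketch of the first suggestion: surfacing each member's config version in status output, so a node that rejected the current config stands out. `name` and `stateStr` mirror existing replSetGetStatus member fields; `build_status`, the per-member `configVersion`, and `configMismatch` are assumptions made for illustration, and using `None` for "no config accepted yet" is likewise hypothetical.

```python
# Hypothetical sketch of status output that exposes per-member config versions.

def build_status(local_version, heartbeats):
    """heartbeats maps host -> (state string, config version that host reports)."""
    members = []
    for host, (state, version) in sorted(heartbeats.items()):
        members.append({
            "name": host,
            "stateStr": state,
            "configVersion": version,
            # True when the member is not on this node's config version.
            "configMismatch": version != local_version,
        })
    return {"configVersion": local_version, "members": members}


status = build_status(2, {
    "localhost:27017": ("PRIMARY", 2),
    "localhost:27018": ("UNKNOWN", None),  # the member that rejected version 2
})
for m in status["members"]:
    print(m["name"], m["stateStr"], m["configVersion"], m["configMismatch"])
```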
| Comment by A. Jesse Jiryu Davis [ 28/Oct/14 ] |
|
Now it's worse. If I go through the same steps as before (2.6.4 on port 27017, replSetInitiate with w:0 in getLastErrorDefaults, try to add a secondary running the current master code, hash 4f4a5103), the secondary reports it can't find a member to sync from:
At the default log level, the secondary logs:
If I turn it up to "-vvvvv", it logs:
The secondary never logs the real reason it can't sync: it won't accept the config. If I reconfig on the primary to replace the w:0 with a w:1, the new secondary immediately starts syncing. |
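The workaround amounts to a small config edit. Below is a minimal Python sketch on plain dicts (nothing here talks to a server): replace w:0 with w:1 in getLastErrorDefaults and bump the version, which is what a reconfig must carry. In the mongo shell the same change would go through rs.reconfig(), which increments the version itself. `fix_gle_defaults` and the sample config values are hypothetical.

```python
import copy

# Hypothetical sketch of the reconfig workaround, applied to a config dict.

def fix_gle_defaults(config):
    """Return a copy of config with w:0 replaced by w:1 and the version bumped."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    defaults = new_config.get("settings", {}).get("getLastErrorDefaults")
    if defaults is not None and defaults.get("w") == 0:  # matches 0 and 0.0
        defaults["w"] = 1
    new_config["version"] = config["version"] + 1
    return new_config


old = {"_id": "rs0", "version": 2,
       "members": [{"_id": 0, "host": "localhost:27017"}],
       "settings": {"getLastErrorDefaults": {"w": 0}}}
new = fix_gle_defaults(old)
print(new["version"], new["settings"]["getLastErrorDefaults"])  # 3 {'w': 1}
```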
| Comment by Eric Milkie [ 17/Oct/14 ] |

New code has been released; I expect that bug #2 above (log issue) is now resolved. |
| Comment by Spencer Brody (Inactive) [ 14/Oct/14 ] |

Moving this into "2.7 Required" to ensure that we at least look at it again before release, as this is a 2.6/2.8 compatibility problem. |
| Comment by Eric Milkie [ 14/Oct/14 ] |
|
As the replication codepaths of 2.7.8 are about to change, we should put this on hold and retest after the switchover. |