[SERVER-26494] remove unreachable else-branch in sync_source_feedback.cpp logic Created: 05/Oct/16 Updated: 13/Jul/19 Resolved: 28/Feb/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Benety Goh |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Sprint: | Repl 2016-12-12, Repl 2017-01-23, Repl 2017-02-13, Repl 2017-03-06 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Description |
|
This block of code has 2 if-statements which will cover all cases, preventing the else-branch which blacklists the current sync source from ever being reached. This code is new in 3.3 |
| Comments |
| Comment by Siyuan Zhou [ 13/Jul/19 ] | |||
|
The following commit is part of this ticket but has a wrong SERVER number, so I posted it here. Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}Message: | |||
| Comment by Githook User [ 28/Feb/17 ] | |||
|
Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}Message: | |||
| Comment by Spencer Brody (Inactive) [ 23/Feb/17 ] | |||
|
Okay then, feel free to update this ticket to be about removing the dead code rather than fixing it | |||
| Comment by Benety Goh [ 23/Feb/17 ] | |||
|
I can confirm that NotMasterOrSecondary errors caused by replSetMaintenance are handled correctly in BackgroundSync and SyncSourceFeedback: On a chaining replica set setup n1 (pri) <- n2 (sec) <- n3 (sec), running { replSetMaintenance: true }on n2 will cause the oplog fetcher on n3 to fail with the following message:
BackgroundSync clears its sync source and this leads to the Reporter in SyncSourceFeedback returning with an error when it detects that the sync source is no longer valid:
Eventually, n3 selects n1 as its sync source and we go back to having a stable replica set:
| |||
| Comment by Spencer Brody (Inactive) [ 23/Feb/17 ] | |||
|
Yeah, okay, I'm mostly convinced. The only last thing I'd want to confirm is that NotMasterOrSecondary errors caused by something other than a config change (for example by a node having replSetMaintenance called on it) will also be correctly handled outside of sync_source_feedback | |||
| Comment by Benety Goh [ 22/Feb/17 ] | |||
|
It seems that the common cause for the remote replSetUpdatePosition failing with those error codes NodeNotFound,NotMasterOrSecondary and InvalidReplicaSetConfig would be a change in config. Config changes currently will cause the oplog fetcher to fail because it will detect a change in the config version contained in the oplog query metadata. Subsequently, after exiting the oplog fetcher, we will go back to the sync source resolver which has its own blacklisting logic. IMO, it'd be beneficial to centralize our blacklisting criteria in one location. | |||
| Comment by Spencer Brody (Inactive) [ 13/Feb/17 ] | |||
|
Should we just swallow network exceptions like we do InvalidSyncSource and NodeNotFound? Conversely, is there a specific set of codes we should be looking for and blacklisting based on? From a very cursory skim of the code for the replSetUpdatePosition command, it looks like it can fail with NodeNotFound, NotMasterOrSecondary, and InvalidReplicaSetConfig at least, and the latter two of those at least seem worthwhile to blacklist based on? Actually, we should probably also blacklist on NodeNotFound, the problem is we can't currently distinguish between a NodeNotFound error that we generated when trying to construct the update position command and a NodeNotFound received over the network from the sync source from it processing the update position command. | |||
| Comment by Judah Schvimer [ 08/Feb/17 ] | |||
|
I think it's probably fine to rely on BackgroundSync to do blacklisting. | |||
| Comment by Benety Goh [ 08/Feb/17 ] | |||
|
Blacklist the sync source on a failed Reporter command may interfere with the retry logic added to the OplogFetcher in spencer, judah.schvimer, thoughts? | |||
| Comment by Benety Goh [ 10/Jan/17 ] | |||
|
Those != should be ==. | |||
| Comment by Spencer Brody (Inactive) [ 07/Nov/16 ] | |||
|
Benety, can you take a look at this and see if there's a potential bug here? |