[SERVER-15931] Repeated "[ReplicationExecutor] could not find member to sync from" in healthy replica set Created: 04/Nov/14 Updated: 24/Nov/14 Resolved: 13/Nov/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.8.0-rc0 |
| Fix Version/s: | 2.8.0-rc1 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Ramon Fernandez Marina | Assignee: | Matt Dannenberg |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | See |
| Participants: | |
| Description |
|
While working on
These messages just add noise, but it would be nice to fix them at some point. |
| Comments |
| Comment by Githook User [ 13/Nov/14 ] |
|
Author: matt dannenberg (username: dannenberg, email: matt.dannenberg@10gen.com) Message: |
| Comment by Eric Milkie [ 04/Nov/14 ] |
|
How about you just print the message once, and then do not print it again until after a subsequent successful connection? |
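A minimal sketch of the suggestion above, written in Python for brevity rather than in the server's C++. The class name `SyncSourceLogger` and its methods are hypothetical and do not correspond to anything in the MongoDB source. It only illustrates the idea of logging once and re-arming the message after a successful connection.

```python
class SyncSourceLogger:
    """Hypothetical helper: suppress repeated 'no sync source' messages."""

    MESSAGE = "could not find member to sync from"

    def __init__(self, emit=print):
        self._emit = emit              # callable that writes one log line
        self._already_warned = False   # True while we are in a run of failures

    def on_no_sync_source(self):
        # Log only the first failure in a run of consecutive failures.
        if not self._already_warned:
            self._emit(self.MESSAGE)
            self._already_warned = True

    def on_connected(self, host):
        # A successful connection re-arms the warning for the next failure.
        self._already_warned = False
```

With this shape, a producer loop that fails to find a sync source every second would emit the line once, stay quiet until it connects, and only log again if it later loses its sync source.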
| Comment by Matt Dannenberg [ 04/Nov/14 ] |
|
The trouble stems from the oplog reader in bgsync's producer thread failing to find a sync source to connect to. The thread loops, sleeping for one second between checks for a member to sync from, until another operation occurs on the primary. There seem to be two solutions: either change the sleep time in BackgroundSync::produce for the _syncSourceReader.getHost().empty() branch, or change sync source selection to permit members that are at least as far along as we are rather than only members that are ahead of us. Neither solution seems ideal. Thoughts? |
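The second option in the comment above can be sketched as follows, in Python for brevity rather than the server's C++. This is a simplified model, not the server's selection logic: `Member`, `choose_sync_source`, and `allow_equal` are hypothetical names, the optime is reduced to a single integer, and picking the most advanced candidate is an assumption made only to keep the example short.

```python
from typing import NamedTuple, Optional, Sequence


class Member(NamedTuple):
    host: str
    optime: int  # simplified stand-in for the member's last applied oplog position


def choose_sync_source(our_optime: int,
                       members: Sequence[Member],
                       allow_equal: bool = False) -> Optional[str]:
    """Return a host to sync from, or None if no member qualifies."""

    def eligible(m: Member) -> bool:
        if allow_equal:
            # Relaxed rule: members at least as far along as we are.
            return m.optime >= our_optime
        # Strict rule: only members that are ahead of us.
        return m.optime > our_optime

    candidates = [m for m in members if eligible(m)]
    if not candidates:
        return None
    # Assumption for this sketch: prefer the most advanced candidate.
    return max(candidates, key=lambda m: m.optime).host
```

Under the strict rule, a set in which every member is at the same optime yields no candidate, which matches the repeated message in the title. Under the relaxed rule the same set yields a candidate, so the producer would not loop on the empty-host branch.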