[SERVER-15931] Repeated "[ReplicationExecutor] could not find member to sync from" in healthy replica set
Created: 04/Nov/14  Updated: 24/Nov/14  Resolved: 13/Nov/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.8.0-rc0
Fix Version/s: 2.8.0-rc1

Type: Bug
Priority: Minor - P4
Reporter: Ramon Fernandez Marina
Assignee: Matt Dannenberg
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-14809 crash in replicaset when running resy... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

See SERVER-14809

 Description   

While working on SERVER-14809 I saw repeated messages like this in the logs:

2014-11-03T16:18:24.424-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from
2014-11-03T16:18:25.424-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from
2014-11-03T16:18:26.425-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from
2014-11-03T16:18:27.425-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from
2014-11-03T16:18:28.425-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from
2014-11-03T16:18:29.425-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from
2014-11-03T16:18:30.425-0500 I REPLSETS [ReplicationExecutor] could not find member to sync from

These messages are harmless but add noise to the logs; it would be nice to fix them at some point.



 Comments   
Comment by Githook User [ 13/Nov/14 ]

Author: Matt Dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-15931 print "could not find member to sync from" only once between successful detections
Branch: master
https://github.com/mongodb/mongo/commit/9a9b844828cdba0689484a6c18d9f04e8b5f950c

Comment by Eric Milkie [ 04/Nov/14 ]

How about you just print the message once, and then do not print it again until after a subsequent successful connection?
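
The suggestion above can be sketched roughly as follows. This is a minimal illustration in Python, not the server's C++ code; the class name SyncSourceLogThrottle and its method are hypothetical, and an empty string stands in for "no sync source found".

```python
class SyncSourceLogThrottle:
    """Hypothetical helper: report a failed lookup once, then stay quiet
    until a successful lookup resets the state."""

    def __init__(self, log):
        self._log = log                 # any callable that accepts a message string
        self._already_reported = False

    def record_attempt(self, sync_source):
        if sync_source:
            # Success: re-arm so that the next failure is reported again.
            self._already_reported = False
            return
        if not self._already_reported:
            self._log("could not find member to sync from")
            self._already_reported = True


messages = []
throttle = SyncSourceLogThrottle(messages.append)

# Three failed lookups, one success, then two more failed lookups.
for source in ["", "", "", "host2:27017", "", ""]:
    throttle.record_attempt(source)

print(len(messages))  # one message per run of consecutive failures
```

With this pattern the per-second retry loop can stay as it is; only the logging is suppressed while the state is unchanged.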

Comment by Matt Dannenberg [ 04/Nov/14 ]

The trouble stems from the oplog reader in bgsync's producer thread failing to find a sync source to connect to. The thread loops, sleeping for one second between checks for a member to sync from, until another operation occurs on the primary. There seem to be two possible solutions: either change the sleep time in BackgroundSync::produce for the _syncSourceReader.getHost().empty() branch, or change sync source selection to permit members that are at least as far along as we are, rather than only members that are ahead of us. Neither solution seems ideal. Thoughts?
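
To illustrate the second option, here is a simplified sketch in Python of why an idle, fully caught-up set yields no candidate under a strict comparison. The names Member and choose_sync_source are hypothetical, optimes are reduced to plain integers, and the real selection logic (not shown in this ticket) is assumed to consider more than this.

```python
from typing import NamedTuple, Optional, Sequence


class Member(NamedTuple):
    host: str
    optime: int  # simplification: stands in for a member's last applied operation


def choose_sync_source(my_optime: int,
                       members: Sequence[Member],
                       allow_equal: bool = False) -> Optional[str]:
    """Return the first acceptable member's host, or None if there is none."""
    for member in members:
        ahead = member.optime > my_optime
        level = allow_equal and member.optime == my_optime
        if ahead or level:
            return member.host
    return None


# A healthy, idle set: every member is at the same position as we are.
idle_set = [Member("primary:27017", 100), Member("secondary:27017", 100)]

strict_choice = choose_sync_source(100, idle_set)                     # only members ahead of us
relaxed_choice = choose_sync_source(100, idle_set, allow_equal=True)  # at least as far along

print(strict_choice, relaxed_choice)
```

Under the strict rule nothing qualifies until the primary takes another write, which matches the once-per-second message described above; the relaxed rule finds a source immediately, at the cost of connecting to a member that has nothing new to offer yet.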
