[SERVER-21560] On restart, sync source not set until first write Created: 19/Nov/15  Updated: 24/Nov/15  Resolved: 19/Nov/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.0-rc3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: James Wahlin Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-21656 Improve sync source logging when wait... Closed
Operating System: ALL
Steps To Reproduce:
  1. Start a 3 member replica set
  2. Stop all write activity
  3. Either:
    1. Restart one of the secondaries
    2. rs.remove and then rs.add one of the secondaries
  4. The restarted (or removed and re-added) secondary will display "could not find member to sync from" in the log file and will not display the "syncingTo" field in rs.status()

 Description   

When a secondary replica set member is either restarted or removed from and re-added to a replica set, it displays the following in the log file:

2015-11-19T11:25:06.306-0500 I REPL     [ReplicationExecutor] could not find member to sync from

It will also show either an error or no key/value for the rs.status() "syncingTo" field.

This state remains until a write is performed on the primary, at which point a sync source is set.

My expectation would be that we can set a sync source regardless of write activity under normal conditions given there is a primary.
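
A minimal sketch of how the symptom could be checked from a status document, assuming the output of rs.status() (the replSetGetStatus command) has already been fetched as a Python dict. The helper name and the sample document are hypothetical and hand-written for illustration; only the "members", "name", "stateStr" and "syncingTo" field names are taken from the rs.status() output discussed here.

```python
def secondaries_without_sync_source(status):
    """Return the names of SECONDARY members that report no 'syncingTo' value.

    `status` is assumed to be a dict shaped like rs.status() output.
    """
    return [
        member.get("name")
        for member in status.get("members", [])
        if member.get("stateStr") == "SECONDARY" and not member.get("syncingTo")
    ]


# Hypothetical, hand-written sample (not captured from a real server):
# host3 was restarted while the set was idle and shows no "syncingTo".
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "host1:27017", "stateStr": "PRIMARY"},
        {"name": "host2:27017", "stateStr": "SECONDARY", "syncingTo": "host1:27017"},
        {"name": "host3:27017", "stateStr": "SECONDARY"},
    ],
}

idle_secondaries = secondaries_without_sync_source(sample_status)
```

With a live deployment the dict would come from running replSetGetStatus against the member being inspected; that step is left out so the sketch stays self-contained.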



 Comments   
Comment by Scott Hernandez (Inactive) [ 19/Nov/15 ]

It isn't that simple, unfortunately.

Go ahead and file a new issue for improving the log message, but we are going to need to spend a bit of time looking into all the possible states and whether the information is available in places which have all that context, which they don't have now.

Comment by James Wahlin [ 19/Nov/15 ]

On second thought, can we improve the log message we print in these cases? If we know that there are valid members to sync from, but we are caught up and waiting for write activity, maybe something like "X potential sync sources found, waiting for write activity to choose a source" would be more informative than "could not find member to sync from", which could be interpreted as an error condition.

Comment by James Wahlin [ 19/Nov/15 ]

Fair enough. Feel free to close.

Comment by Scott Hernandez (Inactive) [ 19/Nov/15 ]

Elections actually do a write (in PV1, and did in PV0 until this – SERVER-21096).

Comment by James Wahlin [ 19/Nov/15 ]

That is correct, Scott. Would you expect different behavior when an election occurs? If an election occurs at a time when all members are at the same oplog point in time, the secondaries will set a sync source rather than wait for new write activity.

Comment by Scott Hernandez (Inactive) [ 19/Nov/15 ]

I believe what you are describing is that the member is caught up and can't choose a new sync source until there is one ahead of it, like when a write comes in. This is expected behavior.
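
The behavior described above can be illustrated with a toy model. This is a simplified sketch and not the server's actual selection code: optimes are reduced to plain integers, the Member type and choose_sync_source function are hypothetical names, and picking the most advanced candidate is an arbitrary simplification. The only rule it is meant to show is that a member considers a candidate only if that candidate is ahead of it.

```python
from typing import NamedTuple, Optional, Sequence


class Member(NamedTuple):
    name: str
    optime: int  # simplified stand-in for a position in the oplog


def choose_sync_source(my_optime: int, candidates: Sequence[Member]) -> Optional[str]:
    """Toy rule: only members strictly ahead of us are eligible sync sources."""
    ahead = [m for m in candidates if m.optime > my_optime]
    if not ahead:
        # Everyone is at or behind our position, so there is nothing to fetch:
        # this models the "could not find member to sync from" state.
        return None
    # Arbitrary simplification: take the most advanced eligible member.
    return max(ahead, key=lambda m: m.optime).name


# Idle replica set: all members are at the same oplog position.
idle = [Member("primary", 100), Member("secondary2", 100)]
source_when_idle = choose_sync_source(100, idle)

# After one write on the primary, the primary is ahead and becomes eligible.
after_write = [Member("primary", 101), Member("secondary2", 100)]
source_after_write = choose_sync_source(100, after_write)
```

In this model the restarted secondary has no sync source while the set is idle and picks one as soon as a single write (including one performed by an election) moves another member ahead.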
