[SERVER-21816] Cannot initial sync from non-voting node
Created: 09/Dec/15  Updated: 04/Jan/16  Resolved: 22/Dec/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Mathias Stearn
Assignee: Scott Hernandez (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-21971 Not possible to elect a primary if no... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

This is a problem if there are two nodes in a cold backup configuration. Config:

{
	"_id" : "rs",
	"protocolVersion" : NumberLong(1),
	"members" : [
		{
			"_id" : 0,
			"host" : "localhost:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 1,
			"host" : "localhost:30000",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 0,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 0
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatIntervalMillis" : 2000,
		"heartbeatTimeoutSecs" : 10,
		"electionTimeoutMillis" : 10000,
		"getLastErrorModes" : {
			
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		}
	},
}

I wiped out the dbpath on the primary while the secondary was running. When I restarted the primary (expecting an initial sync to restore the data), the log just repeats the following lines:

2015-12-09T13:10:38.066-0500 I REPL     [replExecDBWorker-0] Starting replication applier threads
2015-12-09T13:10:38.066-0500 W REPL     [rsSync] did not receive a valid config yet
2015-12-09T13:10:38.066-0500 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "rs", version: 3, protocolVersion: 1, members: [ { _id: 0, host: "localhost:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "localhost:30000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.0, tags: {}, slaveDelay: 0, votes: 0 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2015-12-09T13:10:38.066-0500 I REPL     [ReplicationExecutor] This node is localhost:27017 in the config
2015-12-09T13:10:38.066-0500 I REPL     [ReplicationExecutor] transition to STARTUP2
2015-12-09T13:10:38.067-0500 I REPL     [ReplicationExecutor] Member localhost:30000 is now in state SECONDARY
2015-12-09T13:10:39.066-0500 I REPL     [rsSync] ******
2015-12-09T13:10:39.066-0500 I REPL     [rsSync] creating replication oplog of size: 3378MB...
2015-12-09T13:10:39.070-0500 I STORAGE  [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
2015-12-09T13:10:39.070-0500 I STORAGE  [rsSync] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2015-12-09T13:10:39.070-0500 I STORAGE  [rsSync] Scanning the oplog to determine where to place markers for truncation
2015-12-09T13:10:39.088-0500 I REPL     [rsSync] ******
2015-12-09T13:10:39.088-0500 I REPL     [rsSync] initial sync pending
2015-12-09T13:10:39.092-0500 I REPL     [rsSync] no valid sync sources found in current replset to do an initial sync
2015-12-09T13:10:40.092-0500 I REPL     [rsSync] initial sync pending
2015-12-09T13:10:40.092-0500 I REPL     [rsSync] no valid sync sources found in current replset to do an initial sync
2015-12-09T13:10:41.092-0500 I REPL     [rsSync] initial sync pending
2015-12-09T13:10:41.093-0500 I REPL     [rsSync] no valid sync sources found in current replset to do an initial sync
2015-12-09T13:10:42.093-0500 I REPL     [rsSync] initial sync pending
2015-12-09T13:10:42.093-0500 I REPL     [rsSync] no valid sync sources found in current replset to do an initial sync
...



 Comments   
Comment by Scott Hernandez (Inactive) [ 22/Dec/15 ]

Fixed in SERVER-21971.

Comment by Scott Hernandez (Inactive) [ 09/Dec/15 ]

This is due to the sync source selection code in the topology coordinator.

We should be less stringent during initial sync.
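
To illustrate the idea, here is a minimal Python sketch of how a source-selection filter could leave the wiped node with no candidates, and how relaxing it during initial sync would help. This is not the server's code. The rule that a voting node rejects non-voting candidates is an assumption suggested by the ticket title, and the names `Member`, `sync_source_candidates`, and `initial_sync` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    host: str
    votes: int
    state: str = "SECONDARY"


def sync_source_candidates(self_member, members, initial_sync=False):
    """Return the hosts that self_member could sync from.

    Assumed rule (not confirmed by the ticket): a voting node refuses
    non-voting candidates. The hypothetical initial_sync flag skips
    that check.
    """
    candidates = []
    for m in members:
        if m.host == self_member.host:
            continue  # never sync from ourselves
        if m.state not in ("PRIMARY", "SECONDARY"):
            continue  # candidate must be readable
        if not initial_sync and self_member.votes > 0 and m.votes == 0:
            continue  # assumed voter-only restriction
        candidates.append(m.host)
    return candidates


# The two members from the config in the description.
members = [
    Member("localhost:27017", votes=1, state="STARTUP2"),
    Member("localhost:30000", votes=0),
]
wiped = members[0]

strict = sync_source_candidates(wiped, members)
relaxed = sync_source_candidates(wiped, members, initial_sync=True)
print(strict)
print(relaxed)
```

Under these assumptions the strict call finds no source, which would match the repeated "no valid sync sources found" log line, while the relaxed call accepts localhost:30000.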
