[SERVER-6475] replSetSyncFrom can break replication
Created: 17/Jul/12  Updated: 23/Feb/15  Resolved: 23/Feb/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.1.2
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Bernie Hackett Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Participants:

Description

It is possible to configure two secondaries to sync from each other, breaking replication for both:

MongoDB shell version: 2.1.3-pre-
connecting to: 127.0.0.1:27018/test
foo:SECONDARY> rs.syncFrom('behackett-dt:27019')
{
	"syncFromRequested" : "behackett-dt:27019",
	"prevSyncTarget" : "behackett-dt:27017",
	"ok" : 1
}
foo:SECONDARY> rs.status()
{
	"set" : "foo",
	"date" : ISODate("2012-07-17T00:23:47Z"),
	"myState" : 2,
	"syncingTo" : "behackett-dt:27019",
	"members" : [
		{
			"_id" : 0,
			"name" : "behackett-dt:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 131,
			"optime" : Timestamp(1342483505000, 1),
			"optimeDate" : ISODate("2012-07-17T00:05:05Z"),
			"lastHeartbeat" : ISODate("2012-07-17T00:23:46Z"),
			"pingMs" : 0
		},
		{
			"_id" : 1,
			"name" : "behackett-dt:27018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 131,
			"optime" : Timestamp(1342483505000, 1),
			"optimeDate" : ISODate("2012-07-17T00:05:05Z"),
			"errmsg" : "syncing to: behackett-dt:27019 by request",
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "behackett-dt:27019",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 131,
			"optime" : Timestamp(1342483505000, 1),
			"optimeDate" : ISODate("2012-07-17T00:05:05Z"),
			"lastHeartbeat" : ISODate("2012-07-17T00:23:46Z"),
			"pingMs" : 0,
			"errmsg" : "syncing to: behackett-dt:27018 by request"
		},
		{
			"_id" : 3,
			"name" : "behackett-dt:27020",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 129,
			"lastHeartbeat" : ISODate("2012-07-17T00:23:46Z"),
			"pingMs" : 0
		}
	],
	"ok" : 1
}

Subsequent write operations are not replicated to either secondary. The primary's optime advances to 00:25:59 while both secondaries stay at 00:05:05:

foo:SECONDARY> rs.status()
{
	"set" : "foo",
	"date" : ISODate("2012-07-17T00:26:06Z"),
	"myState" : 2,
	"syncingTo" : "behackett-dt:27019",
	"members" : [
		{
			"_id" : 0,
			"name" : "behackett-dt:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 270,
			"optime" : Timestamp(1342484759000, 2),
			"optimeDate" : ISODate("2012-07-17T00:25:59Z"),
			"lastHeartbeat" : ISODate("2012-07-17T00:26:04Z"),
			"pingMs" : 0
		},
		{
			"_id" : 1,
			"name" : "behackett-dt:27018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 270,
			"optime" : Timestamp(1342483505000, 1),
			"optimeDate" : ISODate("2012-07-17T00:05:05Z"),
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "behackett-dt:27019",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 270,
			"optime" : Timestamp(1342483505000, 1),
			"optimeDate" : ISODate("2012-07-17T00:05:05Z"),
			"lastHeartbeat" : ISODate("2012-07-17T00:26:04Z"),
			"pingMs" : 0
		},
		{
			"_id" : 3,
			"name" : "behackett-dt:27020",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 268,
			"lastHeartbeat" : ISODate("2012-07-17T00:26:04Z"),
			"pingMs" : 0
		}
	],
	"ok" : 1
}
foo:SECONDARY> db.printSlaveReplicationInfo()
source:   behackett-dt:27018
	 syncedTo: Mon Jul 16 2012 17:05:05 GMT-0700 (PDT)
		 = 1505 secs ago (0.42hrs)
source:   behackett-dt:27019
	 syncedTo: Mon Jul 16 2012 17:05:05 GMT-0700 (PDT)
		 = 1505 secs ago (0.42hrs)
source:   behackett-dt:27020
	 no replication info, yet.  State: ARBITER

Before accepting a replSetSyncFrom request, a secondary should check that the requested source is not already syncing from it.
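
A minimal sketch of that check, written as plain JavaScript against a document shaped like the rs.status() output above. It is not server code. The helper names (requestedSyncTarget, wouldSyncFromEachOther) are hypothetical, and reading the sync target out of the "errmsg" string is an assumption based only on the "syncing to: ... by request" text shown in this report.

```javascript
// Hypothetical helper: pulls the sync target out of a member's "errmsg",
// which in the rs.status() output above looks like
// "syncing to: host:port by request". Returns null when it does not match.
function requestedSyncTarget(member) {
  const match = /^syncing to: (\S+)/.exec(member.errmsg || "");
  return match ? match[1] : null;
}

// Returns true if `target` is already reported as syncing from `self`,
// i.e. pointing `self` at `target` would create a two-member cycle.
function wouldSyncFromEachOther(status, self, target) {
  const member = status.members.find((m) => m.name === target);
  return member !== undefined && requestedSyncTarget(member) === self;
}

// Trimmed copy of the first rs.status() output in this report.
const status = {
  members: [
    { name: "behackett-dt:27017", stateStr: "PRIMARY" },
    { name: "behackett-dt:27018", stateStr: "SECONDARY", self: true,
      errmsg: "syncing to: behackett-dt:27019 by request" },
    { name: "behackett-dt:27019", stateStr: "SECONDARY",
      errmsg: "syncing to: behackett-dt:27018 by request" },
    { name: "behackett-dt:27020", stateStr: "ARBITER" },
  ],
};

// 27019 already syncs from 27018, so the request should be refused.
console.log(wouldSyncFromEachOther(status, "behackett-dt:27018", "behackett-dt:27019")); // true
// The primary reports no sync target, so syncing from it is fine.
console.log(wouldSyncFromEachOther(status, "behackett-dt:27018", "behackett-dt:27017")); // false
```

This only catches the direct two-member case described here. Longer cycles need a walk along the whole sync chain.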



Comments
Comment by Eric Milkie [ 23/Feb/15 ]

There is no requirement that a cluster have a primary when replSetSyncFrom is run, so it would be hard to tell if there is a possible path back to a primary.
If a user introduces a cycle with replSetSyncFrom, it is easy to fix on a live system.

Comment by Bernie Hackett [ 30/Aug/12 ]

True, but doing this simple check will save us a lot of trouble. How hard would it be, given that replica sets are limited to 12 members, to actually check for a path back to the primary?

Comment by Kristina Chodorow (Inactive) [ 30/Aug/12 ]

Theoretically, even once we check for this, three members could still end up syncing in a circle.
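
For illustration, a rough sketch of the "path back to the primary" check discussed in these comments, which also catches circles of three or more members. It assumes the sync topology is available as a Map from each member to its sync source and that the primary is known; as noted above, neither is guaranteed when replSetSyncFrom is run. The function name, the Map shape, and the member names "s1", "s2", "s3" and "p" are all hypothetical.

```javascript
// syncSources: Map from member name to the member it syncs from
// (hypothetical shape; a member with no source has no entry).
// Walks the chain from `start` and returns true only if it reaches
// `primary`. A dead end or a revisited member (a cycle) returns false.
function reachesPrimary(syncSources, start, primary) {
  const seen = new Set();
  let current = start;
  while (current !== undefined && !seen.has(current)) {
    if (current === primary) return true;
    seen.add(current);
    current = syncSources.get(current);
  }
  return false;
}

// Three secondaries syncing in a circle: no path back to the primary.
const ring = new Map([["s1", "s2"], ["s2", "s3"], ["s3", "s1"]]);
console.log(reachesPrimary(ring, "s1", "p")); // false

// Chained replication that ends at the primary.
const chain = new Map([["s1", "s2"], ["s2", "s3"], ["s3", "p"]]);
console.log(reachesPrimary(chain, "s1", "p")); // true
```

Each member is visited at most once, so with a 12-member limit the walk takes at most 12 steps.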
