[SERVER-21537] chainingAllowed = false not being enforced after rs.stepDown() Created: 18/Nov/15  Updated: 06/Dec/22  Resolved: 14/May/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.6, 3.0.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ruben Terceno Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: rs1.log, rs2.log, rs3.log (text files)
Issue Links:
Duplicate
duplicates SERVER-39621 Disabled chaining should enforce sync... Closed
Related
related to SERVER-25145 During rollback (or w/minvalid invali... Closed
related to SERVER-44603 Consider having tailable readPreferen... Backlog
Assigned Teams:
Replication
Operating System: ALL
Backport Requested:
v3.4, v3.2
Steps To Reproduce:

1. Set up a replica set.
2. Reconfigure it to disallow chained replication:

cfg = rs.config()
cfg.settings.chainingAllowed = false   // secondaries must sync from the primary
rs.reconfig(cfg)

3. Step down the primary with rs.stepDown().
4. The replica set is chained again: a secondary syncs from another secondary instead of from the new primary.

The issue can also be reproduced when the replica set is started with chainingAllowed already set to false: the initial topology is correct, but after the step-down the set is chained.

Sprint: Repl E (01/08/16), Repl F (01/29/16), Repl 10 (02/19/16), Repl 11 (03/11/16), Repl 12 (04/01/16), Repl 18 (08/05/16), Repl 2016-08-29
 Description   

The parameter chainingAllowed = false is not enforced after a rs.stepDown() command.

After the step-down the secondaries are chained: the old primary (27117) syncs directly from the new primary (27119), but the other secondary (27118) keeps syncing from the old primary.

Before stepDown()

abc:PRIMARY> rs.status()
{
	"set" : "abc",
	"date" : ISODate("2015-11-18T21:30:38.767Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "Rubens-MacBook-Pro.local:27117",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 45,
			"optime" : Timestamp(1447881863, 1),
			"optimeDate" : ISODate("2015-11-18T21:24:23Z"),
			"electionTime" : Timestamp(1447882202, 1),
			"electionDate" : ISODate("2015-11-18T21:30:02Z"),
			"configVersion" : 4,
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "Rubens-MacBook-Pro.local:27118",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 36,
			"optime" : Timestamp(1447881863, 1),
			"optimeDate" : ISODate("2015-11-18T21:24:23Z"),
			"lastHeartbeat" : ISODate("2015-11-18T21:30:38.370Z"),
			"lastHeartbeatRecv" : ISODate("2015-11-18T21:30:38.282Z"),
			"pingMs" : 0,
			"syncingTo" : "Rubens-MacBook-Pro.local:27117",
			"configVersion" : 4
		},
		{
			"_id" : 2,
			"name" : "Rubens-MacBook-Pro.local:27119",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 32,
			"optime" : Timestamp(1447881863, 1),
			"optimeDate" : ISODate("2015-11-18T21:24:23Z"),
			"lastHeartbeat" : ISODate("2015-11-18T21:30:38.724Z"),
			"lastHeartbeatRecv" : ISODate("2015-11-18T21:30:38.685Z"),
			"pingMs" : 8,
			"syncingTo" : "Rubens-MacBook-Pro.local:27117",
			"configVersion" : 4
		}
	],
	"ok" : 1
}

After stepDown()

abc:SECONDARY> rs.status()
{
	"set" : "abc",
	"date" : ISODate("2015-11-18T21:30:51.041Z"),
	"myState" : 2,
	"syncingTo" : "Rubens-MacBook-Pro.local:27119",
	"members" : [
		{
			"_id" : 0,
			"name" : "Rubens-MacBook-Pro.local:27117",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 58,
			"optime" : Timestamp(1447881863, 1),
			"optimeDate" : ISODate("2015-11-18T21:24:23Z"),
			"syncingTo" : "Rubens-MacBook-Pro.local:27119",
			"configVersion" : 4,
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "Rubens-MacBook-Pro.local:27118",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 48,
			"optime" : Timestamp(1447881863, 1),
			"optimeDate" : ISODate("2015-11-18T21:24:23Z"),
			"lastHeartbeat" : ISODate("2015-11-18T21:30:50.383Z"),
			"lastHeartbeatRecv" : ISODate("2015-11-18T21:30:50.302Z"),
			"pingMs" : 0,
			"syncingTo" : "Rubens-MacBook-Pro.local:27117",
			"configVersion" : 4
		},
		{
			"_id" : 2,
			"name" : "Rubens-MacBook-Pro.local:27119",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 44,
			"optime" : Timestamp(1447881863, 1),
			"optimeDate" : ISODate("2015-11-18T21:24:23Z"),
			"lastHeartbeat" : ISODate("2015-11-18T21:30:50.744Z"),
			"lastHeartbeatRecv" : ISODate("2015-11-18T21:30:50.700Z"),
			"pingMs" : 1,
			"electionTime" : Timestamp(1447882246, 1),
			"electionDate" : ISODate("2015-11-18T21:30:46Z"),
			"configVersion" : 4
		}
	],
	"ok" : 1
}
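To make the chained topology in the output above easier to spot, here is a minimal sketch in plain JavaScript that scans an rs.status() document and lists every secondary whose sync source is not the current primary. It assumes the 3.0-era `syncingTo` field shown above and falls back to `syncSourceHost`, the field newer server versions report instead. The helper name `findChainedMembers` is hypothetical (not part of the shell API), and passing a live `rs.status()` result to it assumes a shell that accepts ES2015 syntax.

```javascript
// Hypothetical helper: given an rs.status() result, return the names of the
// secondaries that are syncing from a node other than the current primary.
// Assumes the 3.0-era "syncingTo" field; falls back to "syncSourceHost",
// which newer server versions report instead.
function findChainedMembers(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    return []; // no primary, so "chained" is not well defined here
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .filter((m) => {
      const source = m.syncingTo || m.syncSourceHost;
      // A missing source is not chaining; any source other than the primary is.
      return Boolean(source) && source !== primary.name;
    })
    .map((m) => m.name);
}

// Usage with a trimmed copy of the "After stepDown()" output above.
const afterStepDown = {
  set: "abc",
  members: [
    { _id: 0, name: "Rubens-MacBook-Pro.local:27117", stateStr: "SECONDARY",
      syncingTo: "Rubens-MacBook-Pro.local:27119" },
    { _id: 1, name: "Rubens-MacBook-Pro.local:27118", stateStr: "SECONDARY",
      syncingTo: "Rubens-MacBook-Pro.local:27117" },
    { _id: 2, name: "Rubens-MacBook-Pro.local:27119", stateStr: "PRIMARY" },
  ],
};

console.log(findChainedMembers(afterStepDown));
// In Node.js: [ 'Rubens-MacBook-Pro.local:27118' ]
```

Run against the "Before stepDown()" output, where both secondaries sync from 27117 (the primary at that time), the same check returns an empty list.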

This issue cannot be reproduced on 3.2.0-rc2.



 Comments   
Comment by Siyuan Zhou [ 07/May/20 ]

Closing this as a duplicate of SERVER-39621. We deprecated PV0 (replication protocol version 0) and now have more information in the replication metadata, so the solution is simpler.

Comment by Siyuan Zhou [ 02/Mar/16 ]

ramon.fernandez, Thomas mentioned that he had reproduced this issue on 3.0.7, and users have hit it on 3.0. Do you mean the fix would only apply to 3.0, so we don't "backport" it?

Comment by Ramon Fernandez Marina [ 24/Feb/16 ]

Removing the v3.0 backport for this ticket; I think it stayed there after some earlier confusion with fix versions.

Comment by Eric Milkie [ 01/Feb/16 ]

Since the primary disconnects all connections when stepping down, I don't see how any node can keep syncing from it. Perhaps the connection is flagged not to be disconnected?

Comment by Siyuan Zhou [ 01/Feb/16 ]

_rsConfig.isChainingAllowed() is only checked in TopologyCoordinatorImpl::chooseNewSyncSource(), so if the secondaries don't have to choose a new sync source, they will keep syncing from the old upstream. We can choose a new sync source when we see a new primary. Alternatively, we can do that whenever we get a new term and make sure the upstream is also in the latest term.
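The mechanism described in this comment can be modelled in a few lines. The following is a hypothetical, heavily simplified JavaScript sketch, not the server's TopologyCoordinatorImpl: `chooseNewSyncSource` stands in for the only place chaining is consulted, and `nextSyncSource` shows why a secondary whose upstream is still healthy never re-evaluates it. The `reselectOnNewPrimary` flag models the first fix proposed above (choose a new sync source when a new primary is seen). All function names, the member shape, and the shortened `host:PORT` names are assumptions for illustration.

```javascript
// Hypothetical, simplified model; names do not match the server's C++ code.

function currentPrimary(members) {
  const p = members.find((m) => m.stateStr === "PRIMARY" && m.health === 1);
  return p ? p.name : null;
}

// Stand-in for chooseNewSyncSource(): the only place chainingAllowed is read.
function chooseNewSyncSource(members, selfName, chainingAllowed) {
  if (!chainingAllowed) {
    return currentPrimary(members); // with chaining disabled, only the primary qualifies
  }
  // Simplified: with chaining allowed, take the first other healthy member.
  const other = members.find((m) => m.name !== selfName && m.health === 1);
  return other ? other.name : null;
}

// Returns the sync source a secondary would use after one evaluation step.
// reselectOnNewPrimary = false models the reported behaviour;
// reselectOnNewPrimary = true models the proposed "re-choose on new primary" fix.
function nextSyncSource(node, members, chainingAllowed, reselectOnNewPrimary) {
  const upstream = members.find((m) => m.name === node.syncingTo);
  const upstreamLost = !upstream || upstream.health !== 1;
  const primaryChanged = node.lastKnownPrimary !== currentPrimary(members);
  if (upstreamLost || (reselectOnNewPrimary && primaryChanged)) {
    return chooseNewSyncSource(members, node.name, chainingAllowed);
  }
  // Upstream still healthy: keep it; chainingAllowed is never consulted.
  return node.syncingTo;
}

// Scenario from this ticket: 27117 stepped down, 27119 is the new primary,
// and 27118 is still syncing from 27117.
const members = [
  { name: "host:27117", stateStr: "SECONDARY", health: 1 },
  { name: "host:27118", stateStr: "SECONDARY", health: 1 },
  { name: "host:27119", stateStr: "PRIMARY", health: 1 },
];
const node27118 = {
  name: "host:27118",
  syncingTo: "host:27117",
  lastKnownPrimary: "host:27117",
};

console.log(nextSyncSource(node27118, members, false, false)); // host:27117 (still chained)
console.log(nextSyncSource(node27118, members, false, true)); // host:27119 (the new primary)
```

The alternative mentioned in the comment (re-choosing on every new term and checking that the upstream is in the latest term) would replace the `primaryChanged` test with a term comparison; the sketch only models the first option.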

Comment by Kelsey Schubert [ 18/Nov/15 ]

I have reproduced this on 3.0.7; sending to triage.

Generated at Thu Feb 08 03:57:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.