[SERVER-33571] Running stepDown may trigger a rollback Created: 01/Mar/18  Updated: 27/Oct/23  Resolved: 06/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rafael Cunha de Almeida Assignee: Kelsey Schubert
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: mongo-3.2.11-logs.tar.gz, mongo-3.4.13-logs.tar.gz, node1.cfg, node2.cfg, node3.cfg, rollback-bug.js, write_forever.py
Operating System: ALL
Steps To Reproduce:

I couldn't find a deterministic set of steps to trigger the problem. However, following the steps below, I was able to reproduce it with a good chance of success. The issue seems to occur when the servers are under heavy load and, with some luck, the node with the lowest priority ends up becoming primary after a step down. All the scripts I used are attached to this issue.

Start the mongo servers:

$ mongod --config node1.cfg &
$ mongod --config node2.cfg &
$ mongod --config node3.cfg &

Set up the cluster and constantly step down the primary:

$ mongo rollback-bug.js
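
rollback-bug.js itself is attached above; a minimal mongo shell sketch of the kind of loop it drives might look like this (the intervals and checks here are assumptions, not the script's exact contents):

// Hypothetical sketch of rollback-bug.js (the attached file is
// authoritative): repeatedly step down the primary and watch
// rs.status() for any member entering the ROLLBACK state.
// A complete script would also reconnect to the current primary
// on each iteration.
while (true) {
    try {
        rs.stepDown(10); // the server closes the connection, so this throws
    } catch (e) {
        // expected after a successful step down
    }
    sleep(5000); // give the set time to elect a new primary

    var rollingBack = rs.status().members.filter(function (m) {
        return m.stateStr === "ROLLBACK";
    });
    if (rollingBack.length > 0) {
        print(Date() + " Rollback detected");
        printjson(rollingBack);
        quit(1);
    }
}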

The script eventually prints output like the following and exits, indicating that a rollback occurred:

Thu Mar 01 2018 14:04:52 GMT+0000 (GMT) Rollback detected
Thu Mar 01 2018 14:04:52 GMT+0000 (GMT)
[
	{
		"_id" : 2,
		"name" : "127.0.0.1:27019",
		"health" : 1,
		"state" : 9,
		"stateStr" : "ROLLBACK",
		"uptime" : 208,
		"optime" : {
			"ts" : Timestamp(1519913083, 423),
			"t" : NumberLong(3)
		},
		"optimeDate" : ISODate("2018-03-01T14:04:43Z"),
		"lastHeartbeat" : ISODate("2018-03-01T14:04:51.817Z"),
		"lastHeartbeatRecv" : ISODate("2018-03-01T14:04:49.718Z"),
		"pingMs" : NumberLong(2),
		"syncingTo" : "127.0.0.1:27018",
		"configVersion" : 1
	}
]

After a primary is elected, I ran the write_forever.py script (it assumes you have pymongo on your Python path). All the script does is start a few processes that constantly insert documents.

$ python write_forever.py

I'm afraid you may need to tune the number of processes and the document size in write_forever.py depending on your setup.
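
For reference, the core of each worker process is just a tight insert loop; a rough mongo shell equivalent would be (the attached Python script is authoritative):

// Rough mongo shell equivalent of one write_forever.py worker
// (names and document size are illustrative).
var testdb = connect("127.0.0.1:27017/test"); // write to the primary
var payload = new Array(1024).join("x");      // ~1 KB filler per document
for (var i = 0; ; i++) {
    testdb.docs.insert({ seq: i, payload: payload });
}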


Description

I got a rollback after stepping down the primary in a three-node MongoDB replica set. From MongoDB's documentation:

The method then initiates a catchup period where it waits up to secondaryCatchUpPeriodSeconds, by default 10 seconds, for a secondary to become up-to-date with the primary. The primary only steps down if a secondary is up-to-date with the primary during the catchup period to prevent rollbacks.

No networking issues were observed during that time, so I wouldn't expect a rollback to happen.

It seems the rollback happened because of a primary election that took place right after the step down (in this particular cluster the nodes have different priorities). However, I don't see anything in the documentation saying that primary elections may cause rollbacks either. In fact, I was not able to reproduce the problem by only doing step downs or by only swapping priorities.
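
For reference, the catch-up window that the documentation describes corresponds to the second argument of the shell's rs.stepDown() helper (the values below are illustrative):

// Step down for 60 seconds, waiting up to 10 seconds (the documented
// default) for a secondary to become up-to-date first.
rs.stepDown(60, 10);

// Equivalent command form:
db.adminCommand({ replSetStepDown: 60, secondaryCatchUpPeriodSecs: 10 });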



Comments
Comment by Kelsey Schubert [ 06/Mar/18 ]

Hi rafael,

Thanks for testing against 3.4 and 3.6. I'm glad that you were able to observe the improvements from the work we've put into preserving w:1 writes during failovers and stepdowns. Since you're not able to reproduce this issue on MongoDB 3.6, my advice would be to upgrade your cluster. Of course, as you know, the best way to ensure that writes are never rolled back after acknowledgement is to increase the write concern to w:majority.
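
For example, a majority-acknowledged insert in the shell would look like this (collection name illustrative):

// Acknowledged only once a majority of voting members have the write,
// so it cannot be rolled back by a failover.
db.docs.insert({ x: 1 }, { writeConcern: { w: "majority" } });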

Kind regards,
Kelsey

Comment by Rafael Cunha de Almeida [ 02/Mar/18 ]

Hi kelsey.schubert,

I managed to get a rollback on 3.4, although it took a lot longer and it didn't seem to be the exact same scenario (it was also triggered by a stepdown, though). However, I couldn't get the rollback to happen on version 3.6. As far as I know, I used replication protocol 1 for all tests. This is the output from the 3.2.11 cluster:

rs0:SECONDARY> rs.conf()
{
	"_id" : "rs0",
	"version" : 1,
	"protocolVersion" : NumberLong(1),
	"members" : [
		{
			"_id" : 0,
			"host" : "127.0.0.1:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 900,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 1,
			"host" : "127.0.0.1:27018",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 500,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 2,
			"host" : "127.0.0.1:27019",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 100,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatIntervalMillis" : 2000,
		"heartbeatTimeoutSecs" : 10,
		"electionTimeoutMillis" : 10000,
		"getLastErrorModes" : {
			
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		},
		"replicaSetId" : ObjectId("5a9981f89c3e7ba5f904d289")
	}
}

I attached the logs for both versions, 3.2.11 and 3.4.13. For 3.4.13 I ended up making a few changes to write_forever.py; I added the new version to the same tar.gz as the logs for 3.4.13: mongo-3.2.11-logs.tar.gz, mongo-3.4.13-logs.tar.gz

Comment by Kelsey Schubert [ 01/Mar/18 ]

Hi rafael,

Would you please confirm that you're using replication protocol 1? The output of the replica set config would give us the required information.

If so, would you please rerun these tests on the latest version of MongoDB 3.4 or 3.6, as there have been a number of improvements in this space?

Thank you,
Kelsey
