Core Server / SERVER-33571

Running stepDown may trigger a rollback


Details

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.2.11
    • Component/s: Replication
    • Labels: None
    • Operating System: ALL

    Steps To Reproduce

      I couldn't find a deterministic set of steps to trigger the problem, but following the steps below I was able to reproduce it with a good chance of success. The issue seems to happen when the server is under heavy load and we get "lucky" and the node with the lowest priority ends up as primary after a step down. All scripts I used are attached to this issue.

      Start the three mongod servers:

      $ mongod --config node1.cfg &
      $ mongod --config node2.cfg &
      $ mongod --config node3.cfg &
      

      Set up the cluster and repeatedly step down the primary:

      $ mongo rollback-bug.js
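
      The attached rollback-bug.js is the authoritative script; roughly, it does something along the lines of the sketch below, i.e. it initiates the set and keeps stepping down whoever is primary. The set name, port 27017, and the exact priority values are placeholders, not taken from the attachments.

      // Hypothetical sketch only; see the attached rollback-bug.js for the real script.
      // Assumes three mongods on 127.0.0.1:27017-27019 forming a replica set named "rs0".

      // Give the members different priorities so the same node normally wins elections.
      rs.initiate({
          _id: "rs0",
          members: [
              { _id: 0, host: "127.0.0.1:27017", priority: 3 },
              { _id: 1, host: "127.0.0.1:27018", priority: 2 },
              { _id: 2, host: "127.0.0.1:27019", priority: 1 }
          ]
      });

      // Repeatedly find the current primary and ask it to step down.
      while (true) {
          var primary = db.isMaster().primary;
          if (primary) {
              try {
                  // The stepping-down primary closes its connections, so this may throw.
                  new Mongo(primary).getDB("admin").runCommand({ replSetStepDown: 10 });
              } catch (e) {
                  // expected when the connection is dropped mid-command
              }
          }
          sleep(5000);  // give the set time to elect a new primary
      }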
      

      Running rollback-bug.js will eventually print something like the following and exit, indicating that the issue occurred:

      Thu Mar 01 2018 14:04:52 GMT+0000 (GMT) Rollback detected
      Thu Mar 01 2018 14:04:52 GMT+0000 (GMT)
      [
      	{
      		"_id" : 2,
      		"name" : "127.0.0.1:27019",
      		"health" : 1,
      		"state" : 9,
      		"stateStr" : "ROLLBACK",
      		"uptime" : 208,
      		"optime" : {
      			"ts" : Timestamp(1519913083, 423),
      			"t" : NumberLong(3)
      		},
      		"optimeDate" : ISODate("2018-03-01T14:04:43Z"),
      		"lastHeartbeat" : ISODate("2018-03-01T14:04:51.817Z"),
      		"lastHeartbeatRecv" : ISODate("2018-03-01T14:04:49.718Z"),
      		"pingMs" : NumberLong(2),
      		"syncingTo" : "127.0.0.1:27018",
      		"configVersion" : 1
      	}
      ]
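
      That output is just the rs.status() entry for the member that entered ROLLBACK (state 9). A check along the following lines is enough to detect it (a sketch; the attached rollback-bug.js may do this differently):

      // Look for any replica set member reporting the ROLLBACK state.
      var rolledBack = rs.status().members.filter(function (m) {
          return m.stateStr === "ROLLBACK";   // state 9
      });
      if (rolledBack.length > 0) {
          print(Date() + " Rollback detected");
          printjson(rolledBack);
          quit(1);
      }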
      

      After a primary is elected, I ran the write_forever.py script (it assumes pymongo is on your Python path). All the script does is start a few processes that constantly insert documents.

      $ python write_forever.py
      

      I'm afraid you may need to tune the number of processes and document size in write_forever.py depending on your settings.
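      For reference, a comparable insert load can also be generated straight from the mongo shell. The snippet below is only a rough single-connection stand-in for write_forever.py (which uses pymongo and several processes); the database name, collection name, and document size are made up.

      // Rough stand-in for write_forever.py: hammer the primary with inserts.
      // Names and sizes here are illustrative only.
      var coll = db.getSiblingDB("rollbacktest").docs;
      var padding = new Array(1024).join("x");   // ~1 KB of filler per document
      for (var i = 0; ; i++) {
          try {
              coll.insert({ seq: i, pad: padding, ts: new Date() });
          } catch (e) {
              // inserts fail briefly during elections and step downs; keep going
          }
      }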


    Description

      I got a rollback after stepping down the primary in a three-node MongoDB replica set. From MongoDB's documentation:

      The method then initiates a catchup period where it waits up to secondaryCatchUpPeriodSeconds, by default 10 seconds, for a secondary to become up-to-date with the primary. The primary only steps down if a secondary is up-to-date with the primary during the catchup period to prevent rollbacks.

      No networking issues were observed during that time, so I wouldn't expect a rollback to happen.

      It seems the rollback was caused by a primary election that happened right after the step down (in this particular replica set the nodes have different priorities). However, I don't see anything in the documentation saying that primary elections may cause rollbacks either. In fact, I was not able to reproduce the problem by only doing step downs or by only swapping priorities.
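
      For reference, the catch-up period quoted above maps to the second argument of rs.stepDown() (or the secondaryCatchUpPeriodSecs field of the replSetStepDown command); the values below are examples only:

      // Step down for 60 seconds, waiting up to 10 seconds (the default)
      // for a secondary to catch up before relinquishing primary.
      rs.stepDown(60, 10);

      // Equivalent command form:
      db.adminCommand({ replSetStepDown: 60, secondaryCatchUpPeriodSecs: 10 });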

      Attachments

        1. mongo-3.2.11-logs.tar.gz
          39 kB
        2. mongo-3.4.13-logs.tar.gz
          3.26 MB
        3. node1.cfg
          0.2 kB
        4. node2.cfg
          0.2 kB
        5. node3.cfg
          0.2 kB
        6. rollback-bug.js
          1 kB
        7. write_forever.py
          0.6 kB


          People

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Rafael Cunha de Almeida (rafael)
            Votes: 0
            Watchers: 9
