
Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.0.9, 3.0.12
Component/s: Replication
Labels: None
Operating System: ALL
Steps To Reproduce:

Stop a replica set member for some time (e.g. 1 hour), making sure the oplog is large enough for the member to catch up afterwards.

Start the replica set member again and check its logs and its state in the mongo shell.
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Hi,

I have found that when a member rejoins the replica set after a period of inactivity (e.g. maintenance downtime on the server that hosts the database), it moves very quickly from RECOVERING to SECONDARY even though its data are not yet fully synced. As a result, reads served by that member returned inconsistent results. We had to block all non-replication traffic to the member with iptables until it had caught up with replication.

For example, the log on startup of the member:

2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] This node is node2.example.com:27017 in the config
2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] Starting replication applier threads
2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node3.example.com:27017 is now in state SECONDARY
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node1.example.com:27017 is now in state PRIMARY
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member arb1.example.com:27017 is now in state ARBITER

and the replication info shortly afterwards:

node2(mongod-3.0.12)[SECONDARY:bb] test> db.getReplicationInfo()
{
  "logSizeMB": 1307.89453125,
  "usedMB": 1307.9,
  "timeDiff": 8241,
  "timeDiffHours": 2.29,
  "tFirst": "Fri May 20 2016 09:58:16 GMT+0000 (UTC)",
  "tLast": "Fri May 20 2016 12:15:37 GMT+0000 (UTC)",
  "now": "Fri May 20 2016 13:17:06 GMT+0000 (UTC)"
}
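The timestamps above already show how far behind the node was. `timeDiff` is `tLast - tFirst` in seconds (the time span covered by this node's oplog), while the gap between `tLast` and `now` indicates how old the newest operation on this node is. The short check below re-enters the printed timestamps by hand as UTC values (an assumption of this sketch, since the shell prints them as strings) and recomputes both numbers:

```javascript
// Timestamps copied by hand from the db.getReplicationInfo() output above (UTC).
// Date.UTC months are zero-based, so 4 = May.
var tFirst = Date.UTC(2016, 4, 20, 9, 58, 16);
var tLast = Date.UTC(2016, 4, 20, 12, 15, 37);
var now = Date.UTC(2016, 4, 20, 13, 17, 6);

// Span of this node's oplog in seconds; should match "timeDiff": 8241.
var timeDiff = (tLast - tFirst) / 1000;
// How far the newest oplog entry on node2 trails the shell's "now".
var staleSeconds = (now - tLast) / 1000;

console.log(timeDiff, (timeDiff / 3600).toFixed(2)); // 8241 2.29
console.log(staleSeconds, (staleSeconds / 60).toFixed(1)); // 3689 61.5
```

Assuming the primary kept accepting writes during the downtime, this suggests the node was already reporting SECONDARY while roughly an hour of operations had not yet been applied.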

We see the same behaviour on 3.0.9 and 3.0.12, tested on two different databases.
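The manual workaround described above (blocking client traffic until the member has caught up) could be driven by a check on optimes rather than on member state. The sketch below is hypothetical: `secondaryLagSeconds` and `isCaughtUp` are not MongoDB or shell APIs, the code assumes the 3.0-era `rs.status()` shape (each entry in `members` has `name`, `stateStr` and, for data-bearing members, an `optimeDate` Date), and the mock data are illustrative values, not output captured from this cluster. It is written in ES5 style (`var`, `function`) so it does not rely on newer JavaScript features that older shells may lack.

```javascript
// Hypothetical helpers, not part of MongoDB or the mongo shell.
// Assumed rs.status() shape: status.members[i] has `name`, `stateStr`
// and, for data-bearing members, `optimeDate` (last applied operation).

// Seconds by which `memberName` trails the current PRIMARY.
function secondaryLagSeconds(status, memberName) {
  var primary = null;
  var member = null;
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") primary = m;
    if (m.name === memberName) member = m;
  });
  if (primary === null) throw new Error("no PRIMARY in status");
  if (member === null) throw new Error("unknown member: " + memberName);
  // Subtracting Dates yields milliseconds; convert to whole seconds.
  return Math.round((primary.optimeDate - member.optimeDate) / 1000);
}

// Treat a member as caught up only if it reports SECONDARY *and* is within
// `maxLagSeconds` of the primary; in this report the state alone was not enough.
function isCaughtUp(status, memberName, maxLagSeconds) {
  var member = status.members.filter(function (m) {
    return m.name === memberName;
  })[0];
  if (!member || member.stateStr !== "SECONDARY") return false;
  return secondaryLagSeconds(status, memberName) <= maxLagSeconds;
}

// Illustrative mock (not real rs.status() output): node2 says SECONDARY
// but its last applied operation is about an hour behind the primary.
var mockStatus = {
  members: [
    { name: "node1.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date(Date.UTC(2016, 4, 20, 13, 17, 6)) },
    { name: "node2.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date(Date.UTC(2016, 4, 20, 12, 15, 37)) },
    { name: "node3.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date(Date.UTC(2016, 4, 20, 13, 17, 5)) },
    { name: "arb1.example.com:27017", stateStr: "ARBITER" }
  ]
};

console.log(secondaryLagSeconds(mockStatus, "node2.example.com:27017")); // 3689
console.log(isCaughtUp(mockStatus, "node2.example.com:27017", 10)); // false
console.log(isCaughtUp(mockStatus, "node3.example.com:27017", 10)); // true
```

Against a live replica set the same check would be `isCaughtUp(rs.status(), "node2.example.com:27017", 10)`. The 10-second threshold is an arbitrary assumption; polling such a check before reopening the firewall is one possible way to automate the iptables step, not a documented MongoDB procedure.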

Duplicates: SERVER-3251 Overflowed replica set members briefly re-initialize as SECONDARY (Closed)

Assignee: Unassigned
Reporter: Bartosz Debski
Participants: Bartosz Debski, Eric Milkie, Paul Ridgway
Votes: 0
Watchers: 7

Created: May 20 2016 01:46:34 PM UTC
Updated: May 20 2016 05:05:06 PM UTC
Resolved: May 20 2016 05:05:06 PM UTC
