[SERVER-24229] Slave after restart goes to SECONDARY state even if it is out of sync. Created: 20/May/16  Updated: 20/May/16  Resolved: 20/May/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.9, 3.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bartosz Debski Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-3251 Overflowed replica set members briefl... Closed
Operating System: ALL
Steps To Reproduce:

Stop a replica set member for some time, e.g. 1 hour, assuming the oplog is large enough for the member to catch up afterwards.

Start the replica member again and check the logs, along with the mongo CLI.
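
To observe the premature transition from the shell, the member's reported state and replication lag can be checked directly. A minimal sketch using standard shell helpers (the numeric codes in the comments are the standard replica set state values):

// run in the mongo shell on the restarted member:
rs.status().myState            // 2 == SECONDARY, 3 == RECOVERING
rs.printSlaveReplicationInfo() // prints each secondary's lag behind the primary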

Participants:

Description

Hi,

I have found that when a member rejoins a replica set after a period of inactivity, e.g. maintenance downtime on the server that hosts the database, it goes very quickly from RECOVERING to SECONDARY state even if its data is not fully synced. That resulted in inconsistent results from reads. We had to restrict all non-replication traffic with iptables until the member had caught up with replication (a sketch of an alternative appears at the end of this description).

e.g. logs on startup of the member:

2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] This node is node2.example.com:27017 in the config
2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] Starting replication applier threads
2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node3.example.com:27017 is now in state SECONDARY
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node1.example.com:27017 is now in state PRIMARY
2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member arb1.example.com:27017 is now in state ARBITER

and replication info:

node2(mongod-3.0.12)[SECONDARY:bb] test> db.getReplicationInfo()
{
  "logSizeMB": 1307.89453125,
  "usedMB": 1307.9,
  "timeDiff": 8241,
  "timeDiffHours": 2.29,
  "tFirst": "Fri May 20 2016 09:58:16 GMT+0000 (UTC)",
  "tLast": "Fri May 20 2016 12:15:37 GMT+0000 (UTC)",
  "now": "Fri May 20 2016 13:17:06 GMT+0000 (UTC)"
}
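
(For clarity: timeDiff is the local oplog window, tLast - tFirst = 8241 s ≈ 2.29 h. The member's apply lag at this moment is now - tLast = 13:17:06 - 12:15:37 ≈ 3689 s, i.e. just over an hour, yet the prompt already reports SECONDARY.)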

We have the same behaviour on 3.0.9 and 3.0.12, tested on two different databases.
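
A hedged alternative to the iptables approach is to temporarily hide the lagging member so that drivers stop routing reads to it. This is a sketch, not the workaround actually used in this ticket; the member index 1 is an assumption, so pick the right entry from rs.conf():

// run in the mongo shell against the primary:
var cfg = rs.conf();
cfg.members[1].priority = 0;   // hidden members must have priority 0
cfg.members[1].hidden = true;  // drivers will no longer send reads here
rs.reconfig(cfg);
// once the member has caught up, restore hidden/priority
// and run rs.reconfig(cfg) again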



Comments
Comment by Eric Milkie [ 20/May/16 ]

Right now, we're actively working on something for 3.4 to support this; it involves driver changes. It may slip past 3.4, however.

Comment by Bartosz Debski [ 20/May/16 ]

Fair enough. So will we have any ETA on SERVER-12861? It has been open for over two years now; is it just going to stay as it is? Thanks.

Comment by Eric Milkie [ 20/May/16 ]

It's a duplicate of SERVER-3251 in that the node will go into SECONDARY state before it contacts any other members of the set. Whether or not it eventually goes into RECOVERING state after connecting to the other replica set members doesn't matter; that part of the logic is unrelated to the problem (or the solution).

Comment by Bartosz Debski [ 20/May/16 ]

I would not agree that it's a duplicate of SERVER-3251, as the member does not revert to RECOVERING state at all. The member stays in SECONDARY state and that's it. I can confidently say that we hit this problem every time a member is behind after a restart. I have also tested it now on 3.2.6 and the behaviour is exactly the same. While SERVER-12861 looks like a promising solution, it has no ETA, which suggests it's not going to happen.

Comment by Eric Milkie [ 20/May/16 ]

While this is indeed the current behavior, it is less than ideal. This ticket is a duplicate of SERVER-3251.
Solving this issue is not that easy, as we would still like nodes in a replica set with no primary to be available for reads.

SERVER-12861 is more likely to solve this issue in the way that you want: that ticket involves preventing reads from secondaries if they are "too stale" (with a threshold that would be configurable).
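
For context, this configurable staleness bound appears to be what eventually shipped with the 3.4 drivers as the maxStalenessSeconds read preference option, which clients set in the connection string. A sketch, assuming the set name "bb" from the shell prompt above and the example host names from the description; it is not available on the 3.0.x/3.2.x versions discussed here:

mongodb://node1.example.com:27017,node2.example.com:27017/test?replicaSet=bb&readPreference=secondaryPreferred&maxStalenessSeconds=120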

Comment by Paul Ridgway [ 20/May/16 ]

Additionally, we'd like to know whether this is expected/desired behaviour. We seem to recall a member waiting in STARTUP2 or RECOVERING while the oplog caught up, and only then starting to serve reads. Perhaps that was after a complete re-sync, but we'd like to clarify.

Thanks
