Core Server / SERVER-24229

Slave goes to SECONDARY state after restart even if it is out of sync.

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.0.9, 3.0.12
    • Component/s: Replication
    • Labels: None
    • ALL
    • Steps To Reproduce:

      Stop a replica set member for some time (e.g. 1 hour), assuming the oplog is large enough for the member to catch up afterwards.

      Start the member again and check its log, along with its state in the mongo shell.


      Hi,

      I have found that when a member rejoins the replica set after a period of inactivity (e.g. maintenance downtime on the server that hosts the database), it moves very quickly from RECOVERING to SECONDARY even though its data is not yet fully synced. As a result, reads served by that member returned inconsistent results. We had to block all non-replication traffic to the member with iptables until it had caught up with replication.
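One way to decide when such a member is safe to re-open to clients is to compare its last applied operation with the primary's. The sketch below assumes the document shape returned by rs.status() (members[].name, members[].stateStr, members[].optimeDate); the helper names and the 10-second threshold are hypothetical, not MongoDB settings. It is written in plain ES5 so it should also work when pasted into the mongo shell.

```javascript
// Hypothetical cut-off for re-enabling client traffic; not a MongoDB setting.
var MAX_LAG_SECONDS = 10;

// Return the first member matching a predicate (ES5, no Array.prototype.find).
function findMember(status, predicate) {
  var matches = status.members.filter(predicate);
  return matches.length > 0 ? matches[0] : null;
}

// Seconds between the primary's last applied op and the named member's,
// based on the optimeDate field of each entry in rs.status().members.
function replicationLagSeconds(status, memberName) {
  var primary = findMember(status, function (m) { return m.stateStr === "PRIMARY"; });
  var member = findMember(status, function (m) { return m.name === memberName; });
  if (!primary || !member) {
    throw new Error("primary or member not found in replica set status");
  }
  return (primary.optimeDate.getTime() - member.optimeDate.getTime()) / 1000;
}

// True once the member trails the primary by no more than maxLagSeconds.
function isCaughtUp(status, memberName, maxLagSeconds) {
  var limit = maxLagSeconds === undefined ? MAX_LAG_SECONDS : maxLagSeconds;
  return replicationLagSeconds(status, memberName) <= limit;
}
```

In the shell this would be called as isCaughtUp(rs.status(), "node2.example.com:27017"), preferably on the primary, since rs.status() reflects the view of the node it runs on. The lag is measured between operation timestamps, so it says how far behind the member's data is, not how long catching up will take.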

      For example, the log on startup of the member:

      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] This node is node2.example.com:27017 in the config
      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] Starting replication applier threads
      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node3.example.com:27017 is now in state SECONDARY
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node1.example.com:27017 is now in state PRIMARY
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member arb1.example.com:27017 is now in state ARBITER
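The timestamps above show the node going from RECOVERING to SECONDARY within 2 ms, with no catch-up phase in between. A small parser makes that explicit. It assumes the 3.0 log line layout shown above; parseTransitions and stateDurationsMs are hypothetical helper names.

```javascript
// Matches lines like:
// 2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
var TRANSITION_RE = /^\s*(\S+)\s+I REPL\s+\[[^\]]+\] transition to (\w+)/;

// Extract { state, at } for every "transition to <STATE>" line, in log order.
function parseTransitions(logText) {
  var out = [];
  logText.split("\n").forEach(function (line) {
    var m = TRANSITION_RE.exec(line);
    if (m) {
      // Rewrite the "+0000" offset as "+00:00" so the timestamp is strict ISO 8601.
      var iso = m[1].replace(/([+-]\d{2})(\d{2})$/, "$1:$2");
      out.push({ state: m[2], at: new Date(iso) });
    }
  });
  return out;
}

// Time spent in each state before the next transition (the last state is open-ended).
function stateDurationsMs(transitions) {
  var durations = [];
  for (var i = 0; i + 1 < transitions.length; i++) {
    durations.push({
      state: transitions[i].state,
      ms: transitions[i + 1].at.getTime() - transitions[i].at.getTime()
    });
  }
  return durations;
}
```

Fed the log lines above, it reports 0 ms spent in STARTUP2 and 2 ms in RECOVERING.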
      

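      Replication info from the same member, taken at 13:17:06, shows it reporting SECONDARY while its newest oplog entry (tLast) is from 12:15:37, about an hour earlier: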

      node2(mongod-3.0.12)[SECONDARY:bb] test> db.getReplicationInfo()
      {
        "logSizeMB": 1307.89453125,
        "usedMB": 1307.9,
        "timeDiff": 8241,
        "timeDiffHours": 2.29,
        "tFirst": "Fri May 20 2016 09:58:16 GMT+0000 (UTC)",
        "tLast": "Fri May 20 2016 12:15:37 GMT+0000 (UTC)",
        "now": "Fri May 20 2016 13:17:06 GMT+0000 (UTC)"
      }
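The fields of this output can be combined to show how stale the member is. The sketch assumes tFirst, tLast and now are the date strings the shell prints, as above; oplogSummary is a hypothetical helper name.

```javascript
// Derive the oplog window and the age of the newest local oplog entry
// from the fields printed by db.getReplicationInfo().
function oplogSummary(info) {
  var first = Date.parse(info.tFirst);
  var last = Date.parse(info.tLast);
  var now = Date.parse(info.now);
  return {
    windowSeconds: (last - first) / 1000,     // should match timeDiff
    lastEntryAgeSeconds: (now - last) / 1000  // time since the newest op applied locally
  };
}
```

For the output above it gives windowSeconds = 8241 (matching timeDiff) and lastEntryAgeSeconds = 3689, i.e. the newest operation this SECONDARY had applied was about an hour old. That age only indicates replication lag if the primary accepted writes in the meantime.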
      

      We see the same behaviour on 3.0.9 and 3.0.12, tested on two different databases.

            Assignee: Unassigned
            Reporter: Bartosz Debski (bartosz.debski)
            Votes: 0
            Watchers: 7
