Core Server / SERVER-24229

Slave after restart goes to SECONDARY state even if it is out of sync.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.0.9, 3.0.12
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None
    • Operating System: ALL
    • Steps To Reproduce:

      Stop a replica set member for some time, e.g. 1 hour, assuming the oplog is large enough for the member to catch up.

      Start the member and check its logs, along with the mongo CLI.


    Description

      Hi,

      I have found that when a member rejoins a replica set after a period of inactivity, e.g. maintenance downtime on the server that hosts the database, it goes very quickly from RECOVERING to SECONDARY state even if its data is not fully synced. This resulted in inconsistent results from reads. We had to block all non-replication traffic with iptables until the member had caught up with replication.
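One way to see whether a member that reports SECONDARY has actually caught up is to compare its last applied optime with the primary's. The following is a minimal sketch, not a fix for the behaviour described here. It assumes a document shaped like the output of rs.status() (members[].name, members[].stateStr, members[].optimeDate). The helper name laggingSecondaries, the 60 second threshold and the sample optime values are illustrative and not taken from the affected cluster.

```javascript
// Sketch: list members that report SECONDARY but are more than
// maxLagSeconds behind the primary's last applied operation.
// `status` is assumed to look like the document returned by rs.status().
function laggingSecondaries(status, maxLagSeconds) {
  var primary = null;
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") {
      primary = m;
    }
  });
  if (primary === null) {
    return []; // no primary visible, so lag cannot be computed
  }
  return status.members
    .filter(function (m) { return m.stateStr === "SECONDARY"; })
    .map(function (m) {
      return {
        name: m.name,
        lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000
      };
    })
    .filter(function (m) { return m.lagSeconds > maxLagSeconds; });
}

// Illustrative data only; the optime values are made up for this example.
var sampleStatus = {
  members: [
    { name: "node1.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2016-05-20T13:17:06Z") },
    { name: "node2.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2016-05-20T12:15:37Z") },
    { name: "node3.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2016-05-20T13:17:05Z") },
    { name: "arb1.example.com:27017", stateStr: "ARBITER" }
  ]
};

// Only node2 exceeds the 60 second threshold in this sample.
var lagging = laggingSecondaries(sampleStatus, 60);
```

In the mongo shell the same function could be called as laggingSecondaries(rs.status(), 60) before letting application reads back onto the member.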

      For example, logs on startup of a member:

      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] This node is node2.example.com:27017 in the config
      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] Starting replication applier threads
      2016-05-20T13:13:51.566+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node3.example.com:27017 is now in state SECONDARY
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member node1.example.com:27017 is now in state PRIMARY
      2016-05-20T13:13:51.568+0000 I REPL     [ReplicationExecutor] Member arb1.example.com:27017 is now in state ARBITER
      

      Replication info from the same member:

      node2(mongod-3.0.12)[SECONDARY:bb] test> db.getReplicationInfo()
      {
        "logSizeMB": 1307.89453125,
        "usedMB": 1307.9,
        "timeDiff": 8241,
        "timeDiffHours": 2.29,
        "tFirst": "Fri May 20 2016 09:58:16 GMT+0000 (UTC)",
        "tLast": "Fri May 20 2016 12:15:37 GMT+0000 (UTC)",
        "now": "Fri May 20 2016 13:17:06 GMT+0000 (UTC)"
      }
      

      We see the same behaviour on 3.0.9 and 3.0.12, tested on two different databases.
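To make the gap in the replication info explicit, the arithmetic can be sketched as below. The three date strings are copied from the db.getReplicationInfo() output above. The helper names parseShellDate and oplogLagSeconds are made up for this example, and stripping the trailing "(UTC)" label before parsing is a precaution rather than something the shell requires.

```javascript
// Parse a date string as printed by db.getReplicationInfo().
// The trailing "(UTC)" label is removed before calling Date.parse.
function parseShellDate(text) {
  return Date.parse(text.replace(/\s*\([^)]*\)\s*$/, ""));
}

// Seconds between the last oplog entry on the member and "now".
function oplogLagSeconds(info) {
  return (parseShellDate(info.now) - parseShellDate(info.tLast)) / 1000;
}

// Values copied from the output in this report.
var info = {
  tFirst: "Fri May 20 2016 09:58:16 GMT+0000 (UTC)",
  tLast: "Fri May 20 2016 12:15:37 GMT+0000 (UTC)",
  now: "Fri May 20 2016 13:17:06 GMT+0000 (UTC)"
};

var lagSeconds = oplogLagSeconds(info); // 3689, i.e. about 1 h 1 min behind
var oplogWindowSeconds =
  (parseShellDate(info.tLast) - parseShellDate(info.tFirst)) / 1000; // 8241, matches "timeDiff"
```

So at the time of the query the member's newest oplog entry was roughly an hour old, although it had already reported SECONDARY a few minutes earlier at 13:13:51.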


            People

              Assignee: Unassigned
              Reporter: Bartosz Debski (bartosz.debski)
              Votes: 0
              Watchers: 7
