Core Server / SERVER-35649

Nodes removed due to isSelf failure should re-attempt to find themselves


    Details

    • Backwards Compatibility:
      Minor Change
    • Backport Requested:
      v4.4
    • Sprint:
      Repl 2020-08-24, Repl 2020-09-07, Repl 2020-09-21, Repl 2020-10-05
    • Linked BF Score:
      34

      Description

      We have some dev/staging environments which are locally hosted in our office building. They are entirely for internal usage, so their uptime isn't critical. We have recently been experiencing power outages that take down all 3 members of the replica set, and when power is restored they all come back up at the same time. This has happened about 5 times now, and each time the replica set comes back up, both the primary and secondary end up in the REMOVED state and never recover unless we manually restart one of the mongo processes.

      mongo-dev1 rs.status()

      {
              "state" : 10,
              "stateStr" : "REMOVED",
              "uptime" : 199841,
              "optime" : {
                      "ts" : Timestamp(1529137449, 1),
                      "t" : NumberLong(590)
              },
              "optimeDate" : ISODate("2018-06-16T08:24:09Z"),
              "ok" : 0,
              "errmsg" : "Our replica set config is invalid or we are not a member of it",
              "code" : 93,
              "codeName" : "InvalidReplicaSetConfig"
      }
      

      mongo-dev2 rs.status()

      {
              "state" : 10,
              "stateStr" : "REMOVED",
              "uptime" : 199879,
              "optime" : {
                      "ts" : Timestamp(1529137449, 1),
                      "t" : NumberLong(590)
              },
              "optimeDate" : ISODate("2018-06-16T08:24:09Z"),
              "ok" : 0,
              "errmsg" : "Our replica set config is invalid or we are not a member of it",
              "code" : 93,
              "codeName" : "InvalidReplicaSetConfig"
      }
      
      

      mongo-dev1 show log rs

      2018-06-16T09:10:25.236+0000 I REPL     [replExecDBWorker-0] New replica set config in use: { _id: "dev_cluster1", version: 140719, protocolVersion: 1, members: [ { _id: 0, host: "mongo-dev1.220office.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 2.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "utility-dev1.220office.local:27017", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 4, host: "mongo-dev2.2
      2018-06-16T09:10:25.236+0000 I REPL     [replExecDBWorker-0] transition to REMOVED
      2018-06-17T21:28:09.139+0000 I REPL     [ReplicationExecutor] Member utility-dev1.220office.local:27017 is now in state ARBITER
      

      mongo-dev1 rs.conf()

      {
              "_id" : "dev_cluster1",
              "version" : 140719,
              "protocolVersion" : NumberLong(1),
              "members" : [
                      {
                              "_id" : 0,
                              "host" : "mongo-dev1.220office.local:27017",
                              "arbiterOnly" : false,
                              "buildIndexes" : true,
                              "hidden" : false,
                              "priority" : 2,
                              "tags" : { },
                              "slaveDelay" : NumberLong(0),
                              "votes" : 1
                      },
                      {
                              "_id" : 3,
                              "host" : "utility-dev1.220office.local:27017",
                              "arbiterOnly" : true,
                              "buildIndexes" : true,
                              "hidden" : false,
                              "priority" : 1,
                              "tags" : { },
                              "slaveDelay" : NumberLong(0),
                              "votes" : 1
                      },
                      {
                              "_id" : 4,
                              "host" : "mongo-dev2.220office.local:27017",
                              "arbiterOnly" : false,
                              "buildIndexes" : true,
                              "hidden" : false,
                              "priority" : 1,
                              "tags" : { },
                              "slaveDelay" : NumberLong(0),
                              "votes" : 1
                      }
              ],
              "settings" : {
                      "chainingAllowed" : true,
                      "heartbeatIntervalMillis" : 2000,
                      "heartbeatTimeoutSecs" : 10,
                      "electionTimeoutMillis" : 10000,
                      "catchUpTimeoutMillis" : 60000,
                      "getLastErrorModes" : { },
                      "getLastErrorDefaults" : {
                              "w" : 1,
                              "wtimeout" : 0
                      }
              }
      }
      
      

      Reading the documentation, I can't find much information about the REMOVED state. In our setup, since mongo-dev1 has a priority of 2 and mongo-dev2 has a priority of 1, I would expect mongo-dev1 to be elected primary after the reboot.
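
      This matches what the ticket title asks to fix: after a reboot, the node checks each member host against its own addresses (the isSelf check), and if DNS isn't available yet it matches nothing, transitions to REMOVED, and never re-attempts. A minimal sketch of the requested retry behavior, in Python rather than the server's C++, where `resolve_local_addrs` is a hypothetical stand-in for the isSelf DNS/interface lookup:

```python
import time

def find_self(member_hosts, resolve_local_addrs,
              max_attempts=5, base_delay=0.5):
    """Retry matching a replica set member host to this node's own
    addresses, with exponential backoff, instead of giving up after a
    single failed lookup at startup.

    resolve_local_addrs is a hypothetical callable standing in for the
    server's isSelf check: it returns the set of host:port strings this
    node answers to, and may raise OSError while DNS is still down.
    """
    for attempt in range(max_attempts):
        try:
            local = resolve_local_addrs()
        except OSError:
            local = set()  # DNS not up yet; treat as no match and retry
        for host in member_hosts:
            if host in local:
                return host  # found ourselves in the config
        time.sleep(base_delay * (2 ** attempt))
    return None  # still can't identify ourselves: remain REMOVED
```

      With retries like this, a node whose DNS comes up a few seconds after mongod would identify itself on a later attempt instead of staying REMOVED until a manual restart.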

      Is this a bug, or are we doing something wrong? If it's a bug, what is the proper procedure when all members of the replica set come online at the same time?
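
      Until the server re-attempts on its own, the only recovery we've found is restarting one of the mongod processes, which re-runs the startup self-identification. A sketch of the condition a watchdog could use to decide when to do that, assuming the rs.status() shape shown above (the restart action itself is left out):

```python
REMOVED_STATE = 10        # "stateStr" : "REMOVED"
INVALID_RS_CONFIG = 93    # "codeName" : "InvalidReplicaSetConfig"

def stuck_in_removed(status_doc):
    """True when an rs.status()-style document shows the node stuck in
    REMOVED with the InvalidReplicaSetConfig error seen above, meaning a
    mongod restart is likely needed to recover it."""
    return (status_doc.get("ok") == 0
            and status_doc.get("code") == INVALID_RS_CONFIG
            and status_doc.get("state") == REMOVED_STATE)
```

      This is only a stopgap for the symptom; the fix requested by this ticket is for the server itself to retry the isSelf lookup.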

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              A. Jesse Jiryu Davis
              Reporter:
              Owen Allen
              Participants:
              Votes:
              3
              Watchers:
              24

                Dates

                Created:
                Updated:
                Resolved: