Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-26271

UnrecoverableRollbackError after election

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.2.9
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Hello,

      Last night the primary node stepped down for reason I don't know and a secondary became a primary. The former primary immediately displayed this error:

      2016-09-23T04:10:09.012+0200 I NETWORK  [conn4831] end connection 10.0.0.10:40780 (3 connections now open)
      2016-09-23T04:10:09.349+0200 I REPL     [ReplicationExecutor] syncing from: mongodb-replica2:27017
      2016-09-23T04:10:09.353+0200 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to mongodb-replica2:27017
      2016-09-23T04:10:09.355+0200 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 238, timestamp: Sep 23 04:10:05:b7). source's GTE: (term: 239, timestamp: Sep 23 04:10:06:1) hashes: (298476498444083566/5675657933368738701)
      2016-09-23T04:10:09.356+0200 I -        [rsBackgroundSync] Fatal assertion 18750 UnrecoverableRollbackError: need to rollback, but in inconsistent state. minvalid: (term: 239, timestamp: Sep 23 04:10:06:1) > our last optime: (term: 238, timestamp: Sep 23 04:10:05:b7)
      2016-09-23T04:10:09.356+0200 I -        [rsBackgroundSync]
       
      ***aborting after fassert() failure
      

      I tried this morning to start mongod on former primary but the same error appeared after few seconds.

      I had this problem before but it was due to ungraceful shutdown and bad RAID array. But in this replica set, each node is on a RAID 10 SSD with lots of memory and cpu. Journaling is enabled on both mongod and file system. And again there was no ungraceful shutdown it just happened during the election even though the primary had the highest priority.

      Also, I just finished a 2-day mongod --repair on the primary before this incident and I did rsync the disk image to other secondaries. So they were all identical with no corruption.

      I attached logs, diagnostics and WiredTiger file for the former primary and a new primary. The incident happened at "2016-09-23T04:10".

      Please let me know if you need anything else.

      rs0:PRIMARY> rs.conf()
      {
              "_id" : "rs0",
              "version" : 468974,
              "protocolVersion" : NumberLong(1),
              "members" : [
                      {
                              "_id" : 4,
                              "host" : "mongodb-replica1:27017",
                              "arbiterOnly" : false,
                              "buildIndexes" : true,
                              "hidden" : false,
                              "priority" : 3,
                              "tags" : {
       
                              },
                              "slaveDelay" : NumberLong(0),
                              "votes" : 1
                      },
                      {
                              "_id" : 5,
                              "host" : "mongodb-replica2:27017",
                              "arbiterOnly" : false,
                              "buildIndexes" : true,
                              "hidden" : false,
                              "priority" : 1,
                              "tags" : {
       
                              },
                              "slaveDelay" : NumberLong(0),
                              "votes" : 1
                      },
                      {
                              "_id" : 6,
                              "host" : "mongodb-replica3:27017",
                              "arbiterOnly" : false,
                              "buildIndexes" : true,
                              "hidden" : false,
                              "priority" : 1,
                              "tags" : {
       
                              },
                              "slaveDelay" : NumberLong(0),
                              "votes" : 1
                      }
              ],
              "settings" : {
                      "chainingAllowed" : true,
                      "heartbeatIntervalMillis" : 2000,
                      "heartbeatTimeoutSecs" : 10,
                      "electionTimeoutMillis" : 10000,
                      "getLastErrorModes" : {
       
                      },
                      "getLastErrorDefaults" : {
                              "w" : 1,
                              "wtimeout" : 0
                      }
              }
      }
      

      Log from new primary:

      2016-09-23T04:10:05.189+0200 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
      2016-09-23T04:10:05.189+0200 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
      2016-09-23T04:10:05.189+0200 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongodb-replica1:27017
      2016-09-23T04:10:05.189+0200 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongodb-replica3:27017
      2016-09-23T04:10:05.601+0200 I REPL     [ReplicationExecutor] could not find member to sync from
      2016-09-23T04:10:05.601+0200 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongodb-replica1:27017
      2016-09-23T04:10:05.601+0200 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongodb-replica3:27017
      

        Attachments

        1. mongo-replica1.zip
          93.09 MB
        2. mongo-replica2.zip
          103.19 MB

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: