  Core Server / SERVER-33446

PowerPC rollback failure



    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: 3.7.2
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None


      Tests for the C driver failed on PowerPC in a way that looks like a mongod failure. I haven't been able to reproduce it yet. The logs show a secondary that is unable to roll back.

      The replica set is initiated with the following config:

          {
              _id: "repl0",
              version: 1,
              protocolVersion: 1,
              members: [{
                  _id: 0,
                  host: "localhost:27017",
                  arbiterOnly: false,
                  buildIndexes: true,
                  hidden: false,
                  priority: 1.0,
                  tags: {
                      ordinal: "one",
                      dc: "ny"
                  },
                  slaveDelay: 0,
                  votes: 1
              }, {
                  _id: 1,
                  host: "localhost:27018",
                  arbiterOnly: false,
                  buildIndexes: true,
                  hidden: false,
                  priority: 1.0,
                  tags: {
                      ordinal: "two",
                      dc: "pa"
                  },
                  slaveDelay: 0,
                  votes: 1
              }, {
                  _id: 2,
                  host: "localhost:27019",
                  arbiterOnly: true,
                  buildIndexes: true,
                  hidden: false,
                  priority: 0.0,
                  tags: {},
                  slaveDelay: 0,
                  votes: 1
              }],
              settings: {
                  chainingAllowed: true,
                  heartbeatIntervalMillis: 2000,
                  heartbeatTimeoutSecs: 10,
                  electionTimeoutMillis: 10000,
                  catchUpTimeoutMillis: -1,
                  catchUpTakeoverDelayMillis: 30000,
                  getLastErrorModes: {},
                  getLastErrorDefaults: {
                      w: 1,
                      wtimeout: 0
                  },
                  replicaSetId: ObjectId('5a8d8aeb5e061defabdebc4d')
              }
          }
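
To make the topology in this config concrete: all three members have one vote, so a majority is two votes. Because the arbiter holds no data, a write can only be acknowledged by a majority of voting members once both localhost:27017 and localhost:27018 have it. The config's getLastErrorDefaults (w: 1) means writes are, by default, acknowledged by the primary alone. The small sketch below does the vote arithmetic. It copies only the member fields it needs, and the helper names are hypothetical.

```python
# Subset of the member fields from the replica set config above that matter
# for voting and write acknowledgement (all other fields omitted).
MEMBERS = [
    {"_id": 0, "host": "localhost:27017", "arbiterOnly": False, "votes": 1},
    {"_id": 1, "host": "localhost:27018", "arbiterOnly": False, "votes": 1},
    {"_id": 2, "host": "localhost:27019", "arbiterOnly": True, "votes": 1},
]


def voting_majority(members):
    """Smallest number of votes that is a strict majority of all votes."""
    total = sum(m["votes"] for m in members)
    return total // 2 + 1


def data_bearing_voters(members):
    """Hosts that both vote and store data (arbiters store no data)."""
    return [m["host"] for m in members if m["votes"] > 0 and not m["arbiterOnly"]]


if __name__ == "__main__":
    print("votes needed for a majority:", voting_majority(MEMBERS))
    print("data-bearing voters:", data_bearing_voters(MEMBERS))
```

With two data-bearing voters and a majority of two, a majority-acknowledged write has to reach both the primary and the secondary.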

      The logs show the members transitioning to the following roles:
      localhost:27017 - primary
      localhost:27018 - secondary
      localhost:27019 - arbiter

      Later, the secondary fails with a fatal assertion (fassert):

      2018-02-21T15:12:43.034+0000 F ROLLBACK [rsBackgroundSync] Unable to complete rollback. A full resync may be needed: UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: NoMatchingDocument: RS100 reached beginning of remote oplog [1]
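
The fatal error says rollback could not find a common point, meaning an oplog entry that both the secondary and its sync source still have. The sketch below is a simplified, hypothetical model of that search and is not MongoDB's rollback code. It walks both oplogs from newest to oldest, matches entries by timestamp and hash, and fails the same way the log does when the remote oplog runs out first. Timestamps are modeled as (seconds, increment) pairs, terms are ignored, and the entries and hash values in the usage example are illustrative.

```python
from typing import List, Tuple

# Simplified oplog entry: ((seconds, increment), hash). Real entries carry much
# more (term, namespace, operation); this model keeps only what is compared.
Entry = Tuple[Tuple[int, int], int]


class NoCommonPoint(Exception):
    """Raised when one oplog runs out before a shared entry is found."""


def find_common_point(local: List[Entry], remote: List[Entry]) -> Entry:
    """Return the newest entry (same timestamp and hash) present in both oplogs.

    Both lists are ordered newest-first, like a reverse scan of the oplog.
    Local entries newer than the returned entry are the ones to roll back.
    """
    i = 0  # index into the local oplog
    for remote_entry in remote:
        remote_ts = remote_entry[0]
        # Local entries newer than this remote entry cannot match it or any
        # older remote entry, so step past them.
        while i < len(local) and local[i][0] > remote_ts:
            i += 1
        if i == len(local):
            raise NoCommonPoint("reached beginning of local oplog")
        if local[i] == remote_entry:
            return remote_entry
        # Otherwise the remote entry is newer than local[i], or the timestamps
        # match but the hashes differ: keep scanning older remote entries.
    raise NoCommonPoint("reached beginning of remote oplog")


if __name__ == "__main__":
    # Diverged histories that still share an older entry.
    diverged_local = [((100, 3), 1), ((100, 2), 2), ((100, 1), 3)]
    diverged_remote = [((100, 4), 9), ((100, 2), 8), ((100, 1), 3)]
    print(find_common_point(diverged_local, diverged_remote))  # ((100, 1), 3)

    # A remote oplog whose oldest retained entry is newer than anything local.
    local = [((100, 2), 22), ((100, 1), 11), ((99, 7), 7)]
    remote = [((100, 5), 55), ((100, 4), 44), ((100, 3), 33)]
    try:
        find_common_point(local, remote)
    except NoCommonPoint as err:
        print(err)  # reached beginning of remote oplog
```

In this model, the "reached beginning of remote oplog" outcome means that every entry the sync source still retains is newer than, or different from, what the secondary has.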

      Rollback appears to start at this line:

      2018-02-21T15:08:44.330+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1519225716, 2), t: 1 }. source's GTE: { ts: Timestamp(1519225716, 3), t: 1 } hashes: (9214982500984603846/-255950376535661437)
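
The OplogStartMissing message compares the last optime the secondary fetched with the first entry the sync source returned at or after it (the "GTE"). The sketch below models that comparison using the two optimes from the log line. It assumes an optime is ordered by term and then by Timestamp(seconds, increment). The `OpTime` class and `needs_rollback` function are hypothetical, and the real check may also compare the entry hashes printed in the log, which this sketch ignores. The sketch also converts the Timestamp seconds to UTC.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class OpTime(NamedTuple):
    """Simplified optime. Field order makes tuple comparison check the term
    first and then the Timestamp (seconds, increment)."""
    term: int
    secs: int  # Timestamp seconds since the Unix epoch
    inc: int   # Timestamp increment: ordinal of the op within that second


def needs_rollback(last_fetched: OpTime, source_gte: OpTime) -> bool:
    """Hypothetical check: the sync source's first entry at or after our last
    fetched op should be that same op. If it is not, our last op is missing
    from the source and the secondary cannot simply keep tailing."""
    return source_gte != last_fetched


def to_utc(op: OpTime) -> datetime:
    """Wall-clock time encoded in the optime's Timestamp seconds."""
    return datetime.fromtimestamp(op.secs, tz=timezone.utc)


# Optimes copied from the "Starting rollback" log line.
ours = OpTime(term=1, secs=1519225716, inc=2)
source_gte = OpTime(term=1, secs=1519225716, inc=3)

print(needs_rollback(ours, source_gte))  # True
print(source_gte > ours)                 # True: the source's entry is newer
print(to_utc(ours).isoformat())          # 2018-02-21T15:08:36+00:00
```

The conversion places the secondary's last fetched op at 15:08:36 UTC, about eight seconds before the rollback began at 15:08:44.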

      It isn't clear to me that this is a bug, but it also seems unlikely that the C driver tests generate so much data that the secondary falls off the oplog. Can someone confirm whether this is a server bug, or help explain what is going on here?


      Attachments:
        1. arbiter.log (299 kB)
        2. primary.log (811 kB)
        3. secondary.log (262 kB)
        4. server_status_after_tests.txt (24 kB)



            Assignee: Kelsey Schubert
            Reporter: Kevin Albertson