Core Server / SERVER-33446

PowerPC rollback failure

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: 3.7.2
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None

      Description

      Tests for the C driver had a failure on PowerPC which looks like a mongod failure. I haven't yet been able to reproduce it. The logs show a secondary that is unable to roll back.

      The replica set is initiated with the following config:

      {
          _id: "repl0",
          version: 1,
          protocolVersion: 1,
          members: [{
              _id: 0,
              host: "localhost:27017",
              arbiterOnly: false,
              buildIndexes: true,
              hidden: false,
              priority: 1.0,
              tags: {
                  ordinal: "one",
                  dc: "ny"
              },
              slaveDelay: 0,
              votes: 1
          }, {
              _id: 1,
              host: "localhost:27018",
              arbiterOnly: false,
              buildIndexes: true,
              hidden: false,
              priority: 1.0,
              tags: {
                  ordinal: "two",
                  dc: "pa"
              },
              slaveDelay: 0,
              votes: 1
          }, {
              _id: 2,
              host: "localhost:27019",
              arbiterOnly: true,
              buildIndexes: true,
              hidden: false,
              priority: 0.0,
              tags: {},
              slaveDelay: 0,
              votes: 1
          }],
          settings: {
              chainingAllowed: true,
              heartbeatIntervalMillis: 2000,
              heartbeatTimeoutSecs: 10,
              electionTimeoutMillis: 10000,
              catchUpTimeoutMillis: -1,
              catchUpTakeoverDelayMillis: 30000,
              getLastErrorModes: {},
              getLastErrorDefaults: {
                  w: 1,
                  wtimeout: 0
              },
              replicaSetId: ObjectId('5a8d8aeb5e061defabdebc4d')
          }
      }
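
      As a quick sanity check on this topology, the sketch below tallies the votes in the config. It copies only the fields needed for counting; the `summarize` helper and its field names are hypothetical, and the majority formula (more than half of all votes) is the usual replica set rule rather than something taken from the logs.

```python
# Subset of the config above: only the fields relevant to vote counting.
members = [
    {"_id": 0, "host": "localhost:27017", "arbiterOnly": False, "votes": 1},
    {"_id": 1, "host": "localhost:27018", "arbiterOnly": False, "votes": 1},
    {"_id": 2, "host": "localhost:27019", "arbiterOnly": True, "votes": 1},
]


def summarize(members):
    """Tally total votes, the majority threshold, and data-bearing voters."""
    total_votes = sum(m["votes"] for m in members)
    data_bearing_voters = sum(
        m["votes"] for m in members if not m["arbiterOnly"]
    )
    return {
        "total_votes": total_votes,
        "majority": total_votes // 2 + 1,  # assumed rule: strictly more than half
        "data_bearing_voters": data_bearing_voters,
    }


summary = summarize(members)
print(summary)
```

      With three votes the majority is two, so the primary and the arbiter alone can keep a primary elected while the secondary is behind or down. Combined with the default write concern of w: 1, writes can be acknowledged before the secondary has them.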
      

      The logs show the members transitioning to the following roles:
      localhost:27017 - primary
      localhost:27018 - secondary
      localhost:27019 - arbiter

      The secondary later fasserts with this failure:

      2018-02-21T15:12:43.034+0000 F ROLLBACK [rsBackgroundSync] Unable to complete rollback. A full resync may be needed: UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: NoMatchingDocument: RS100 reached beginning of remote oplog [1]
      

      It looks like rollback starts on this line:

      2018-02-21T15:08:44.330+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1519225716, 2), t: 1 }. source's GTE: { ts: Timestamp(1519225716, 3), t: 1 } hashes: (9214982500984603846/-255950376535661437)
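
      To make the two optimes in that line easier to compare, here is a minimal sketch that pulls them out of the log text and measures the time between the start of rollback and the fassert quoted earlier. The regular expression and helper names are hypothetical, written only against the log format shown in this ticket. Ordering optimes by term first and then by timestamp is an assumption about how the server compares them.

```python
import re
from datetime import datetime, timezone

# The rollback-start line quoted above, split only for readability.
ROLLBACK_START = (
    "2018-02-21T15:08:44.330+0000 I REPL     [rsBackgroundSync] "
    "Starting rollback due to OplogStartMissing: Our last op time fetched: "
    "{ ts: Timestamp(1519225716, 2), t: 1 }. "
    "source's GTE: { ts: Timestamp(1519225716, 3), t: 1 } "
    "hashes: (9214982500984603846/-255950376535661437)"
)
# Timestamp prefix of the later fassert line.
FASSERT_PREFIX = "2018-02-21T15:12:43.034+0000"

OPTIME_RE = re.compile(r"\{ ts: Timestamp\((\d+), (\d+)\), t: (-?\d+) \}")


def parse_optimes(line):
    """Return each optime in the line as a (term, seconds, increment) tuple."""
    return [
        (int(term), int(secs), int(inc))
        for secs, inc, term in OPTIME_RE.findall(line)
    ]


def log_time(text):
    """Parse the leading timestamp of a log line."""
    return datetime.strptime(text.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%f%z")


last_fetched, source_gte = parse_optimes(ROLLBACK_START)

# Assumed ordering: term first, then (seconds, increment).
secondary_is_behind = last_fetched < source_gte

# Wall-clock time encoded in the shared Timestamp seconds field.
op_wall_time = datetime.fromtimestamp(last_fetched[1], tz=timezone.utc)

# Seconds between the start of rollback and the fassert.
elapsed = (log_time(FASSERT_PREFIX) - log_time(ROLLBACK_START)).total_seconds()

print(last_fetched, source_gte, secondary_is_behind)
print(op_wall_time.isoformat(), elapsed)
```

      The two optimes share the same term and second (15:08:36 UTC) and differ only in the increment, 2 versus 3. The sync source's earliest entry at or after the requested point is therefore not the entry the secondary last fetched. About 239 seconds pass between this line and the fassert.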
      

      It isn't clear to me that this is a bug, but it also seems unlikely the C driver tests are generating so much data that the secondary's oplog rolls off. Can someone confirm that this is a server bug, or help explain what is going on here?

        Attachments

        1. arbiter.log (299 kB)
        2. primary.log (811 kB)
        3. secondary.log (262 kB)
        4. server_status_after_tests.txt (24 kB)

            People

            Assignee: Kelsey T Schubert (kelsey.schubert)
            Reporter: Kevin Albertson (kevin.albertson)
            Votes: 0
            Watchers: 6
