Type: Question
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.7.2
Component/s: Replication
Labels: None
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The C driver tests had a failure on PowerPC that looks like a mongod failure. I haven't been able to reproduce it yet. According to the logs, a secondary was unable to complete a rollback.

The replica set is initiated with the following config:

{
    _id: "repl0",
    version: 1,
    protocolVersion: 1,
    members: [{
        _id: 0,
        host: "localhost:27017",
        arbiterOnly: false,
        buildIndexes: true,
        hidden: false,
        priority: 1.0,
        tags: {
            ordinal: "one",
            dc: "ny"
        },
        slaveDelay: 0,
        votes: 1
    }, {
        _id: 1,
        host: "localhost:27018",
        arbiterOnly: false,
        buildIndexes: true,
        hidden: false,
        priority: 1.0,
        tags: {
            ordinal: "two",
            dc: "pa"
        },
        slaveDelay: 0,
        votes: 1
    }, {
        _id: 2,
        host: "localhost:27019",
        arbiterOnly: true,
        buildIndexes: true,
        hidden: false,
        priority: 0.0,
        tags: {},
        slaveDelay: 0,
        votes: 1
    }],
    settings: {
        chainingAllowed: true,
        heartbeatIntervalMillis: 2000,
        heartbeatTimeoutSecs: 10,
        electionTimeoutMillis: 10000,
        catchUpTimeoutMillis: -1,
        catchUpTakeoverDelayMillis: 30000,
        getLastErrorModes: {},
        getLastErrorDefaults: {
            w: 1,
            wtimeout: 0
        },
        replicaSetId: ObjectId('5a8d8aeb5e061defabdebc4d')
    }
}
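As a quick reading aid for the config above, here is a minimal sketch that summarizes the membership. `summarizeReplSetConfig` is a hypothetical helper written for this ticket, not a MongoDB API, and the object below copies only the fields the helper reads. It shows that the set has two data-bearing members and one arbiter, so the secondary has only one member it could sync from.

```javascript
// Hypothetical helper (not part of any MongoDB API): summarize a replica set
// config object of the shape shown above.
function summarizeReplSetConfig(config) {
  const voters = config.members.filter((m) => m.votes > 0);
  return {
    votingMembers: voters.length,
    // Smallest number of votes that is more than half of the voting members.
    majority: Math.floor(voters.length / 2) + 1,
    dataBearingHosts: config.members
      .filter((m) => !m.arbiterOnly)
      .map((m) => m.host),
    arbiterHosts: config.members
      .filter((m) => m.arbiterOnly)
      .map((m) => m.host),
  };
}

// Only the fields the helper reads are copied from the config above.
const replConfig = {
  _id: "repl0",
  members: [
    { _id: 0, host: "localhost:27017", arbiterOnly: false, votes: 1 },
    { _id: 1, host: "localhost:27018", arbiterOnly: false, votes: 1 },
    { _id: 2, host: "localhost:27019", arbiterOnly: true, votes: 1 },
  ],
};

const summary = summarizeReplSetConfig(replConfig);
console.log(summary);
```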

The logs show that the members transition to the following roles:
localhost:27017 - primary
localhost:27018 - secondary
localhost:27019 - arbiter

The secondary later hits a fatal assertion (fassert):

2018-02-21T15:12:43.034+0000 F ROLLBACK [rsBackgroundSync] Unable to complete rollback. A full resync may be needed: UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: NoMatchingDocument: RS100 reached beginning of remote oplog [1]

The rollback appears to start at this line, about four minutes earlier:

2018-02-21T15:08:44.330+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1519225716, 2), t: 1 }. source's GTE: { ts: Timestamp(1519225716, 3), t: 1 } hashes: (9214982500984603846/-255950376535661437)
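To make the two optimes in that line easier to compare, here is a small sketch that extracts them and orders them. `parseOpTimes` and `compareOpTimes` are hypothetical helpers written for this ticket, and ordering by term first and then by timestamp is my assumption about how optimes compare under `protocolVersion: 1`. The sketch only restates what the log line says: the first entry the sync source returned is one increment past the secondary's last fetched entry, within the same second and the same term.

```javascript
// Log line copied verbatim from the secondary's log.
const logLine =
  "2018-02-21T15:08:44.330+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: " +
  "Our last op time fetched: { ts: Timestamp(1519225716, 2), t: 1 }. " +
  "source's GTE: { ts: Timestamp(1519225716, 3), t: 1 } " +
  "hashes: (9214982500984603846/-255950376535661437)";

// Hypothetical helper: pull every "{ ts: Timestamp(s, i), t: n }" out of a line.
function parseOpTimes(line) {
  const re = /\{ ts: Timestamp\((\d+), (\d+)\), t: (-?\d+) \}/g;
  return [...line.matchAll(re)].map((m) => ({
    seconds: Number(m[1]),
    increment: Number(m[2]),
    term: Number(m[3]),
  }));
}

// Assumed ordering: term first, then timestamp seconds, then increment.
// Negative result means a sorts before b.
function compareOpTimes(a, b) {
  return a.term - b.term || a.seconds - b.seconds || a.increment - b.increment;
}

const [lastFetched, sourceGte] = parseOpTimes(logLine);
const order = compareOpTimes(lastFetched, sourceGte);

// Wall-clock time encoded in the timestamp's seconds field (UTC).
const lastFetchedWallClock = new Date(lastFetched.seconds * 1000).toISOString();

console.log({ lastFetched, sourceGte, order, lastFetchedWallClock });
```

The seconds field decodes to 15:08:36 UTC, eight seconds before the rollback message was logged.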

It isn't clear to me that this is a bug, but it also seems unlikely the C driver tests are generating so much data that the secondary's oplog rolls off. Can someone confirm that this is a server bug, or help explain what is going on here?


Attachments:
arbiter.log (299 kB, Feb 22 2018 04:28:45 PM UTC)
primary.log (811 kB, Feb 22 2018 04:28:45 PM UTC)
secondary.log (262 kB, Feb 22 2018 04:28:45 PM UTC)
server_status_after_tests.txt (24 kB, Feb 22 2018 07:09:19 PM UTC)

Assignee: Kelsey Schubert
Reporter: Kevin Albertson
Participants: Kelsey Schubert, Kevin Albertson, Will Schultz
Votes: 0
Watchers: 6

Created: Feb 22 2018 04:31:49 PM UTC
Updated: Apr 02 2018 10:00:34 PM UTC
Resolved: Mar 10 2018 02:39:39 PM UTC
