Details

Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.8
Component/s: None
Operating System: ALL

Description
After a system was rebooted, the node that was primary at the time lost an election to the member that is now primary. During that process, and in the seconds that followed, the node encountered an error, got stuck in RECOVERING, and then crashed.
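While this is happening, the member's replication state and the minValid document that the rollback logic consults can be inspected from a driver. A minimal pymongo sketch, assuming direct access to the affected member (the hostname below is a placeholder) and the local.replset.minvalid layout used by the 3.2 series:

from pymongo import MongoClient, ReadPreference

# Placeholder hostname for the affected member.
client = MongoClient("mongo02.domain.us", 27017)

# Replication state of this member (SECONDARY, ROLLBACK, RECOVERING, ...).
status = client.admin.command("replSetGetStatus")
self_member = next(m for m in status["members"] if m.get("self"))
print(self_member["stateStr"])

# The minValid document compared against the last applied optime during
# rollback; in 3.2 it is stored in local.replset.minvalid.
local = client.get_database("local", read_preference=ReadPreference.SECONDARY_PREFERRED)
print(local["replset.minvalid"].find_one())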
DAY:15:38.702-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 21
DAY:15:38.703-0500 I REPL [replExecDBWorker-1] transition to SECONDARY
DAY:15:38.710-0500 I NETWORK [conn244] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [5.5.5.12:58244]
DAY:15:39.248-0500 I REPL [ReplicationExecutor] Member mongo01.domain.us:27017 is now in state PRIMARY
DAY:15:39.694-0500 I REPL [ReplicationExecutor] syncing from: mongo01.domain.us:27017
DAY:15:39.696-0500 I REPL [SyncSourceFeedback] setting syncSourceFeedback to mongo01.domain.us:27017
DAY:15:39.697-0500 I ASIO [NetworkInterfaceASIO-BGSync-0] Successfully connected to mongo01.domain.us:27017
DAY:15:39.698-0500 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 20, timestamp: Jul 12 22:15:38:1ce). source's GTE: (term: 21, timestamp: Jul 12 22:15:38:1ce) hashes: (7825714044400008130/-1471538608383373916)
DAY:15:39.698-0500 I REPL [rsBackgroundSync] beginning rollback
DAY:15:39.698-0500 I REPL [rsBackgroundSync] rollback 0
DAY:15:39.698-0500 I REPL [ReplicationExecutor] transition to ROLLBACK
DAY:15:39.699-0500 I REPL [rsBackgroundSync] rollback 1
DAY:15:39.700-0500 I REPL [rsBackgroundSync] rollback 2 FindCommonPoint
DAY:15:39.700-0500 I REPL [rsBackgroundSync] rollback our last optime: Jul 12 22:15:38:1ce
DAY:15:39.700-0500 I REPL [rsBackgroundSync] rollback their last optime: Jul 12 22:15:39:167
DAY:15:39.700-0500 I REPL [rsBackgroundSync] rollback diff in end of log times: -1 seconds
DAY:15:39.707-0500 I QUERY [conn545] assertion 13436 cannot read from oplog collection while in rollback ns:local.oplog.rs query:{}
DAY:15:39.707-0500 I QUERY [conn545] ntoskip:0 ntoreturn:1
DAY:15:39.809-0500 I QUERY [conn547] assertion 13436 cannot read from oplog collection while in rollback ns:local.oplog.rs query:{}
DAY:15:39.809-0500 I QUERY [conn547] ntoskip:0 ntoreturn:1
DAY:15:39.907-0500 I REPL [rsBackgroundSync] rollback 3 fixup
DAY:15:39.908-0500 I REPL [rsBackgroundSync] rollback 3.5
DAY:15:39.908-0500 I REPL [rsBackgroundSync] rollback 4 n:2
DAY:15:39.908-0500 I REPL [rsBackgroundSync] minvalid=(term: 21, timestamp: Jul 12 22:15:39:1df)
DAY:15:39.908-0500 I REPL [rsBackgroundSync] rollback 4.6
DAY:15:39.908-0500 I REPL [rsBackgroundSync] rollback 4.7
DAY:15:39.909-0500 I REPL [rsBackgroundSync] rollback 5 d:1 u:2
DAY:15:39.909-0500 I REPL [rsBackgroundSync] rollback 6
DAY:15:39.910-0500 I REPL [rsBackgroundSync] rollback done
DAY:15:39.910-0500 I REPL [rsBackgroundSync] rollback finished
DAY:15:39.910-0500 I NETWORK [SyncSourceFeedback] Socket say send() errno:9 Bad file descriptor 5.5.6.11:27017
DAY:15:39.910-0500 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update: socket exception [SEND_ERROR] for 5.5.6.11:27017
DAY:15:39.910-0500 I REPL [SyncSourceFeedback] updateUpstream failed: Location9001: socket exception [SEND_ERROR] for 5.5.6.11:27017, will retry
DAY:15:39.910-0500 I QUERY [conn548] assertion 13436 cannot read from oplog collection while in rollback ns:local.oplog.rs query:{}
DAY:15:39.910-0500 I QUERY [conn548] ntoskip:0 ntoreturn:1
DAY:15:39.920-0500 I REPL [ReplicationExecutor] transition to RECOVERING
DAY:15:39.920-0500 I REPL [ReplicationExecutor] syncing from: mongo03.domain.us:27017
DAY:15:39.921-0500 I REPL [SyncSourceFeedback] setting syncSourceFeedback to mongo03.domain.us:27017
DAY:15:39.922-0500 I ASIO [NetworkInterfaceASIO-BGSync-0] Successfully connected to mongo03.domain.us:27017
DAY:16:09.931-0500 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: 20, timestamp: Jul 12 22:15:38:1cc) which is more than 30s behind member mongo01.domain.us:27017 whose most recent OpTime is (term: 21, timestamp: Jul 12 22:16:09:a8)
DAY:16:09.932-0500 I REPL [ReplicationExecutor] syncing from: mongo01.domain.us:27017
DAY:16:09.933-0500 I REPL [SyncSourceFeedback] setting syncSourceFeedback to mongo01.domain.us:27017
DAY:16:09.934-0500 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 20, timestamp: Jul 12 22:15:38:1cc). source's GTE: (term: 21, timestamp: Jul 12 22:15:38:1cd) hashes: (1087207337565601823/-6762428419615138830)
DAY:16:09.935-0500 I - [rsBackgroundSync] Fatal assertion 18750 UnrecoverableRollbackError: need to rollback, but in inconsistent state. minvalid: (term: 21, timestamp: Jul 12 22:15:39:1df) > our last optime: (term: 20, timestamp: Jul 12 22:15:38:1cc)
DAY:16:09.935-0500 I - [rsBackgroundSync]
***aborting after fassert() failure
Followed by this on restart:
DAY::21:36.465-0500 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/mongodb/data/wt/diagnostic.data'
DAY::21:36.465-0500 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
DAY::21:36.467-0500 I NETWORK [initandlisten] waiting for connections on port 27017
DAY::21:36.470-0500 I REPL [ReplicationExecutor] New replica set config in use: { _id: "OnspringCavern01", version: 8, protocolVersion: 1, members: [ { _id: 0, host: "mongo01.domain.us:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongo02.domain.us:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 2.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo03.domain.us:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 1000, heartbeatTimeoutSecs: 5, electionTimeoutMillis: 6000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('57857435bb2a73b423c7eb4c') } }
DAY::21:36.470-0500 I REPL [ReplicationExecutor] This node is mongo02.domain.us:27017 in the config
DAY::21:36.470-0500 I REPL [ReplicationExecutor] transition to STARTUP2
DAY::21:36.470-0500 I REPL [ReplicationExecutor] Starting replication applier threads
DAY::21:36.470-0500 I REPL [ReplicationExecutor] transition to RECOVERING
DAY::21:36.471-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo01.domain.us:27017
DAY::21:36.472-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo03.domain.us:27017
DAY::21:36.476-0500 I REPL [ReplicationExecutor] New replica set config in use: { _id: "OnspringCavern01", version: 9, protocolVersion: 1, members: [ { _id: 0, host: "mongo01.domain.us:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongo02.domain.us:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 2.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo03.domain.us:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: false, heartbeatIntervalMillis: 1000, heartbeatTimeoutSecs: 5, electionTimeoutMillis: 6000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('57857435bb2a73b423c7eb4c') } }
DAY::21:36.476-0500 I REPL [ReplicationExecutor] This node is mongo02.domain.us:27017 in the config
DAY::21:36.477-0500 I REPL [ReplicationExecutor] Member mongo03.domain.us:27017 is now in state SECONDARY
DAY::21:36.477-0500 I REPL [ReplicationExecutor] Member mongo01.domain.us:27017 is now in state PRIMARY
DAY::21:37.002-0500 I FTDC [ftdc] Unclean full-time diagnostic data capture shutdown detected, found interim file, some metrics may have been lost. OK
DAY::21:37.471-0500 I REPL [ReplicationExecutor] syncing from primary: mongo01.domain.us:27017
DAY::21:37.475-0500 I REPL [SyncSourceFeedback] setting syncSourceFeedback to mongo01.domain.us:27017
DAY::21:37.476-0500 I ASIO [NetworkInterfaceASIO-BGSync-0] Successfully connected to mongo01.domain.us:27017
DAY::21:37.477-0500 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 20, timestamp: Jul 12 22:15:38:1cc). source's GTE: (term: 21, timestamp: Jul 12 22:15:38:1cd) hashes: (1087207337565601823/-6762428419615138830)
DAY::21:37.477-0500 I - [rsBackgroundSync] Fatal assertion 18750 UnrecoverableRollbackError: need to rollback, but in inconsistent state. minvalid: (term: 21, timestamp: Jul 12 22:15:39:1df) > our last optime: (term: 20, timestamp: Jul 12 22:15:38:1cc)
DAY::21:37.477-0500 I - [rsBackgroundSync]
***aborting after fassert() failure
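The fatal assertion is the consistency check behind error 18750: the first rollback (in the first excerpt) wrote minvalid=(term: 21, timestamp: Jul 12 22:15:39:1df), but the node's last applied optime is still (term: 20, timestamp: Jul 12 22:15:38:1cc), so when a second rollback is requested before the node has replayed forward to minValid, the server reports an unrecoverable inconsistent state and aborts rather than continue. A simplified illustration of that invariant (illustrative Python, not the server's actual implementation; optimes reduced to (term, timestamp) tuples):

def check_can_rollback(min_valid, last_applied):
    # Optimes compare by term first, then timestamp. If a prior rollback left
    # minValid ahead of the last applied optime, another rollback cannot yield
    # a consistent data set, so the server fasserts instead of proceeding.
    if min_valid > last_applied:
        raise RuntimeError(
            "need to rollback, but in inconsistent state: "
            "minvalid %r > our last optime %r" % (min_valid, last_applied))

# Values taken from the log above; this raises, mirroring the fassert:
check_can_rollback(min_valid=(21, "Jul 12 22:15:39:1df"),
                   last_applied=(20, "Jul 12 22:15:38:1cc"))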
Attachments
Issue Links

- duplicates
  - SERVER-25145 During rollback (or w/minvalid invalid) select sync source based on end OpTime (Closed)
  - SERVER-7200 use oplog as op buffer on secondaries (Closed)
- related to
  - SERVER-25848 Enforce single sync source during rollback (Closed)