The secondary of a shard aborting after fassert() failure

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 3.4.9
    • Component/s: Replication

      Hello everyone.

      I installed MongoDB 3.4.9 on my computer running CentOS 7.

      I configured MongoDB with one shard consisting of a primary, a secondary, and an arbiter, and then inserted data through mongos with 10 threads. The secondary crashed (I tried this many times). The error information from the secondary's log file is shown below.

       

      2018-12-28T10:01:23.530+0800 I REPL     [replication-4] Restarting oplog query due to error: OperationFailed: GetMore command executor error: CappedPositionLost: CollectionScan died due to failure to restore tailable cursor position. Last seen record id: RecordId(6639858301032998504). Last fetched optime (with hash): { ts: Timestamp 1545962482000|9832, t: 1 }[-5405713631010586723]. Restarts remaining: 3
      2018-12-28T10:01:23.531+0800 I REPL     [replication-4] Scheduled new oplog query Fetcher source: localhost:21001 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1545962482000|9832 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 1 } query metadata: { $replData: 1, $oplogQueryData: 1, $ssm: { $secondaryOk: true } } active: 1 timeout: 65000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 175756 -- target:localhost:21001 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1545962482000|9832 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 1 } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
      2018-12-28T10:01:23.872+0800 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Received an empty batch from sync source.
      2018-12-28T10:01:23.872+0800 I REPL     [rsBackgroundSync] rollback 0
      2018-12-28T10:01:23.872+0800 I REPL     [rsBackgroundSync] transition to ROLLBACK
      2018-12-28T10:01:23.872+0800 I NETWORK  [rsBackgroundSync] legacy transport layer closing all connections
      2018-12-28T10:01:23.872+0800 I NETWORK  [rsBackgroundSync] Skip closing connection for connection # 5
      2018-12-28T10:01:23.872+0800 I NETWORK  [rsBackgroundSync] Skip closing connection for connection # 4
      2018-12-28T10:01:23.873+0800 I REPL     [rsBackgroundSync] beginning rollback
      2018-12-28T10:01:23.873+0800 I -        [conn9] end connection 127.0.0.1:58724 (6 connections now open)
      2018-12-28T10:01:23.873+0800 I -        [conn8] end connection 127.0.0.1:58722 (6 connections now open)
      2018-12-28T10:01:23.873+0800 I -        [conn7] end connection 127.0.0.1:58125 (6 connections now open)
      2018-12-28T10:01:23.873+0800 I REPL     [rsBackgroundSync] rollback 1
      2018-12-28T10:01:23.873+0800 I -        [conn6] end connection 127.0.0.1:58110 (6 connections now open)
      2018-12-28T10:01:23.874+0800 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
      2018-12-28T10:01:23.874+0800 I REPL     [rsBackgroundSync] rollback our last optime:   Dec 28 10:01:22:2668
      2018-12-28T10:01:23.874+0800 I REPL     [rsBackgroundSync] rollback their last optime: Dec 28 10:01:23:9de8
      2018-12-28T10:01:23.874+0800 I REPL     [rsBackgroundSync] rollback diff in end of log times: -1 seconds
      2018-12-28T10:01:23.961+0800 F REPL     [rsBackgroundSync] rollback error RS100 reached beginning of remote oplog
      2018-12-28T10:01:23.961+0800 I REPL     [rsBackgroundSync]   them:      localhost:21001 (127.0.0.1)
      2018-12-28T10:01:23.961+0800 I REPL     [rsBackgroundSync]   theirTime: Dec 28 10:01:22 5c2583f2:6201
      2018-12-28T10:01:23.961+0800 I REPL     [rsBackgroundSync] rollback finished
      2018-12-28T10:01:23.961+0800 F REPL     [rsBackgroundSync] Unable to complete rollback. A full resync may be needed: UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: NoMatchingDocument: RS100 reached beginning of remote oplog [1] @ 18752
      2018-12-28T10:01:23.961+0800 I -        [rsBackgroundSync] Fatal Assertion 28723 at src/mongo/db/repl/rs_rollback.cpp 915
      2018-12-28T10:01:23.961+0800 I -        [rsBackgroundSync]
      ***aborting after fassert() failure
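
      For reference, the following is a minimal sketch of the kind of workload described above: 10 threads inserting documents through mongos using pymongo. The mongos address, database and collection names, document shape, and insert volume are assumptions for illustration only; they are not taken from the original report.

      # Minimal sketch of the reported workload: 10 threads inserting through mongos.
      # All names, ports, and counts below are illustrative assumptions.
      from concurrent.futures import ThreadPoolExecutor

      from pymongo import MongoClient

      MONGOS_URI = "mongodb://localhost:27017"   # assumed mongos address
      N_THREADS = 10                             # matches "10 threads" in the report
      DOCS_PER_THREAD = 100_000                  # arbitrary volume for illustration

      def insert_worker(thread_id):
          # Each thread opens its own connection to the mongos router and inserts
          # its documents in batches of 1000.
          client = MongoClient(MONGOS_URI)
          coll = client["testdb"]["testcoll"]    # hypothetical sharded collection
          for start in range(0, DOCS_PER_THREAD, 1000):
              batch = [{"thread": thread_id, "seq": start + i} for i in range(1000)]
              coll.insert_many(batch)
          client.close()

      if __name__ == "__main__":
          with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
              list(pool.map(insert_worker, range(N_THREADS)))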
      

            Assignee:
            Danny Hatcher (Inactive)
            Reporter:
            chen hui
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: