  Core Server / SERVER-17689

Fatal assertion during replication, and/or initial sync

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 3.0.5, 3.1.2
    • Affects Version/s: 3.0.1
    • Component/s: Replication, WiredTiger
    • Labels:
      None

      Issue Status as of Jul 14, 2015

      ISSUE SUMMARY
      During replication and/or initial sync, when using the WiredTiger storage engine, a replica set member may terminate with a fatal assertion about a WriteConflictException. This assertion shuts down the server, causing replication or initial sync to fail.

      This error is uncommon, but how often it occurs may depend on the application's workload or the user data.

      USER IMPACT
      A replica set member may terminate during replication and/or initial sync with a fatal assertion, after which the member must be restarted. In certain low-availability configurations, this issue may affect the replica set's ability to maintain a Primary member to take writes.

      WORKAROUNDS
      N/A

      AFFECTED VERSIONS
      MongoDB 3.0.0 through 3.0.4.

      FIX VERSION
      The fix is included in the 3.0.5 production release.

      RESOLUTION DETAILS
      WriteConflictExceptions are now handled during the applyOps stage of replication, including during initial sync, so the system retries until the WiredTiger operation completes successfully.
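
The retry behavior described above can be illustrated with a minimal sketch. This is not the server's actual code (the server is written in C++); the names WriteConflict, apply_with_retry, apply_op and backoff_seconds are hypothetical and exist only for this illustration. It assumes a write conflict is transient, so the operation is simply attempted again until it succeeds, while any other error is still allowed to propagate.

```python
import time


class WriteConflict(Exception):
    """Hypothetical stand-in for a storage-engine write conflict."""


def apply_with_retry(apply_op, op, backoff_seconds=0.0):
    """Apply one oplog-style operation, retrying on write conflicts.

    apply_op is any callable that applies op and raises WriteConflict
    when it collides with a concurrent write. Returns the number of
    attempts that were needed. Exceptions other than WriteConflict are
    not caught and propagate to the caller.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            apply_op(op)
            return attempts
        except WriteConflict:
            # Treated as transient in this sketch: wait briefly (if a
            # backoff was requested) and try the same operation again.
            if backoff_seconds:
                time.sleep(backoff_seconds)
```

In this sketch a conflict no longer ends the process; it only costs another attempt. A real implementation would likely also need to bound or log the retries, which is left out here.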

      Original description

      After successfully replicating the collections and indexes from MMAPv1 to the WiredTiger storage engine on our replica set of 2 servers, the server crashes with the following output:

      2015-03-23T12:49:36.500+0000 I INDEX    [rsSync] build index done.  scanned 31 total records. 0 secs
      2015-03-23T12:49:36.502+0000 I REPL     [rsSync] initial sync data copy, starting syncup
      2015-03-23T12:49:36.527+0000 I REPL     [rsSync] oplog sync 1 of 3
      2015-03-23T12:49:37.106+0000 I REPL     [ReplicationExecutor] syncing from: primary:27017
      2015-03-23T12:49:37.111+0000 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to primary:27017
      2015-03-23T12:50:10.804+0000 I REPL     [repl writer worker 14] replication update of non-mod failed: { ts: Timestamp 1427114721000|133, h: -8520449917638273792, v: 2, op: "u", ns: "...REMOVED...", o2: { ...REMOVED... } }
      2015-03-23T12:50:10.807+0000 I REPL     [repl writer worker 14] replication info adding missing object
      2015-03-23T12:50:10.882+0000 E REPL     [repl writer worker 14] writer worker caught exception:  :: caused by :: 112 WriteConflict on: { ts: Timestamp 1427114721000|133, h: -8520449917638273792, v: 2, op: "u", ns: "...REMOVED...", o2: { ...REMOVED... }
      2015-03-23T12:50:10.883+0000 I -        [repl writer worker 14] Fatal Assertion 16361
      2015-03-23T12:50:10.883+0000 I -        [repl writer worker 14] 
      
      ***aborting after fassert() failure
      
      
      

            Assignee:
            scotthernandez Scott Hernandez (Inactive)
            Reporter:
            sjoerdmulder Sjoerd Mulder
            Votes:
            1
            Watchers:
            10
