Core Server / SERVER-10905

Duplicate key error killed all secondaries on cluster


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical - P2
    • Fix Version/s: None
    • Affects Version/s: 2.4.6
    • Component/s: Replication
    • Labels: None
    • Environment: ubuntu 12.04 on aws
    • Operating System: Linux

    Description

      We run a 3-node replica set. Two of the nodes are configured with the same priority (1), while the third is configured with priority 0 so it is never promoted (it's the node we use for backups).
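
      For reference, a configuration equivalent to the setup described above would be initiated roughly like this from the mongo shell (the set name and hostnames here are hypothetical placeholders):

      rs.initiate({
        _id: "rs0",
        members: [
          { _id: 0, host: "node1:27017", priority: 1 },
          { _id: 1, host: "node2:27017", priority: 1 },
          // priority: 0 makes this member ineligible for election;
          // this is the node used for backups
          { _id: 2, host: "backup1:27017", priority: 0 }
        ]
      })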

      As of last night, the two secondaries have been failing continuously during writes, with the following error logged on both servers:

      Wed Sep 25 05:28:16.211 [repl writer worker 2] ERROR: writer worker caught exception: E11000 duplicate key error index: marketshare.Application.$name_1 dup key: { : "TestingRDS" } on: { ts: Timestamp 1380112101000|1, h: -5880003146554201345, v: 2, op: "u", ns: "marketshare.Application", o2: { _id: ObjectId('5242be375f6ffb0c37145385') }, o: { $set: { boxes.0: { box_type_name: "OPTIMIZERMS-APPV4", updated: "2013-09-25 12:28:22.855168", <rest of object>...} }}

      Wed Sep 25 05:28:16.211 [repl writer worker 2] Fatal Assertion 16360
      0xdddd81 0xd9dc13 0xc26bfc 0xdab721 0xe26609 0x7ff9f4923e9a 0x7ff9f3c36ccd
      /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
      /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd9dc13]
      /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc]
      /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
      /usr/bin/mongod() [0xe26609]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7ff9f4923e9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ff9f3c36ccd]
      Wed Sep 25 05:28:16.215 [repl writer worker 2]

      ***aborting after fassert() failure

      Wed Sep 25 05:28:16.215 Got signal: 6 (Aborted).

      Wed Sep 25 05:28:16.219 Backtrace:
      0xdddd81 0x6d0d29 0x7ff9f3b794a0 0x7ff9f3b79425 0x7ff9f3b7cb8b 0xd9dc4e 0xc26bfc 0xdab721 0xe26609 0x7ff9f4923e9a 0x7ff9f3c36ccd
      /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29]
      /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff9f3b794a0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7ff9f3b79425]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7ff9f3b7cb8b]
      /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd9dc4e]
      /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc]
      /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
      /usr/bin/mongod() [0xe26609]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7ff9f4923e9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ff9f3c36ccd]
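
      The E11000 error above means the replicated update would have violated a unique index on name (marketshare.Application.$name_1); duplicate key errors only occur on unique indexes. Assuming that index was created in the usual way for a 2.4 deployment, its definition would be along these lines (hypothetical reconstruction from the error message):

      // Hypothetical definition of the unique index named in the error
      // (2.4-era shell syntax)
      db.Application.ensureIndex({ name: 1 }, { unique: true })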

      After rebooting both secondaries, the cluster was able to establish quorum again for about 15 minutes, but then the issue reproduced. When we logged in to the instances directly, we noticed that the indexes were correct on the primary but were completely empty on the secondary.
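
      A minimal sketch of how the index state can be compared across members (on 2.4 the shell needs setSlaveOk() before reading from a secondary; exact output will vary):

      db.getMongo().setSlaveOk()      // allow reads on a secondary
      use marketshare
      db.Application.getIndexes()     // index definitions should match the primary's
      db.Application.validate(true)   // full validation; reports entry counts per index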

      Attachments

        1. mongodb-primary.log (24.93 MB)
        2. mongodb-secondary-1.log (24.90 MB)
        3. mongodb-secondary-2.log (24.71 MB)


        People

          Assignee: Samantha Ritter (Inactive)
          Reporter: Ramiro Berrelleza (raberrel)
          Votes: 1
          Watchers: 8
