RECOVERING nodes may insert documents that violate unique indexes


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.8

      As a result of SERVER-12406 (or some other cause), a secondary can end up with a document that the primary does not have. An insert that is legal on the primary may then violate a unique index constraint on the secondary.
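The divergent state can be modeled minimally. The following is an illustrative Python sketch, not mongod code: the `Collection` class, the `"V1"`/`"P1"` key values, and the `_id` strings are all invented for the example; only the shape of the unique index (on `{v: 1, p: 1}`, matching the `v_1_p_1` index in the logs below) comes from the report.

```python
# Model a unique index as a map from key tuple -> _id.
class DuplicateKeyError(Exception):
    pass

class Collection:
    def __init__(self):
        self.docs = {}          # _id -> document
        self.unique_vp = {}     # (v, p) -> _id; models the v_1_p_1 unique index

    def insert(self, doc):
        key = (doc["v"], doc["p"])
        if key in self.unique_vp:
            raise DuplicateKeyError("E11000 dup key: %r" % (key,))
        self.docs[doc["_id"]] = doc
        self.unique_vp[key] = doc["_id"]

primary, secondary = Collection(), Collection()

# Divergence: the secondary holds a document the primary lacks.
secondary.insert({"_id": "old", "v": "V1", "p": "P1"})

# This insert is legal on the primary...
new_doc = {"_id": "new", "v": "V1", "p": "P1"}
primary.insert(new_doc)

# ...but replicating it to the secondary violates the unique index.
try:
    secondary.insert(new_doc)
except DuplicateKeyError as e:
    print("secondary:", e)
```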

      Observed behavior is that this insert causes the secondary to crash:

      Fri Jan 17 19:32:00.760 [repl prefetch worker] index prefetch for op i
      Fri Jan 17 19:32:00.760 [rsSync] replication batch size is 1
      Fri Jan 17 19:32:00.760 [repl writer worker 1] applying op: { ts: Timestamp 1390015920000|1, h: 3574278741816169728, v: 2, op: "i", ns: "cs.csku", o: { _id: { r: ObjectId('5121d0ad23957311532da2f1'), m: "Double" }, p: ObjectId('5152834b77232a55b1c298b6'), v: ObjectId('5152836577232a55b1c298b9') } }
      Fri Jan 17 19:32:00.760 [repl writer worker 1] User Assertion: 11000:E11000 duplicate key error index: cs.csku.$v_1_p_1  dup key: { : ObjectId('5152836577232a55b1c298b9'), : ObjectId('5152834b77232a55b1c298b6') }
      Fri Jan 17 19:32:00.760 [repl writer worker 1] ERROR: writer worker caught exception: E11000 duplicate key error index: cs.csku.$v_1_p_1  dup key: { : ObjectId('5152836577232a55b1c298b9'), : ObjectId('5152834b77232a55b1c298b6') } on: { ts: Timestamp 1390015920000|1, h: 3574278741816169728, v: 2, op: "i", ns: "cs.csku", o: { _id: { r: ObjectId('5121d0ad23957311532da2f1'), m: "Double" }, p: ObjectId('5152834b77232a55b1c298b6'), v: ObjectId('5152836577232a55b1c298b9') } }
      Fri Jan 17 19:32:00.761 [repl writer worker 1]   Fatal Assertion 16360
      0x10044c60b 0x100425837 0x10033c97f 0x10042cc48 0x10047f1a5 0x7fff8e8bb899 0x7fff8e8bb72a 0x7fff8e8bffc9
       0   mongod                              0x000000010044c60b _ZN5mongo15printStackTraceERSo + 43
       1   mongod                              0x0000000100425837 _ZN5mongo13fassertFailedEi + 151
       2   mongod                              0x000000010033c97f _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 271
       3   mongod                              0x000000010042cc48 _ZN5mongo10threadpool6Worker4loopEv + 138
       4   mongod                              0x000000010047f1a5 thread_proxy + 229
       5   libsystem_pthread.dylib             0x00007fff8e8bb899 _pthread_body + 138
       6   libsystem_pthread.dylib             0x00007fff8e8bb72a _pthread_struct_init + 0
       7   libsystem_pthread.dylib             0x00007fff8e8bffc9 thread_start + 13
      Fri Jan 17 19:32:00.763 [repl writer worker 1]
      
      ***aborting after fassert() failure
      
      
      Fri Jan 17 19:32:00.763 Got signal: 6 (Abort trap: 6).
      
      Fri Jan 17 19:32:00.765 Backtrace:
      0x10044c60b 0x100001121 0x7fff936dc5aa 0x5 0x7fff8e978bba 0x100425875 0x10033c97f 0x10042cc48 0x10047f1a5 0x7fff8e8bb899 0x7fff8e8bb72a 0x7fff8e8bffc9
       0   mongod                              0x000000010044c60b _ZN5mongo15printStackTraceERSo + 43
       1   mongod                              0x0000000100001121 _ZN5mongo10abruptQuitEi + 225
       2   libsystem_platform.dylib            0x00007fff936dc5aa _sigtramp + 26
       3   ???                                 0x0000000000000005 0x0 + 5
       4   libsystem_c.dylib                   0x00007fff8e978bba abort + 125
       5   mongod                              0x0000000100425875 _ZN5mongo13fassertFailedEi + 213
       6   mongod                              0x000000010033c97f _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 271
       7   mongod                              0x000000010042cc48 _ZN5mongo10threadpool6Worker4loopEv + 138
       8   mongod                              0x000000010047f1a5 thread_proxy + 229
       9   libsystem_pthread.dylib             0x00007fff8e8bb899 _pthread_body + 138
       10  libsystem_pthread.dylib             0x00007fff8e8bb72a _pthread_struct_init + 0
       11  libsystem_pthread.dylib             0x00007fff8e8bffc9 thread_start + 13
      

      On a subsequent restart, while the node is still recovering toward the minValid optime, it applies the same insert without complaint:

      Fri Jan 17 19:33:51.832 [rsStart] replSet STARTUP2
      Fri Jan 17 19:33:51.832 BackgroundJob starting: rsMgr
      Fri Jan 17 19:33:51.833 [rsBackgroundSync] replset bgsync fetch queue set to: 52d9ef1f:1 -540961654815979307
      Fri Jan 17 19:33:51.833 BackgroundJob starting: rsGhostSync
      Fri Jan 17 19:33:51.833 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:51.833 BackgroundJob starting: ConnectBG
      Fri Jan 17 19:33:51.834 [rsHealthPoll] replset info asyasmacbook.local:40001 thinks that we are down
      Fri Jan 17 19:33:51.834 [rsHealthPoll] replSet member asyasmacbook.local:40001 is up
      Fri Jan 17 19:33:51.834 [rsHealthPoll] replSet member asyasmacbook.local:40001 is now in state PRIMARY
      Fri Jan 17 19:33:52.833 [rsSync] replSet still syncing, not yet to minValid optime 52d9f5b0:1
      Fri Jan 17 19:33:52.833 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:53.429 [initandlisten] connection accepted from 127.0.0.1:58786 #1 (1 connection now open)
      Fri Jan 17 19:33:53.429 [initandlisten] connection accepted from 127.0.0.1:58787 #2 (2 connections now open)
      Fri Jan 17 19:33:53.834 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:54.835 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:55.835 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:56.836 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:57.837 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:58.837 [rsBackgroundSync] replSet syncing to: asyasmacbook.local:40001
      Fri Jan 17 19:33:58.838 [rsSyncNotifier] replset markOplog: 0:0 52d9ef1f:1
      Fri Jan 17 19:33:58.838 [rsSync] replSet still syncing, not yet to minValid optime 52d9f5b0:1
      Fri Jan 17 19:33:58.838 BackgroundJob starting: ConnectBG
      Fri Jan 17 19:33:58.841 [rsBackgroundSync] replSet remoteOldestOp: Jan  8 12:57:48 52cdbbcc:19
      Fri Jan 17 19:33:58.841 [rsBackgroundSync] repl: local.oplog.rs.find({ ts: { $gte: Timestamp 1390014239000|1 } })
      Fri Jan 17 19:33:58.841 [repl prefetch worker] opening db:  cs
      Fri Jan 17 19:33:58.841 [repl prefetch worker] mmf open /data/rs2/cs.ns
      Fri Jan 17 19:33:58.842 [repl prefetch worker] mmf finishOpening 0x1336e7000 /data/rs2/cs.ns len:16777216
      Fri Jan 17 19:33:58.842 [repl prefetch worker] mmf open /data/rs2/cs.0
      Fri Jan 17 19:33:58.842 [repl prefetch worker] mmf finishOpening 0x1356e7000 /data/rs2/cs.0 len:67108864
      Fri Jan 17 19:33:58.842 [repl prefetch worker] mmf open /data/rs2/cs.1
      Fri Jan 17 19:33:58.842 [repl prefetch worker] mmf finishOpening 0x13d6e7000 /data/rs2/cs.1 len:134217728
      Fri Jan 17 19:33:58.842 [repl prefetch worker] mmf close /data/rs2/cs.1
      Fri Jan 17 19:33:58.842 [repl prefetch worker] _groupCommit
      Fri Jan 17 19:33:58.844 [repl prefetch worker] groupCommit end
      Fri Jan 17 19:33:58.845 [repl prefetch worker] index prefetch for op i
      Fri Jan 17 19:33:58.845 [rsSync] replication batch size is 1
      Fri Jan 17 19:33:58.845 [repl writer worker 1] applying op: { ts: Timestamp 1390015920000|1, h: 3574278741816169728, v: 2, op: "i", ns: "cs.csku", o: { _id: { r: ObjectId('5121d0ad23957311532da2f1'), m: "Double" }, p: ObjectId('5152834b77232a55b1c298b6'), v: ObjectId('5152836577232a55b1c298b9') } }
      Fri Jan 17 19:33:59.839 [rsSyncNotifier] replset markOplog: 0:0 52d9f5b0:1
      Fri Jan 17 19:33:59.839 [rsSyncNotifier] replset setting oplog notifier to asyasmacbook.local:40001
      Fri Jan 17 19:33:59.840 BackgroundJob starting: ConnectBG
      Fri Jan 17 19:33:59.840 [rsSyncNotifier] replHandshake res not: 1 res: { ok: 1.0 }
      Fri Jan 17 19:33:59.841 [rsSyncNotifier] repl: local.oplog.rs.find({ ts: { $gte: Timestamp 1390015920000|1 } })
      Fri Jan 17 19:33:59.846 [rsSync] replSet SECONDARY
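One way to read the two logs above is as two different apply paths: in steady-state replication, a duplicate-key error from an applied op is fatal (the fassert in `multiSyncApply` visible in the stack trace), while during recovery toward minValid the constraint is effectively not enforced and the insert lands anyway. The sketch below is a simplified model of that observed behavior, not mongod's actual logic; the `recovering` flag and all field values are assumptions made for illustration.

```python
# Simplified model: the same op is fatal in steady state but is
# applied silently during recovery, leaving a duplicate behind.
def apply_insert(doc, docs, unique_keys, recovering):
    key = (doc["v"], doc["p"])
    if key in unique_keys and not recovering:
        # Writer worker catches the E11000, then fassert()s: process aborts.
        raise SystemExit("fassert 16360: E11000 duplicate key")
    # Recovery path (or no conflict): the document is inserted.
    docs.append(doc)
    unique_keys.add(key)

docs = [{"_id": "old", "v": "V1", "p": "P1"}]
unique_keys = {("V1", "P1")}
op_doc = {"_id": "new", "v": "V1", "p": "P1"}

# Steady state would abort the process:
#   apply_insert(op_doc, docs, unique_keys, recovering=False)  # SystemExit

# After restart, still below minValid, the op applies cleanly:
apply_insert(op_doc, docs, unique_keys, recovering=True)
assert len(docs) == 2   # two documents now share the "unique" key
```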
      

      The secondary is now left with two documents sharing the same key in the v_1_p_1 unique index, in violation of the unique constraint.
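After the fact, such violations can be found by grouping documents on the unique index's key fields and looking for keys held by more than one document. This is a standalone sketch (in practice one would run an equivalent aggregation against cs.csku); the document list and values are invented for illustration, with only the field names v and p taken from the index in the logs.

```python
from collections import Counter

def find_duplicate_keys(docs, fields):
    """Group documents by the unique-index key tuple and return the
    keys held by more than one document, with their counts."""
    counts = Counter(tuple(d[f] for f in fields) for d in docs)
    return {k: n for k, n in counts.items() if n > 1}

# Two documents sharing the key of the v_1_p_1 unique index:
docs = [
    {"_id": "old", "v": "V1", "p": "P1"},
    {"_id": "new", "v": "V1", "p": "P1"},
    {"_id": "other", "v": "V2", "p": "P1"},
]
print(find_duplicate_keys(docs, ["v", "p"]))  # {('V1', 'P1'): 2}
```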

            Assignee: Unassigned
            Reporter: Asya Kamsky
            Votes: 0
            Watchers: 5