Core Server / SERVER-9870

Rollback should not enforce unique indexes while applying ops

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 3.0.0
    • Affects Version/s: 2.2.3, 2.4.4, 2.5.0
    • Component/s: Replication
    • None
    • Environment:
      Ubuntu Precise 12.04 LTS
    • Replication
    • Fully Compatible
    • ALL

      We recently had an event where our primary server had to step down and, after the new election, was put into a rollback state. During the rollback, mongod encountered a unique index violation while attempting to write to the view_stats collection and promptly crashed.

      Mon Jun  3 18:50:32 [repl writer worker 2] ERROR: writer worker caught exception: E11000 duplicate key error index: songza.view_stats.$view_1_host_1_pid_1  dup key: { : null, : null, : null } on: { ts: Timestamp 1370285310000|4, h: 1593562783886765644, v: 2, op: "u", ns: "songza.view_stats", o2: { _id: ObjectId('51ace4772a5a3f13accdbe17') }, o: { $set: { count: 2 }, $set: { sum_ms: 29 }, $set: { sum_square_ms: 565 } } }
      Mon Jun  3 18:50:32 [repl writer worker 2]   Fatal Assertion 16360
      0xb07561 0xacc8b3 0x9abaf6 0xadab5d 0xb4d3d9 0x7f845177be9a 0x7f8450a8ecbd 
      /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xb07561]
      /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xacc8b3]
      /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x156) [0x9abaf6]
      /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0xadab5d]
      /usr/bin/mongod() [0xb4d3d9]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f845177be9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8450a8ecbd]
      Mon Jun  3 18:50:32 [repl writer worker 2] 
      ***aborting after fassert() failure
      Mon Jun  3 18:50:32 Got signal: 6 (Aborted).
      Mon Jun  3 18:50:32 Backtrace:
      0xb07561 0x5598c9 0x7f84509d14a0 0x7f84509d1425 0x7f84509d4b8b 0xacc8ee 0x9abaf6 0xadab5d 0xb4d3d9 0x7f845177be9a 0x7f8450a8ecbd 
      /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xb07561]
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5598c9]
      /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f84509d14a0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f84509d1425]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f84509d4b8b]
      /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xacc8ee]
      /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x156) [0x9abaf6]
      /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0xadab5d]
      /usr/bin/mongod() [0xb4d3d9]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f845177be9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8450a8ecbd]
      

      I brought the machine back up outside of the replica set and verified that there were 4 bad documents that lacked the host, view, and pid fields. No other active hosts had these bad documents, and it's highly unlikely that any of our code inserted them in the first place. I removed the bad documents and brought the member back into the replica set, but it crashed again shortly thereafter with the same error. We were only able to resolve the issue by restoring the database from an EBS snapshot and letting it catch up again.
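
      For readers unfamiliar with why documents missing view, host, and pid collide, here is a minimal sketch in plain Python (not MongoDB code). It assumes the index view_1_host_1_pid_1 is a non-sparse unique index, which matches the `{ : null, : null, : null }` key in the log, and relies on the documented behavior that a missing field is indexed as null. The `UniqueIndex` class, the `index_key` helper, and the sample values "home", "web1", and 100 are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Optional, Tuple

# Field order taken from the index name view_1_host_1_pid_1 in the log.
INDEX_FIELDS = ("view", "host", "pid")


def index_key(doc: Dict[str, Any]) -> Tuple[Optional[Any], ...]:
    """Build the compound key; a missing field is keyed as None (null)."""
    return tuple(doc.get(field) for field in INDEX_FIELDS)


class UniqueIndex:
    """Toy stand-in for a non-sparse unique index (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._owners: Dict[Tuple[Optional[Any], ...], Any] = {}

    def insert(self, doc: Dict[str, Any]) -> None:
        key = index_key(doc)
        if key in self._owners:
            # Mirrors the shape of the E11000 error, not its exact text.
            raise ValueError(f"E11000 duplicate key error dup key: {key}")
        self._owners[key] = doc["_id"]


index = UniqueIndex()
# A well-formed document (sample values are made up).
index.insert({"_id": 1, "view": "home", "host": "web1", "pid": 100})
# First document lacking all three fields: key is (None, None, None), accepted.
index.insert({"_id": 2, "count": 2, "sum_ms": 29, "sum_square_ms": 565})
try:
    # Second document lacking the fields maps to the same all-null key.
    index.insert({"_id": 3, "count": 1, "sum_ms": 5, "sum_square_ms": 25})
except ValueError as exc:
    print(exc)
```

      Under these assumptions, at most one document without the three fields can exist while the unique constraint is enforced, so any op that creates a second one during oplog application fails in the way shown in the log.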

      I've attached a log from the primary to this ticket.

        1. oplog.rs.bson
          21 kB
          Michael Henson
        2. mongodb-2013-06-03.db-event4.abridged.log
          2.45 MB
          Michael Henson

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            michael@songza.com Michael Henson
            Votes:
            3
            Watchers:
            8
