Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17881

Primary crashing with 'Didn't find RecordId in WiredTigerRecordStore' while tailing capped collection

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: Replication, WiredTiger
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      I haven't yet been able to reproduce this in isolation, this usually appears under write load of ~500 updates/sec, ~10 inserts/sec. Simply using the above mubsub library in isolation hasn't worked. Will add more info as I have it, do let me know if you have any suggestions for how to isolate this.

      Show
      I haven't yet been able to reproduce this in isolation, this usually appears under write load of ~500 updates/sec, ~10 inserts/sec. Simply using the above mubsub library in isolation hasn't worked. Will add more info as I have it, do let me know if you have any suggestions for how to isolate this.

      Description

      I've upgraded my UAT environment to Mongo 3.0.1 with the WiredTiger storage engine & am repeatedly seeing my primary replicaset fallback to RECOVERY mode with the below stacktrace. The "prism-pubsub.intergate" collection mentioned in the stack trace is a capped collection which we're opening a tailable cursor on using the (just forked) mubsub library here: https://github.com/evanlh/mubsub.

      2015-04-02T15:52:21.286-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\util\stacktrace_win.cpp(175)                            mongo::printStackTrace+0x43
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\util\log.cpp(135)                                       mongo::logContext+0x97
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\util\assert_util.cpp(214)                               mongo::msgasserted+0xd7
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\storage\wiredtiger\wiredtiger_record_store.cpp(330)  mongo::WiredTigerRecordStore::dataFor+0xc7
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\catalog\collection.cpp(157)                          mongo::Collection::docFor+0x52
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\exec\idhack.cpp(153)                                 mongo::IDHackStage::work+0x176
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\query\plan_executor.cpp(328)                         mongo::PlanExecutor::getNextSnapshotted+0xdd
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\query\plan_executor.cpp(286)                         mongo::PlanExecutor::getNext+0x41
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\dbhelpers.cpp(151)                                   mongo::Helpers::findOne+0x227
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\dbhelpers.cpp(116)                                   mongo::Helpers::findOne+0x37
      2015-04-02T15:52:21.287-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\rs_rollback.cpp(583)                            mongo::repl::`anonymous namespace'::syncFixUp+0x18e7
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\rs_rollback.cpp(774)                            mongo::repl::`anonymous namespace'::_syncRollback+0x395
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\rs_rollback.cpp(827)                            mongo::repl::syncRollback+0x1f5
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\bgsync.cpp(428)                                 mongo::repl::BackgroundSync::_rollbackIfNeeded+0x590
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\bgsync.cpp(262)                                 mongo::repl::BackgroundSync::produce+0x406
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\bgsync.cpp(200)                                 mongo::repl::BackgroundSync::_producerThread+0x9a
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\mongo\db\repl\bgsync.cpp(153)                                 mongo::repl::BackgroundSync::producerThread+0x86
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)       boost::`anonymous namespace'::thread_start_function+0x21
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] MSVCR120.dll                                                                        beginthreadex+0x107
      2015-04-02T15:52:21.288-0400 I CONTROL  [rsBackgroundSync] MSVCR120.dll                                                                        endthreadex+0x192
      2015-04-02T15:52:21.289-0400 I CONTROL  [rsBackgroundSync] kernel32.dll                                                                        BaseThreadInitThunk+0xd
      2015-04-02T15:52:21.289-0400 I CONTROL  [rsBackgroundSync]
      2015-04-02T15:52:21.486-0400 I REPL     [rsBackgroundSync] replSet exception in rollback ns:prism-pubsub.intergate { _id: ObjectId('551d9db163449994281a07b4') } 28556 Didn't find RecordId in WiredTigerRecordStore ndeletes:9
      2015-04-02T15:52:21.486-0400 I REPL     [rsBackgroundSync] replSet 9 delete and 20 update operations processed out of 40 total operations
      2015-04-02T15:52:21.765-0400 I REPL     [ReplicationExecutor] replSetElect voting yea for algo-rh01-sdny.itgssi.com:27017 (1)
      2015-04-02T15:52:22.379-0400 I REPL     [ReplicationExecutor] Member algo-rh01-sdny.itgssi.com:27017 is now in state PRIMARY
      2015-04-02T15:52:22.934-0400 I REPL     [rsBackgroundSync] rollback 5 d:9 u:30
      2015-04-02T15:52:22.934-0400 I REPL     [rsBackgroundSync] rollback 6
      2015-04-02T15:52:23.075-0400 W REPL     [rsBackgroundSync] issues during syncRollback, see log
      2015-04-02T15:52:23.075-0400 I REPL     [ReplicationExecutor] transition to RECOVERING
      2015-04-02T15:52:23.076-0400 I REPL     [rsBackgroundSync] rollback finished
      2015-04-02T15:52:23.091-0400 I REPL     [ReplicationExecutor] syncing from: algo-rh01-sdny.itgssi.com:27017
      

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: