Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-2445

Assertion failure in replica set + durability while killing and restarting servers

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 1.7.5
    • Component/s: Replication, Storage
    • Labels:
      None
    • Environment:
      db version v1.8.0-rc0-pre-
      git version: 9a6f2dcf91c2025f7549c875e78e4a203b26adcf
    • OS X

      I have a test case that
      1. Starts up a replica set with 2 servers and 1 arbiter, all with --dur option.
      2. Loads 10000 1K docs into a collection
      3. Forks 5 client threads, each $pushes then $pulls its own id to/from an array in all document (multi-update)
      5. A 6th thread kills a random server in the replica set every 0-60 secs then restarts it 0-30 secs later.

      After a few minutes I get the error below in one of the servers.

      You can run my testcase by downloading http://github.com/TonyGen/mongoTest-cpp and running
      > mongoTest 666 controller 127.0.0.1 mongoTest::One
      The log of each server is written to the current directory and is named mongodX.out where X is 1, 2, or 3. I usually tail -F each in a separate terminal window while it is running.

      I get an assertion error on every run on my MacBook Pro.

      Log from first replica server where error starts:
      Mon Jan 31 14:17:55 [conn6] couldn't make room for new record (len: 144) in capped ns local.oplog.rs
      Extent 0 (capExtent)
      magic: 41424344 extent->ns: local.oplog.rs
      fr: null lr: 2:3771fd0 extent->len: 192000256
      local.oplog.rs Assertion failure len * 5 > lastExtentSize db/namespace.cpp 423
      0x100037d3e 0x10003d9c6 0x100097ed8 0x10009b95d 0x10009773c 0x1002ca34a 0x100287691 0x1000914a2 0x1001f4f0f 0x1001f79cb 0x1002baef0 0x1002bdd86 0x1003c5c1d 0x1007c6254 0x7fff887cb536 0x7fff887cb3e9
      0 mongod 0x0000000100037d3e _ZN5mongo12sayDbContextEPKc + 174
      1 mongod 0x000000010003d9c6 _ZN5mongo8assertedEPKcS1_j + 342
      2 mongod 0x0000000100097ed8 _ZNK5mongo16NamespaceDetails13maybeComplainEPKci + 1736
      3 mongod 0x000000010009b95d _ZN5mongo16NamespaceDetails11cappedAllocEPKci + 2637
      4 mongod 0x000000010009773c _ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE + 492
      5 mongod 0x00000001002ca34a _ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci + 394
      6 mongod 0x0000000100287691 _ZN5mongoL8_logOpRSEPKcS1_S1_RKNS_7BSONObjEPS2_Pb + 1761
      7 mongod 0x00000001000914a2 _ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb + 66
      8 mongod 0x00000001001f4f0f _ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE + 4367
      9 mongod 0x00000001001f79cb _ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE + 315
      10 mongod 0x00000001002baef0 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 1120
      11 mongod 0x00000001002bdd86 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5734
      12 mongod 0x00000001003c5c1d _ZN5mongo10connThreadEPNS_13MessagingPortE + 653
      13 libboost_thread-mt.dylib 0x00000001007c6254 thread_proxy + 132
      14 libSystem.B.dylib 0x00007fff887cb536 _pthread_start + 331
      15 libSystem.B.dylib 0x00007fff887cb3e9 thread_start + 13
      Mon Jan 31 14:17:55 [conn6] update test.col query: {} exception 0 assertion db/namespace.cpp:423 31ms
      Mon Jan 31 14:17:55 [conn7] couldn't make room for new record (len: 124) in capped ns local.oplog.rs
      Extent 0 (capExtent)
      magic: 41424344 extent->ns: local.oplog.rs
      fr: null lr: 2:3771fd0 extent->len: 192000256
      local.oplog.rs Assertion failure len * 5 > lastExtentSize db/namespace.cpp 423
      0x100037d3e 0x10003d9c6 0x100097ed8 0x10009b95d 0x10009773c 0x1002ca34a 0x100287691 0x1000914a2 0x1001f4f0f 0x1001f79cb 0x1002baef0 0x1002bdd86 0x1003c5c1d 0x1007c6254 0x7fff887cb536 0x7fff887cb3e9
      0 mongod 0x0000000100037d3e _ZN5mongo12sayDbContextEPKc + 174
      1 mongod 0x000000010003d9c6 _ZN5mongo8assertedEPKcS1_j + 342
      2 mongod 0x0000000100097ed8 _ZNK5mongo16NamespaceDetails13maybeComplainEPKci + 1736
      3 mongod 0x000000010009b95d _ZN5mongo16NamespaceDetails11cappedAllocEPKci + 2637
      4 mongod 0x000000010009773c _ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE + 492
      5 mongod 0x00000001002ca34a _ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci + 394
      6 mongod 0x0000000100287691 _ZN5mongoL8_logOpRSEPKcS1_S1_RKNS_7BSONObjEPS2_Pb + 1761
      7 mongod 0x00000001000914a2 _ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb + 66
      8 mongod 0x00000001001f4f0f _ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE + 4367
      9 mongod 0x00000001001f79cb _ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE + 315
      10 mongod 0x00000001002baef0 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 1120
      11 mongod 0x00000001002bdd86 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5734
      12 mongod 0x00000001003c5c1d _ZN5mongo10connThreadEPNS_13MessagingPortE + 653
      13 libboost_thread-mt.dylib 0x00000001007c6254 thread_proxy + 132
      14 libSystem.B.dylib 0x00007fff887cb536 _pthread_start + 331
      15 libSystem.B.dylib 0x00007fff887cb3e9 thread_start + 13
      Mon Jan 31 14:17:55 [conn7] update test.col query: {} exception 0 assertion db/namespace.cpp:423 171ms
      Mon Jan 31 14:17:55 [conn10] couldn't make room for new record (len: 124) in capped ns local.oplog.rs
      Extent 0 (capExtent)
      magic: 41424344 extent->ns: local.oplog.rs
      fr: null lr: 2:3771fd0 extent->len: 192000256
      local.oplog.rs Assertion failure len * 5 > lastExtentSize db/namespace.cpp 423
      0x100037d3e 0x10003d9c6 0x100097ed8 0x10009b95d 0x10009773c 0x1002ca34a 0x100287691 0x1000914a2 0x1001f4f0f 0x1001f79cb 0x1002baef0 0x1002bdd86 0x1003c5c1d 0x1007c6254 0x7fff887cb536 0x7fff887cb3e9
      0 mongod 0x0000000100037d3e _ZN5mongo12sayDbContextEPKc + 174
      1 mongod 0x000000010003d9c6 _ZN5mongo8assertedEPKcS1_j + 342
      2 mongod 0x0000000100097ed8 _ZNK5mongo16NamespaceDetails13maybeComplainEPKci + 1736
      3 mongod 0x000000010009b95d _ZN5mongo16NamespaceDetails11cappedAllocEPKci + 2637
      4 mongod 0x000000010009773c _ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE + 492
      5 mongod 0x00000001002ca34a _ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci + 394
      6 mongod 0x0000000100287691 _ZN5mongoL8_logOpRSEPKcS1_S1_RKNS_7BSONObjEPS2_Pb + 1761
      7 mongod 0x00000001000914a2 _ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb + 66
      8 mongod 0x00000001001f4f0f _ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE + 4367
      9 mongod 0x00000001001f79cb _ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE + 315
      10 mongod 0x00000001002baef0 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 1120
      11 mongod 0x00000001002bdd86 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5734
      12 mongod 0x00000001003c5c1d _ZN5mongo10connThreadEPNS_13MessagingPortE + 653
      13 libboost_thread-mt.dylib 0x00000001007c6254 thread_proxy + 132
      14 libSystem.B.dylib 0x00007fff887cb536 _pthread_start + 331
      15 libSystem.B.dylib 0x00007fff887cb3e9 thread_start + 13
      Mon Jan 31 14:17:55 [conn10] update test.col query: {} exception 0 assertion db/namespace.cpp:423 660ms
      Mon Jan 31 14:17:55 [conn9] couldn't make room for new record (len: 124) in capped ns local.oplog.rs
      Extent 0 (capExtent)
      magic: 41424344 extent->ns: local.oplog.rs
      fr: null lr: 2:3771fd0 extent->len: 192000256
      local.oplog.rs Assertion failure len * 5 > lastExtentSize db/namespace.cpp 423
      0x100037d3e 0x10003d9c6 0x100097ed8 0x10009b95d 0x10009773c 0x1002ca34a 0x100287691 0x1000914a2 0x1001f4f0f 0x1001f79cb 0x1002baef0 0x1002bdd86 0x1003c5c1d 0x1007c6254 0x7fff887cb536 0x7fff887cb3e9
      0 mongod 0x0000000100037d3e _ZN5mongo12sayDbContextEPKc + 174
      1 mongod 0x000000010003d9c6 _ZN5mongo8assertedEPKcS1_j + 342
      2 mongod 0x0000000100097ed8 _ZNK5mongo16NamespaceDetails13maybeComplainEPKci + 1736
      3 mongod 0x000000010009b95d _ZN5mongo16NamespaceDetails11cappedAllocEPKci + 2637
      4 mongod 0x000000010009773c _ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE + 492
      5 mongod 0x00000001002ca34a _ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci + 394
      6 mongod 0x0000000100287691 _ZN5mongoL8_logOpRSEPKcS1_S1_RKNS_7BSONObjEPS2_Pb + 1761
      7 mongod 0x00000001000914a2 _ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb + 66
      8 mongod 0x00000001001f4f0f _ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE + 4367
      9 mongod 0x00000001001f79cb _ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE + 315
      10 mongod 0x00000001002baef0 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 1120
      11 mongod 0x00000001002bdd86 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5734
      12 mongod 0x00000001003c5c1d _ZN5mongo10connThreadEPNS_13MessagingPortE + 653
      13 libboost_thread-mt.dylib 0x00000001007c6254 thread_proxy + 132
      14 libSystem.B.dylib 0x00007fff887cb536 _pthread_start + 331
      15 libSystem.B.dylib 0x00007fff887cb3e9 thread_start + 13
      Mon Jan 31 14:17:55 [conn9] update test.col query: {} exception 0 assertion db/namespace.cpp:423 2312ms
      Mon Jan 31 14:17:55 [initandlisten] connection accepted from 127.0.0.1:63236 #19
      Mon Jan 31 14:17:55 [ReplSetHealthPollTask] replSet member 127.0.0.1:27019 RECOVERING
      Mon Jan 31 14:17:57 [conn19] query local.oplog.rs reslen:12863 nscanned:103 { ts:

      { $gte: new Date(5568430794390648212) }

      } nreturned:101 1656ms
      Mon Jan 31 14:18:01 [conn19] Assertion: 10334:Invalid BSONObj size: 50599229 (0x3D150403) first element: : ?type=122
      0x10003e25d 0x100008d1d 0x1002ce8b8 0x1001d6380 0x1002b682d 0x1002bda27 0x1003c5c1d 0x1007c6254 0x7fff887cb536 0x7fff887cb3e9
      0 mongod 0x000000010003e25d _ZN5mongo11msgassertedEiPKc + 381
      1 mongod 0x0000000100008d1d _ZNK5mongo7BSONObj14_assertInvalidEv + 1309
      2 mongod 0x00000001002ce8b8 _ZN5mongo11BasicCursor7currentEv + 104
      3 mongod 0x00000001001d6380 _ZN5mongo14processGetMoreEPKcixRNS_5CurOpEiRb + 1968
      4 mongod 0x00000001002b682d _ZN5mongo15receivedGetMoreERNS_10DbResponseERNS_7MessageERNS_5CurOpE + 589
      5 mongod 0x00000001002bda27 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 4871
      6 mongod 0x00000001003c5c1d _ZN5mongo10connThreadEPNS_13MessagingPortE + 653
      7 libboost_thread-mt.dylib 0x00000001007c6254 thread_proxy + 132
      8 libSystem.B.dylib 0x00007fff887cb536 _pthread_start + 331
      9 libSystem.B.dylib 0x00007fff887cb3e9 thread_start + 13
      Mon Jan 31 14:18:01 [conn19] getmore local.oplog.rs cid:4746124974211895091 getMore: { query: {}, orderby:

      { $natural: -1 }

      } exception 10334 Invalid BSONObj size: 50599229 (0x3D150403) first element: : ?type=122 bytes:20 nreturned:0 193ms
      Mon Jan 31 14:18:11 [conn19] end connection 127.0.0.1:63236
      Mon Jan 31 14:18:12 [initandlisten] connection accepted from 127.0.0.1:63246 #20
      Mon Jan 31 14:18:13 [conn20] query local.oplog.rs reslen:12863 nscanned:103 { ts:

      { $gte: new Date(5568430794390648212) }

      } nreturned:101 1525ms
      Mon Jan 31 14:18:23 [conn20] end connection 127.0.0.1:63246
      Mon Jan 31 14:18:24 [initandlisten] connection accepted from 127.0.0.1:63253 #21
      Mon Jan 31 14:18:26 [conn21] query local.oplog.rs reslen:12863 nscanned:103 { ts:

      { $gte: new Date(5568430794390648212) }

      } nreturned:101 1525ms
      Mon Jan 31 14:18:36 [conn21] end connection 127.0.0.1:63253
      Mon Jan 31 14:18:37 [initandlisten] connection accepted from 127.0.0.1:63260 #22
      Mon Jan 31 14:18:38 [conn22] query local.oplog.rs reslen:12863 nscanned:103 { ts:

      { $gte: new Date(5568430794390648212) }

      } nreturned:101 1529ms
      Mon Jan 31 14:18:48 [conn22] end connection 127.0.0.1:63260
      Mon Jan 31 14:18:49 [initandlisten] connection accepted from 127.0.0.1:63267 #23
      Mon Jan 31 14:18:50 [dur] lsn set 234203
      Mon Jan 31 14:18:51 [conn23] query local.oplog.rs reslen:12863 nscanned:103 { ts:

      { $gte: new Date(5568430794390648212) }

      } nreturned:101 1670ms

            Assignee:
            Unassigned Unassigned
            Reporter:
            tonyh Tony Hannan
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved: