Core Server / SERVER-16867

Server crashed during iiBench run using WiredTiger engine

    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      iiBench : https://github.com/tmcallaghan/iibench-mongodb
      4 writers, 1000 inserts per batch
      Ubuntu 14.04.1
      48G RAM in server
      WT configured to be 8G cache, directIO on [data], all else defaults
      storage is SSD (~50,000 random IOPs)

      An immediate rerun did not reproduce the crash.


      Running the iiBench insert-only workload (specific details in the steps to reproduce above). Here is the server's log from startup:

      2015-01-15T10:03:05.095-0500 I CONTROL  [initandlisten] MongoDB starting : pid=1708 port=27017 dbpath=/data.ssd/tcallaghan/data/mongo-data 64-bit host=lex5
      2015-01-15T10:03:05.095-0500 I CONTROL  [initandlisten] db version v2.8.0-rc4
      2015-01-15T10:03:05.095-0500 I CONTROL  [initandlisten] git version: 3ad571742911f04b307f0071979425511c4f2570
      2015-01-15T10:03:05.095-0500 I CONTROL  [initandlisten] build info: Linux build11.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      2015-01-15T10:03:05.095-0500 I CONTROL  [initandlisten] allocator: tcmalloc
      2015-01-15T10:03:05.095-0500 I CONTROL  [initandlisten] options: { processManagement: { fork: true }, replication: { replSet: "tmcRepl" }, storage: { dbPath: "/data.ssd/tcallaghan/data/mongo-data", engine: "wiredtiger", journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 8, configString: "direct_io=[data]" } } }, systemLog: { destination: "file", path: "lex5.ssd-mongoiibench-1-100000000-1000-999999-4-wt-0-15.log.mongolog" } }
      2015-01-15T10:03:05.095-0500 W STORAGE  [initandlisten] Detected configuration for non-active storage engine wiredTiger when current storage engine is wiredtiger
      2015-01-15T10:03:05.095-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,extensions=[local=(entry=index_collator_extension)],statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),direct_io=[data]
      2015-01-15T10:03:05.114-0500 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument Did not find replica set configuration document in local.system.replset
      2015-01-15T10:03:05.118-0500 I NETWORK  [initandlisten] waiting for connections on port 27017
      2015-01-15T10:03:05.118-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:34456 #1 (1 connection now open)
      2015-01-15T10:03:06.125-0500 I NETWORK  [conn1] end connection 127.0.0.1:34456 (0 connections now open)
      2015-01-15T10:03:06.172-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:34457 #2 (1 connection now open)
      2015-01-15T10:03:06.173-0500 I COMMAND  [conn2] replSet info initiate : no configuration specified.  Using a default configuration for the set
      2015-01-15T10:03:06.173-0500 I COMMAND  [conn2] replSet created this configuration for initiation : { _id: "tmcRepl", version: 1, members: [ { _id: 0, host: "lex5:27017" } ] }
      2015-01-15T10:03:06.173-0500 I REPL     [conn2] replSetInitiate admin command received from client
      2015-01-15T10:03:06.174-0500 I REPL     [conn2] replSet replSetInitiate config object with 1 members parses ok
      2015-01-15T10:03:06.177-0500 I REPL     [ReplicationExecutor] new replica set config in use: { _id: "tmcRepl", version: 1, members: [ { _id: 0, host: "lex5:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
      2015-01-15T10:03:06.177-0500 I REPL     [ReplicationExecutor] transition to STARTUP2
      2015-01-15T10:03:06.178-0500 I REPL     [conn2] ******
      2015-01-15T10:03:06.178-0500 I REPL     [conn2] creating replication oplog of size: 22838MB...
      2015-01-15T10:03:06.188-0500 I REPL     [conn2] ******
      2015-01-15T10:03:06.188-0500 I REPL     [conn2] Starting replication applier threads
      2015-01-15T10:03:06.188-0500 I REPL     [ReplicationExecutor] transition to RECOVERING
      2015-01-15T10:03:06.190-0500 I REPL     [ReplicationExecutor] transition to PRIMARY
      2015-01-15T10:03:06.190-0500 I REPL     [rsSync] transition to primary complete; database writes are now permitted
      

      Here is the stack trace from the crash:

      2015-01-15T10:11:32.156-0500 F REPL     [conn8] Fatal DBException in logOp(): 112 WriteConflict
      2015-01-15T10:11:32.156-0500 F -        [conn8] terminate() called. An exception is active; attempting to gather more information
      2015-01-15T10:11:32.192-0500 F -        [conn8] DBException::toString(): 112 WriteConflict
      Actual exception type: mongo::WriteConflictException
      
       0xf133b9 0xf12eb0 0x7f29dad43836 0x7f29dad43863 0xc1594c 0x98cb15 0x98d210 0x98e954 0x98f055 0x99110d 0x9ad054 0x9adf93 0x9aea4b 0xb7ca1a 0xa8fcd5 0x7e41f0 0xed1381 0x7f29db3fd182 0x7f29da4fdefd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B133B9"},{"b":"400000","o":"B12EB0"},{"b":"7F29DACE5000","o":"5E836"},{"b":"7F29DACE5000","o":"5E863"},{"b":"400000","o":"81594C"},{"b":"400000","o":"58CB15"},{"b":"400000","o":"58D210"},{"b":"400000","o":"58E954"},{"b":"400000","o":"58F055"},{"b":"400000","o":"59110D"},{"b":"400000","o":"5AD054"},{"b":"400000","o":"5ADF93"},{"b":"400000","o":"5AEA4B"},{"b":"400000","o":"77CA1A"},{"b":"400000","o":"68FCD5"},{"b":"400000","o":"3E41F0"},{"b":"400000","o":"AD1381"},{"b":"7F29DB3F5000","o":"8182"},{"b":"7F29DA403000","o":"FAEFD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc4", "gitVersion" : "3ad571742911f04b307f0071979425511c4f2570", "uname" : { "sysname" : "Linux", "release" : "3.13.0-40-generic", "version" : "#69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF894EE000", "elfType" : 3 }, { "b" : "7F29DB3F5000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F29DB1ED000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F29DAFE9000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F29DACE5000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F29DA9DF000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F29DA7C9000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F29DA403000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F29DB613000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf133b9]
       mongod(+0xB12EB0) [0xf12eb0]
       libstdc++.so.6(+0x5E836) [0x7f29dad43836]
       libstdc++.so.6(+0x5E863) [0x7f29dad43863]
       mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x1DC) [0xc1594c]
       mongod(_ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE+0xEA5) [0x98cb15]
       mongod(_ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE+0x250) [0x98d210]
       mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x34) [0x98e954]
       mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x395) [0x98f055]
       mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x99110d]
       mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9ad054]
       mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9adf93]
       mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9aea4b]
       mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x76A) [0xb7ca1a]
       mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB25) [0xa8fcd5]
       mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e41f0]
       mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x411) [0xed1381]
       libpthread.so.0(+0x8182) [0x7f29db3fd182]
       libc.so.6(clone+0x6D) [0x7f29da4fdefd]
      -----  END BACKTRACE  -----
      

            Assignee:
            schwerin@mongodb.com Andy Schwerin
            Reporter:
            tmcallaghan Tim Callaghan
            Votes:
            0
            Watchers:
            16
