Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19751

WiredTiger panic halt in eviction-server

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.0.6, 3.1.7
    • Affects Version/s: 3.0.5
    • Component/s: WiredTiger
    • Labels:
    • Environment:
      AWS Linux amzn-ami-hvm-2015.03.0.x86_64-gp2 (ami-1ecae776) on a i2.xlarge instance
    • Fully Compatible
    • Linux

      Primary server on primary shard running mongodb 3.0.5 encountered panic halt. Log report of the crash:

      2015-08-04T12:21:33.770+0000 E STORAGE  WiredTiger (11) [1438690893:759387][2653:0x7f6a60957700], eviction-server: pthread_create: Resource temporarily unavailable
      2015-08-04T12:21:33.771+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to xxx.xxx.xxx.xxx:27009; UnknownError Sending command { replSetHeartbeat: "xxxxx", pv: 1, v: 88440, from: "s
      1r2.db.meshfire.com:27017", fromId: 35, checkEmpty: false } on database admin over network to xxx.xxx.xxx.xxx:27009 received exception boost::thread_resource_error
      2015-08-04T12:21:34.171+0000 I NETWORK  [conn4522] end connection 10.169.110.83:58092 (178 connections now open)
      2015-08-04T12:21:34.171+0000 I NETWORK  [conn3298] end connection 10.146.247.2:59139 (178 connections now open)
      2015-08-04T12:21:34.171+0000 I NETWORK  [initandlisten] connection accepted from 10.146.247.2:33242 #4528 (181 connections now open)
      2015-08-04T12:21:34.371+0000 E STORAGE  WiredTiger (11) [1438690893:983089][2653:0x7f6a60957700], eviction-server: cache eviction server error: Resource temporarily unavailable
      2015-08-04T12:21:34.475+0000 I REPL     [ReplicationExecutor] replset: another primary seen with election time 55c0ae3f:1 my election time is 55c07850:1
      2015-08-04T12:21:34.483+0000 I COMMAND  [conn4525] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:514 locks:{} 1221ms
      2015-08-04T12:21:34.483+0000 I COMMAND  [conn4524] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:514 locks:{} 1221ms
      2015-08-04T12:21:34.791+0000 I -        [conn3313] Fatal Assertion 28559
      2015-08-04T12:21:34.795+0000 E STORAGE  WiredTiger (-31804) [1438690894:584032][2653:0x7f6a60957700], eviction-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2015-08-04T12:21:34.794+0000 I NETWORK  [initandlisten] connection accepted from 10.155.4.24:55882 #4529 (180 connections now open)
      2015-08-04T12:21:34.993+0000 I -        [WiredTigerRecordStoreThread for local.oplog.rs] Fatal Assertion 28559
      2015-08-04T12:21:35.078+0000 E STORAGE  WiredTiger (-31804) [1438690894:891356][2653:0x7f6a5e152700], session.checkpoint: metadata unroll update file:collection-426-1927279256388819562.wt to allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.5871=(addr="019681e48761b57f9781e4a3ff13a59881e41e365d48808080e3317fc0e401820fc0",order=5871,time=1438690732,size=25309184,write_gen=2990956)),checkpoint_lsn=(6517,78431488),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=429,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1): WT_PANIC: WiredTiger library panic
      

      The backtrace:

      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B77CA9"},{"b":"400000","o":"B16B61"},{"b":"400000","o":"AFB6E1"},{"b":"400000","o":"9A193A"},{"b":"400000","o":"FAA2F9"},{"b":"400000","o":"FAA4B5"},{"b":"400000","o":"F782F3"},{"b":"400000","o":"FB3811"},{"b":"400000","o":"FA6A86"},{"b":"400000","o":"F3F33A"},{"b":"7F6A6A082000","o":"7DF3"},{"b":"7F6A68A47000","o":"F61BD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux", "release" : "3.14.48-33.39.amzn1.x86_64", "version" : "#1 SMP Tue Jul 14 23:43:07 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E8A046532FE2C5A8474250BA8E5B72C6C84B9D68" }, { "b" : "7FFE218AA000", "elfType" : 3, "buildId" : "DA2E8BBD29FCD73624DD2C4211BC3845464603D1" }, { "b" : "7F6A6A082000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F6A69E15000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E5953949728676D3796" }, { "b" : "7F6A69A30000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F1C146B78505646930DD9003AA2B3622C5226D1B" }, { "b" : "7F6A69828000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F6A69624000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F6A69320000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F6A6901E000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F6A68E08000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F6A68A47000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "DF6DA145A649EA093507A635AF383F608E7CE3F2" }, { "b" : "7F6A6A29E000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F6A687FB000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "FF843C37C38E5BFFD57F7BCCAE05FDADC6390C8F" }, { "b" : "7F6A68518000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "0BB150CC29DB5B0E039879EFC00152A75E3B00B9" }, { "b" : "7F6A68315000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F6A680E3000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "1485992B0E5CDBA0A34817FC8C6A4C45E82CD1A9" }, { "b" : "7F6A67ECD000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F6A67CBE000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "A75A81EC50E9E0164A12B59D9987AD61AC7576C8" }, { "b" : "7F6A67ABB000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F6A678A1000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F6A67680000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77ca9]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf16b61]
       mongod(_ZN5mongo13fassertFailedEi+0x61) [0xefb6e1]
       mongod(+0x9A193A) [0xda193a]
       mongod(__wt_eventv+0x489) [0x13aa2f9]
       mongod(__wt_err+0x95) [0x13aa4b5]
       mongod(__wt_meta_track_off+0x1D3) [0x13782f3]
       mongod(__wt_txn_checkpoint+0xAE1) [0x13b3811]
       mongod(+0xFA6A86) [0x13a6a86]
       mongod(+0xF3F33A) [0x133f33a]
       libpthread.so.0(+0x7DF3) [0x7f6a6a089df3]
       libc.so.6(clone+0x6D) [0x7f6a68b3d1bd]
      -----  END BACKTRACE  -----
      

      We have had repeated occurrences of the primary server cache going to 100% and remaining there until the instance is stepped down. We also experience frequent OOM crashes as the memory footprint of mongodb grows beyond available memory. This condition occurs despite setting the wired tiger cache size to 6GB less than available memory.

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            mike@meshfire.com Michael Templeman
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: