Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19751

WiredTiger panic halt in eviction-server

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.0.5
    • Fix Version/s: 3.0.6, 3.1.7
    • Component/s: WiredTiger
    • Labels:
    • Environment:
      AWS Linux amzn-ami-hvm-2015.03.0.x86_64-gp2 (ami-1ecae776) on a i2.xlarge instance
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      Linux
    • Backport Completed:

      Description

      Primary server on primary shard running mongodb 3.0.5 encountered panic halt. Log report of the crash:

      2015-08-04T12:21:33.770+0000 E STORAGE  WiredTiger (11) [1438690893:759387][2653:0x7f6a60957700], eviction-server: pthread_create: Resource temporarily unavailable
      2015-08-04T12:21:33.771+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to xxx.xxx.xxx.xxx:27009; UnknownError Sending command { replSetHeartbeat: "xxxxx", pv: 1, v: 88440, from: "s
      1r2.db.meshfire.com:27017", fromId: 35, checkEmpty: false } on database admin over network to xxx.xxx.xxx.xxx:27009 received exception boost::thread_resource_error
      2015-08-04T12:21:34.171+0000 I NETWORK  [conn4522] end connection 10.169.110.83:58092 (178 connections now open)
      2015-08-04T12:21:34.171+0000 I NETWORK  [conn3298] end connection 10.146.247.2:59139 (178 connections now open)
      2015-08-04T12:21:34.171+0000 I NETWORK  [initandlisten] connection accepted from 10.146.247.2:33242 #4528 (181 connections now open)
      2015-08-04T12:21:34.371+0000 E STORAGE  WiredTiger (11) [1438690893:983089][2653:0x7f6a60957700], eviction-server: cache eviction server error: Resource temporarily unavailable
      2015-08-04T12:21:34.475+0000 I REPL     [ReplicationExecutor] replset: another primary seen with election time 55c0ae3f:1 my election time is 55c07850:1
      2015-08-04T12:21:34.483+0000 I COMMAND  [conn4525] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:514 locks:{} 1221ms
      2015-08-04T12:21:34.483+0000 I COMMAND  [conn4524] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:514 locks:{} 1221ms
      2015-08-04T12:21:34.791+0000 I -        [conn3313] Fatal Assertion 28559
      2015-08-04T12:21:34.795+0000 E STORAGE  WiredTiger (-31804) [1438690894:584032][2653:0x7f6a60957700], eviction-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2015-08-04T12:21:34.794+0000 I NETWORK  [initandlisten] connection accepted from 10.155.4.24:55882 #4529 (180 connections now open)
      2015-08-04T12:21:34.993+0000 I -        [WiredTigerRecordStoreThread for local.oplog.rs] Fatal Assertion 28559
      2015-08-04T12:21:35.078+0000 E STORAGE  WiredTiger (-31804) [1438690894:891356][2653:0x7f6a5e152700], session.checkpoint: metadata unroll update file:collection-426-1927279256388819562.wt to allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.5871=(addr="019681e48761b57f9781e4a3ff13a59881e41e365d48808080e3317fc0e401820fc0",order=5871,time=1438690732,size=25309184,write_gen=2990956)),checkpoint_lsn=(6517,78431488),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=429,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1): WT_PANIC: WiredTiger library panic
      

      The backtrace:

      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B77CA9"},{"b":"400000","o":"B16B61"},{"b":"400000","o":"AFB6E1"},{"b":"400000","o":"9A193A"},{"b":"400000","o":"FAA2F9"},{"b":"400000","o":"FAA4B5"},{"b":"400000","o":"F782F3"},{"b":"400000","o":"FB3811"},{"b":"400000","o":"FA6A86"},{"b":"400000","o":"F3F33A"},{"b":"7F6A6A082000","o":"7DF3"},{"b":"7F6A68A47000","o":"F61BD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux", "release" : "3.14.48-33.39.amzn1.x86_64", "version" : "#1 SMP Tue Jul 14 23:43:07 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E8A046532FE2C5A8474250BA8E5B72C6C84B9D68" }, { "b" : "7FFE218AA000", "elfType" : 3, "buildId" : "DA2E8BBD29FCD73624DD2C4211BC3845464603D1" }, { "b" : "7F6A6A082000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F6A69E15000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E5953949728676D3796" }, { "b" : "7F6A69A30000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F1C146B78505646930DD9003AA2B3622C5226D1B" }, { "b" : "7F6A69828000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F6A69624000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F6A69320000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F6A6901E000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F6A68E08000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F6A68A47000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "DF6DA145A649EA093507A635AF383F608E7CE3F2" }, { "b" : "7F6A6A29E000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F6A687FB000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "FF843C37C38E5BFFD57F7BCCAE05FDADC6390C8F" }, { "b" : "7F6A68518000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "0BB150CC29DB5B0E039879EFC00152A75E3B00B9" }, { "b" : "7F6A68315000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F6A680E3000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "1485992B0E5CDBA0A34817FC8C6A4C45E82CD1A9" }, { "b" : "7F6A67ECD000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F6A67CBE000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "A75A81EC50E9E0164A12B59D9987AD61AC7576C8" }, { "b" : "7F6A67ABB000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F6A678A1000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F6A67680000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77ca9]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf16b61]
       mongod(_ZN5mongo13fassertFailedEi+0x61) [0xefb6e1]
       mongod(+0x9A193A) [0xda193a]
       mongod(__wt_eventv+0x489) [0x13aa2f9]
       mongod(__wt_err+0x95) [0x13aa4b5]
       mongod(__wt_meta_track_off+0x1D3) [0x13782f3]
       mongod(__wt_txn_checkpoint+0xAE1) [0x13b3811]
       mongod(+0xFA6A86) [0x13a6a86]
       mongod(+0xF3F33A) [0x133f33a]
       libpthread.so.0(+0x7DF3) [0x7f6a6a089df3]
       libc.so.6(clone+0x6D) [0x7f6a68b3d1bd]
      -----  END BACKTRACE  -----
      

      We have had repeated occurrences of the primary server cache going to 100% and remaining there until the instance is stepped down. We also experience frequent OOM crashes as the memory footprint of mongodb grows beyond available memory. This condition occurs despite setting the wired tiger cache size to 6GB less than available memory.

        Issue Links

          Activity

          Hide
          mike@meshfire.com Michael Templeman added a comment -

          Darn. I screwed up the formatting again. For log dumps what is the proper format command?

          Show
          mike@meshfire.com Michael Templeman added a comment - Darn. I screwed up the formatting again. For log dumps what is the proper format command?
          Hide
          ramon.fernandez Ramon Fernandez added a comment -

          Hi Michael Templeman, thanks for the report – we're investigating. I'd recommend the {noformat} macro for logs, I've edited the description above to use it.

          Show
          ramon.fernandez Ramon Fernandez added a comment - Hi Michael Templeman , thanks for the report – we're investigating. I'd recommend the {noformat} macro for logs, I've edited the description above to use it.
          Hide
          ramon.fernandez Ramon Fernandez added a comment -

          Michael Templeman, the error message:

          2015-08-04T12:21:33.770+0000 E STORAGE  WiredTiger (11) [1438690893:759387][2653:0x7f6a60957700], eviction-server: pthread_create: Resource temporarily unavailable
          

          indicates that the eviction server failed to create a new thread with EAGAIN. pthread_create(3) reads:

                 EAGAIN Insufficient resources to create another thread, or a system-imposed limit on  the  number  of  threads  was
                        encountered.   The  latter  case  may occur in two ways: the RLIMIT_NPROC soft resource limit (set via setr‐
                        limit(2)), which limits the number of process for a real user ID, was reached; or the  kernel's  system-wide
                        limit on the number of threads, /proc/sys/kernel/threads-max, was reached.
          

          Can you please provide more details about the memory configuration and utilization on the affected node, as well as information about system limits for the mongod process?

          Thanks,
          Ramón.

          Show
          ramon.fernandez Ramon Fernandez added a comment - Michael Templeman , the error message: 2015-08-04T12:21:33.770+0000 E STORAGE WiredTiger (11) [1438690893:759387][2653:0x7f6a60957700], eviction-server: pthread_create: Resource temporarily unavailable indicates that the eviction server failed to create a new thread with EAGAIN . pthread_create(3) reads: EAGAIN Insufficient resources to create another thread, or a system-imposed limit on the number of threads was encountered. The latter case may occur in two ways: the RLIMIT_NPROC soft resource limit (set via setr‐ limit(2)), which limits the number of process for a real user ID, was reached; or the kernel's system-wide limit on the number of threads, /proc/sys/kernel/threads-max, was reached. Can you please provide more details about the memory configuration and utilization on the affected node, as well as information about system limits for the mongod process? Thanks, Ramón.
          Hide
          mike@meshfire.com Michael Templeman added a comment -

          Ramon

          The i2.xlarge ec2 instance has 30GB of memory. Nothing else runs on the server than mongo. Mongod was started with:

          mongod --dbpath xxx/xxxxx --storageEngine wiredTiger --wiredTigerCacheSizeGB 22 --logpath /xxxx/xxxxx/mongo.log --oplogSize 90000 --fork --replSet xxxxx
          

          I don't know what the actual memory utilization was when the process halted. Past experience has been that under certain operations the process cache goes to 100% and the process memory utilization as reported by mongostat will rise to around 29GB (at which point a crash is imminent).

          Show
          mike@meshfire.com Michael Templeman added a comment - Ramon The i2.xlarge ec2 instance has 30GB of memory. Nothing else runs on the server than mongo. Mongod was started with: mongod --dbpath xxx/xxxxx --storageEngine wiredTiger --wiredTigerCacheSizeGB 22 --logpath /xxxx/xxxxx/mongo.log --oplogSize 90000 --fork --replSet xxxxx I don't know what the actual memory utilization was when the process halted. Past experience has been that under certain operations the process cache goes to 100% and the process memory utilization as reported by mongostat will rise to around 29GB (at which point a crash is imminent).
          Hide
          mike@meshfire.com Michael Templeman added a comment -

          oops, I forgot to report that the ulimit was set to 64000, THP was turned off and tcp_keepalive_time is set to 120.

          Show
          mike@meshfire.com Michael Templeman added a comment - oops, I forgot to report that the ulimit was set to 64000, THP was turned off and tcp_keepalive_time is set to 120.
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: SERVER-19751 Retry pthread_create on EAGAIN or EINTR.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/4a322631ec244fe5cc204b560ab09cda60ada00a

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: SERVER-19751 Retry pthread_create on EAGAIN or EINTR. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/4a322631ec244fe5cc204b560ab09cda60ada00a
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'}

          Message: Merge pull request #2107 from wiredtiger/pthread-create-retry

          SERVER-19751 Retry pthread_create on EAGAIN or EINTR.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/b52d2d3a9cfdfa32ac2eacf3d3d5f281395d834f

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'} Message: Merge pull request #2107 from wiredtiger/pthread-create-retry SERVER-19751 Retry pthread_create on EAGAIN or EINTR. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/b52d2d3a9cfdfa32ac2eacf3d3d5f281395d834f
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'}

          Message: SERVER-19751 Retry pthread_create on EAGAIN or EINTR.

          Merge pull request #2107 from wiredtiger/pthread-create-retry
          (cherry picked from commit b52d2d3a9cfdfa32ac2eacf3d3d5f281395d834f)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/7cb51cd688c2375a504f99d5c57674e7fdfbcee0

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'} Message: SERVER-19751 Retry pthread_create on EAGAIN or EINTR. Merge pull request #2107 from wiredtiger/pthread-create-retry (cherry picked from commit b52d2d3a9cfdfa32ac2eacf3d3d5f281395d834f) Branch: mongodb-3.0 https://github.com/wiredtiger/wiredtiger/commit/7cb51cd688c2375a504f99d5c57674e7fdfbcee0

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: