Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-32170

tcmalloc segfault at time of first swap usage

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.2.16
    • Component/s: Stability
    • Labels:
      None
    • Storage Execution
    • ALL

      Whilst a shard mongod node was running it suddenly crashed with the following "Invalid access at address: 0" message in the log:

      Unable to find source-code formatter for language: txt. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      2017-11-30T03:44:07.078+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:57502 #2568005 (319 connections now open)
      2017-11-30T03:44:07.078+0000 I NETWORK  [conn2568005] AssertionException handling request, closing client connection: 17132 SSL handshake received but server is started without SSL support
      2017-11-30T03:44:07.117+0000 F -        [thread1] Invalid access at address: 0
      2017-11-30T03:44:07.175+0000 F -        [thread1] Got signal: 11 (Segmentation fault).
      
       0x1352122 0x1351279 0x13515f8 0x7f087a3c8100 0x143cf93 0x143d06d 0x1bb3ee0 0x19ffac9 0x1a670ec 0x1a61aea 0x1a62047 0x1a638ca 0x1acbf96 0x7f087a3c0dc5 0x7f087a0ee21d
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"F52122","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F51279"},{"b":"400000","o":"F515F8"},{"b":"7F087A3B9000","o":"F100"},{"b":"400000","o":"103CF93","s":"_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi"},{"b":"400000","o":"103D06D","s":"_ZN8tcmalloc11ThreadCache11ListTooLongEPNS0_8FreeListEm"},{"b":"400000","o":"17B3EE0","s":"free"},{"b":"400000","o":"15FFAC9"},{"b":"400000","o":"16670EC","s":"__wt_evict"},{"b":"400000","o":"1661AEA"},{"b":"400000","o":"1662047"},{"b":"400000","o":"16638CA","s":"__wt_evict_thread_run"},{"b":"400000","o":"16CBF96","s":"__wt_thread_run"},{"b":"7F087A3B9000","o":"7DC5"},{"b":"7F0879FF8000","o":"F621D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.16", "gitVersion" : "056bf45128114e44c5358c7a8776fb582363e094", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-327.el7.x86_64", "version" : "#1 SMP Thu Nov 19 22:10:57 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "46D1E684CF5C5862766541145ADE571355BC0F10" }, { "b" : "7FFC90AEE000", "elfType" : 3, "buildId" : "0F488C85309E4AB293D3B758748A3037B01F9885" }, { "b" : "7F087B2E1000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "478D01A08B923A251D755BB421F3EBAF9F2982C1" }, { "b" : "7F087AEF9000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "42AAFD25E9B5F4CE2EFE6309491445B1A92A575D" }, { "b" : "7F087ACF1000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "4D22B5631EAF7D1EB6E72A42315105D14C70DC50" }, { "b" : "7F087AAED000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "091060A163E7EDA25572F3B1BAF2E8F80209C00E" }, { "b" : "7F087A7EB000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "8DEE792EF5B31B20A3CD95985E8BE374D5C6A8C2" }, { "b" : "7F087A5D5000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "6AA1DCC4DE7F1836344949857FC2017278631FFD" }, { "b" : "7F087A3B9000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "4C0F15C537CF112511B6BE97DE1627C350652AA2" }, { "b" : "7F0879FF8000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "B04A54C443D36058702AB4060C63F4AB3273EAE9" }, { "b" : "7F087B54E000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "93D2E4A501823D041413EEB652B89044D1F680EE" }, { "b" : "7F0879DAC000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "D46A230FFF4A7B808B3CFC213D31FCAC542FB504" }, { "b" : "7F0879AC7000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6D6136A0E795420B05854DEF13A10C226FE9CCB2" }, { "b" : "7F08798C3000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "3A1166709F88740C49E060731832E3FAD2DFB66B" }, { "b" : "7F0879691000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "AA97A848DD7C9E57B06EC913E10D420AEBBCE027" }, { "b" : "7F087947B000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "1982C8CDAE90F898D1AD26DC07E807333B4789D0" }, { "b" : "7F087926C000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "AEF6C3D3C5152F339942041519A106FC055DAF71" }, { "b" : "7F0879068000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F0878E4E000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "A9DA5486FACCE0E1DED291E6B4638CCC25F77BA0" }, { "b" : "7F0878C29000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "82FF6B18E1E42825CC2D060F969479AD4AF2F62C" }, { "b" : "7F08789C8000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "30FA397B01197ECABC647CBD8E75FDD5B743D730" }, { "b" : "7F08787A3000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "98131C9354279ABD39FD80D4BE5B3EC5678BD9E0" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1352122]
       mongod(+0xF51279) [0x1351279]
       mongod(+0xF515F8) [0x13515f8]
       libpthread.so.0(+0xF100) [0x7f087a3c8100]
       mongod(_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi+0xE3) [0x143cf93]
       mongod(_ZN8tcmalloc11ThreadCache11ListTooLongEPNS0_8FreeListEm+0x1D) [0x143d06d]
       mongod(free+0x200) [0x1bb3ee0]
       mongod(+0x15FFAC9) [0x19ffac9]
       mongod(__wt_evict+0x94C) [0x1a670ec]
       mongod(+0x1661AEA) [0x1a61aea]
       mongod(+0x1662047) [0x1a62047]
       mongod(__wt_evict_thread_run+0x6A) [0x1a638ca]
       mongod(__wt_thread_run+0x16) [0x1acbf96]
       libpthread.so.0(+0x7DC5) [0x7f087a3c0dc5]
       libc.so.6(clone+0x6D) [0x7f087a0ee21d]
      -----  END BACKTRACE  -----
      2017-11-30T03:44:07.178+0000 F -        [thread1] /proc/self/maps:
      00400000-0204b000 r-xp 00000000 fd:00 1208106005                         /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.16/bin/mongod
      2017-11-30T03:44:07.178+0000 F -        [thread1] 0204b000-0211f000 rw-p 01c4b000 fd:00 1208106005                         /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.16/bin/mongod
      2017-11-30T03:44:07.178+0000 F -        [thread1] 0211f000-0218f000 rw-p 00000000 00:00 0 
      2017-11-30T03:44:07.178+0000 F -        [thread1] 02b4d000-0374e000 rw-p 00000000 00:00 0                                  [heap]
      2017-11-30T03:44:07.178+0000 F -        [thread1] 0374e000-4453ede000 rw-p 00000000 00:00 0                                [heap]
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853700000-7f0853701000 ---p 00000000 00:00 0 
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853701000-7f0853801000 rw-p 00000000 00:00 0 
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853801000-7f0853802000 ---p 00000000 00:00 0 
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853802000-7f0853902000 rw-p 00000000 00:00 0                          [stack:55606]
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853902000-7f0853903000 ---p 00000000 00:00 0 
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853903000-7f0853a03000 rw-p 00000000 00:00 0 
      2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853a03000-7f0853a04000 ---p 00000000 00:00 0
      ...
      ...
      

      (The "17132 SSL handshake received but server is started without SSL support" messages occur all the time in this deployment, they don't play a part in the segfault.)

      At the time of the crash the total memory usage of mongod process was ~60% of the total RAM (300GB v.s. ~500 GB). The only other user process of note running was a monitoring agent.

      From FTDC metrics (graph screenshot attached below) we see that the crash was preceeded by the resident set size decreasing rapidly plus the disk device for swap (/dev/sdb) started being written to.

      The issue is new, but not a one-off - it happened on two different shards a couple of days apart.

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            akira.kurogane Akira Kurogane
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: