[SERVER-32170] tcmalloc segfault at time of first swap usage Created: 05/Dec/17  Updated: 06/Dec/22  Resolved: 15/Dec/17

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 3.2.16
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Akira Kurogane Assignee: Backlog - Storage Execution Team
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File smppsymmdb03.ftdc.20171130_0200_to_0344.png    
Issue Links:
Related
is related to SERVER-31284 Server crash - Need help Closed
Assigned Teams:
Storage Execution
Operating System: ALL

 Description   

Whilst a shard mongod node was running, it suddenly crashed with the following "Invalid access at address: 0" message in the log:

2017-11-30T03:44:07.078+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:57502 #2568005 (319 connections now open)
2017-11-30T03:44:07.078+0000 I NETWORK  [conn2568005] AssertionException handling request, closing client connection: 17132 SSL handshake received but server is started without SSL support
2017-11-30T03:44:07.117+0000 F -        [thread1] Invalid access at address: 0
2017-11-30T03:44:07.175+0000 F -        [thread1] Got signal: 11 (Segmentation fault).
 
 0x1352122 0x1351279 0x13515f8 0x7f087a3c8100 0x143cf93 0x143d06d 0x1bb3ee0 0x19ffac9 0x1a670ec 0x1a61aea 0x1a62047 0x1a638ca 0x1acbf96 0x7f087a3c0dc5 0x7f087a0ee21d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F52122","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F51279"},{"b":"400000","o":"F515F8"},{"b":"7F087A3B9000","o":"F100"},{"b":"400000","o":"103CF93","s":"_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi"},{"b":"400000","o":"103D06D","s":"_ZN8tcmalloc11ThreadCache11ListTooLongEPNS0_8FreeListEm"},{"b":"400000","o":"17B3EE0","s":"free"},{"b":"400000","o":"15FFAC9"},{"b":"400000","o":"16670EC","s":"__wt_evict"},{"b":"400000","o":"1661AEA"},{"b":"400000","o":"1662047"},{"b":"400000","o":"16638CA","s":"__wt_evict_thread_run"},{"b":"400000","o":"16CBF96","s":"__wt_thread_run"},{"b":"7F087A3B9000","o":"7DC5"},{"b":"7F0879FF8000","o":"F621D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.16", "gitVersion" : "056bf45128114e44c5358c7a8776fb582363e094", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-327.el7.x86_64", "version" : "#1 SMP Thu Nov 19 22:10:57 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "46D1E684CF5C5862766541145ADE571355BC0F10" }, { "b" : "7FFC90AEE000", "elfType" : 3, "buildId" : "0F488C85309E4AB293D3B758748A3037B01F9885" }, { "b" : "7F087B2E1000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "478D01A08B923A251D755BB421F3EBAF9F2982C1" }, { "b" : "7F087AEF9000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "42AAFD25E9B5F4CE2EFE6309491445B1A92A575D" }, { "b" : "7F087ACF1000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "4D22B5631EAF7D1EB6E72A42315105D14C70DC50" }, { "b" : "7F087AAED000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "091060A163E7EDA25572F3B1BAF2E8F80209C00E" }, { "b" : "7F087A7EB000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "8DEE792EF5B31B20A3CD95985E8BE374D5C6A8C2" }, { "b" : "7F087A5D5000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "6AA1DCC4DE7F1836344949857FC2017278631FFD" }, { "b" : "7F087A3B9000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "4C0F15C537CF112511B6BE97DE1627C350652AA2" }, { "b" : "7F0879FF8000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "B04A54C443D36058702AB4060C63F4AB3273EAE9" }, { "b" : "7F087B54E000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "93D2E4A501823D041413EEB652B89044D1F680EE" }, { "b" : "7F0879DAC000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "D46A230FFF4A7B808B3CFC213D31FCAC542FB504" }, { "b" : "7F0879AC7000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6D6136A0E795420B05854DEF13A10C226FE9CCB2" }, { "b" : "7F08798C3000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "3A1166709F88740C49E060731832E3FAD2DFB66B" }, { "b" : "7F0879691000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "AA97A848DD7C9E57B06EC913E10D420AEBBCE027" }, { "b" : "7F087947B000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "1982C8CDAE90F898D1AD26DC07E807333B4789D0" }, { "b" : "7F087926C000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "AEF6C3D3C5152F339942041519A106FC055DAF71" }, { "b" : "7F0879068000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F0878E4E000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "A9DA5486FACCE0E1DED291E6B4638CCC25F77BA0" }, { "b" : "7F0878C29000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "82FF6B18E1E42825CC2D060F969479AD4AF2F62C" }, { "b" : 
"7F08789C8000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "30FA397B01197ECABC647CBD8E75FDD5B743D730" }, { "b" : "7F08787A3000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "98131C9354279ABD39FD80D4BE5B3EC5678BD9E0" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1352122]
 mongod(+0xF51279) [0x1351279]
 mongod(+0xF515F8) [0x13515f8]
 libpthread.so.0(+0xF100) [0x7f087a3c8100]
 mongod(_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi+0xE3) [0x143cf93]
 mongod(_ZN8tcmalloc11ThreadCache11ListTooLongEPNS0_8FreeListEm+0x1D) [0x143d06d]
 mongod(free+0x200) [0x1bb3ee0]
 mongod(+0x15FFAC9) [0x19ffac9]
 mongod(__wt_evict+0x94C) [0x1a670ec]
 mongod(+0x1661AEA) [0x1a61aea]
 mongod(+0x1662047) [0x1a62047]
 mongod(__wt_evict_thread_run+0x6A) [0x1a638ca]
 mongod(__wt_thread_run+0x16) [0x1acbf96]
 libpthread.so.0(+0x7DC5) [0x7f087a3c0dc5]
 libc.so.6(clone+0x6D) [0x7f087a0ee21d]
-----  END BACKTRACE  -----
2017-11-30T03:44:07.178+0000 F -        [thread1] /proc/self/maps:
00400000-0204b000 r-xp 00000000 fd:00 1208106005                         /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.16/bin/mongod
2017-11-30T03:44:07.178+0000 F -        [thread1] 0204b000-0211f000 rw-p 01c4b000 fd:00 1208106005                         /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.16/bin/mongod
2017-11-30T03:44:07.178+0000 F -        [thread1] 0211f000-0218f000 rw-p 00000000 00:00 0 
2017-11-30T03:44:07.178+0000 F -        [thread1] 02b4d000-0374e000 rw-p 00000000 00:00 0                                  [heap]
2017-11-30T03:44:07.178+0000 F -        [thread1] 0374e000-4453ede000 rw-p 00000000 00:00 0                                [heap]
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853700000-7f0853701000 ---p 00000000 00:00 0 
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853701000-7f0853801000 rw-p 00000000 00:00 0 
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853801000-7f0853802000 ---p 00000000 00:00 0 
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853802000-7f0853902000 rw-p 00000000 00:00 0                          [stack:55606]
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853902000-7f0853903000 ---p 00000000 00:00 0 
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853903000-7f0853a03000 rw-p 00000000 00:00 0 
2017-11-30T03:44:07.178+0000 F -        [thread1] 7f0853a03000-7f0853a04000 ---p 00000000 00:00 0
...
...

(The "17132 SSL handshake received but server is started without SSL support" messages occur all the time in this deployment, they don't play a part in the segfault.)

At the time of the crash the total memory usage of the mongod process was ~60% of the total RAM (~300 GB vs. ~500 GB). The only other user process of note running was a monitoring agent.

From the FTDC metrics (graph screenshot attached) we see that the crash was preceded by the resident set size decreasing rapidly while the disk device for swap (/dev/sdb) started being written to.

The issue is new, but not a one-off - it happened on two different shards a couple of days apart.



 Comments   
Comment by Akira Kurogane [ 17/Dec/17 ]

Yes, let's close for now; hopefully a core dump will be shared from somewhere in the future.

I want to add that I now rule out the possibility that the zero pointer was returned from a malloc(). Even though the segfault was happening underneath tcmalloc::ThreadCache::Deallocate(), I had considered it possible that malloc() was used somewhere and returned a zero pointer (with a non-zero errno value) due to the memory pressure. There was also a separate user's case whose trace once showed tcmalloc::ThreadCache::Allocate() instead of Deallocate().

But underneath both of those functions the segfault is recorded as occurring in either tcmalloc::SLL_PopRange() or tcmalloc::SLL_Pop(). Those functions are short and simple - and they have no malloc() calls in them.

So it seems it is a double-free bug, presumably reached in a race condition somewhere in or beneath __evict_page_dirty_update().
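
For context, the free-list helpers named in the re-symbolized trace live in src/third_party/gperftools-2.2/src/linked_list.h. A simplified paraphrase (not the verbatim source) shows why a corrupt chain faults there:

    // tcmalloc threads its per-size-class free lists through the freed
    // blocks themselves: the first word of each free block stores the
    // pointer to the next free block.
    inline void* SLL_Next(void* t) {
      return *(reinterpret_cast<void**>(t));
    }

    inline void SLL_SetNext(void* t, void* n) {
      *(reinterpret_cast<void**>(t)) = n;
    }

    // Pop one element off the head of the list.
    inline void* SLL_Pop(void** list) {
      void* result = *list;
      *list = SLL_Next(result);  // reads the first word of the popped block
      return result;
    }

    // Pop a batch of N elements (the ReleaseToCentralCache path).
    inline void SLL_PopRange(void** head, int N, void** start, void** end) {
      if (N == 0) {
        *start = NULL;
        *end = NULL;
        return;
      }
      void* tmp = *head;
      for (int i = 1; i < N; ++i) {
        tmp = SLL_Next(tmp);  // walks the chain through the freed blocks
      }
      *start = *head;
      *end = tmp;
      *head = SLL_Next(tmp);
      SLL_SetNext(tmp, NULL);
    }

Because each "next" pointer is stored inside a freed block, freeing a block twice (or writing to it after free) can leave a null or garbage word where the walk expects a pointer, which matches the "Invalid access at address: 0" fault inside these functions.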

Comment by Ian Whalen (Inactive) [ 15/Dec/17 ]

Hey Akira, thanks a lot for continuing to dig on this. It sounds like without a repro there's not much the Storage team is able to do. We're closing this for now, but please do reopen if you're able to repro and/or dig up more info.

Comment by Akira Kurogane [ 13/Dec/17 ]

I tried the repro again, without luck.

This time I used less-compressible data, so the storage size of the collections was large enough to ensure the kernel would use all remaining RAM for filecache.
I confirmed the NUMA and THP settings were wrong, because that sometimes 'helpfully' creates stalls.
In normal running, all 480 GB of RAM was used by process resident memory or filecache, apart from 10-12 GB that remained free.

Then I triggered a core dump of a dummy user process, which produced a 60 GB core file.

But the kernel handled it smoothly - no swap was used at all. Presumably 50+ GB of filecache was released very quickly.
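
For reference, a dummy process along these lines (a hypothetical sketch, not the actual program used) produces a core file of roughly that size, assuming core dumps are enabled with ulimit -c unlimited:

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
      const size_t kTotal = 60UL << 30;  // target ~60 GB of resident memory
      const size_t kChunk = 1UL << 30;   // allocate and touch 1 GB at a time
      for (size_t done = 0; done < kTotal; done += kChunk) {
        char* p = static_cast<char*>(malloc(kChunk));
        if (p == nullptr) {
          fprintf(stderr, "allocation failed after %zu bytes\n", done);
          break;
        }
        memset(p, 0xAB, kChunk);  // touch every page so it counts toward RSS
      }
      abort();  // SIGABRT: the kernel dumps a core roughly the size of the heap
    }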

Comment by Akira Kurogane [ 12/Dec/17 ]

I've been doing some nice big heavy reproductions but haven't been able to put the kernel's memory management under much congestion. It keeps on releasing memory at a steady rate and the free amount never gets near zero.

Today a user reported that they disabled core dumps on another user process (which was observed to be aborting several times a day) and they've had no more crashes of mongod since. This is interesting because a core dump of a process using more memory than (or at least roughly the size of) the OS's free amount would explain why mongod would start being (partially) pushed out of RAM in a hurry.

Comment by Ian Whalen (Inactive) [ 10/Dec/17 ]

Arg, sorry, that was stupid - I mis-read Akira's comment as 'Andrew Morrow added a comment'. Leaving this on Storage, presumably for alexander.gorrod to triage since it's inside of WT.

Comment by Ian Whalen (Inactive) [ 08/Dec/17 ]

Assigning to Platforms since Drew is stewing on this.

Comment by Akira Kurogane [ 05/Dec/17 ]

Re-symbolized stack trace (leading directory paths trimmed for readability).

.../src/mongo/util/stacktrace_posix.cpp:171:0: mongo::printStackTrace(std::ostream&)
.../src/mongo/util/signal_handlers_synchronous.cpp:180:0: mongo::(anonymous namespace)::printSignalAndBacktrace(int)
.../src/mongo/util/signal_handlers_synchronous.cpp:276:0: mongo::(anonymous namespace)::abruptQuitWithAddrSignal(int, siginfo_t*, void*)
??:0:0: where_is_shmfs
.../src/third_party/gperftools-2.2/src/linked_list.h:75:0: SLL_PopRange
.../src/third_party/gperftools-2.2/src/thread_cache.h:229:0: tcmalloc::ThreadCache::FreeList::PopRange(int, void**, void**)
.../src/third_party/gperftools-2.2/src/thread_cache.cc:235:0: tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
.../src/third_party/gperftools-2.2/src/thread_cache.cc:197:0: tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long)
.../src/third_party/gperftools-2.2/src/thread_cache.h:390:0: tcmalloc::ThreadCache::Deallocate(void*, unsigned long)
.../src/third_party/gperftools-2.2/src/tcmalloc.cc:1227:0: do_free_helper
.../src/third_party/gperftools-2.2/src/tcmalloc.cc:1257:0: do_free_with_callback
.../src/third_party/gperftools-2.2/src/tcmalloc.cc:1266:0: do_free
.../src/third_party/gperftools-2.2/src/tcmalloc.cc:1617:0: free
.../src/third_party/wiredtiger/src/btree/bt_discard.c:448:0: __free_update
.../src/third_party/wiredtiger/src/btree/bt_discard.c:245:0: __free_page_modify
.../src/third_party/wiredtiger/src/btree/bt_discard.c:136:0: __page_out_int
.../src/third_party/wiredtiger/src/evict/evict_page.c:296:0: __evict_page_dirty_update
.../src/third_party/wiredtiger/src/evict/evict_page.c:167:0: __wt_evict
.../src/third_party/wiredtiger/src/evict/evict_lru.c:2135:0: __evict_page
.../src/third_party/wiredtiger/src/evict/evict_lru.c:1140:0: __evict_lru_pages
.../src/third_party/wiredtiger/src/evict/evict_lru.c:335:0: __wt_evict_thread_run
.../src/third_party/wiredtiger/src/support/thread_group.c:25:0: __wt_thread_run
??:0:0: __pthread_create_2_1
??:0:0: __libc_ifunc_impl_list

Comment by Akira Kurogane [ 05/Dec/17 ]

Graph from FTDC metrics. The last second of the captured metrics shows the first large decrease in resident set size, and that the swap disk device is being used for the first time.
