[SERVER-18460] Segfault during eviction under load Created: 13/May/15  Updated: 09/Jul/15  Resolved: 18/May/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.1.2
Fix Version/s: 3.0.4, 3.1.3

Type: Bug
Priority: Major - P3
Reporter: Michael Grundy
Assignee: Michael Cahill (Inactive)
Resolution: Done
Votes: 0
Labels: 32qa, FT, next-drop
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File crashwt.js    
Issue Links:
Depends
is depended on by WT-1933 MongoDB changes for WiredTiger 2.6.1 Closed
Related
related to SERVER-19322 Segmentation fault during replication... Closed
related to SERVER-18474 provide core dump on fassert and inva... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

mongod --storageEngine=wiredTiger

Run mongo-perf in a loop until it crashes, using the attached test case:

while [ 1 ]; do python benchrun.py  --writeCmd true -j true -f testcases/crashwt.js ; if [[ $? -gt 0 ]]; then break; fi; done

Get mongo-perf from https://github.com/mongodb/mongo-perf and put crashwt.js in its testcases subdirectory.

Participants:

 Description   

The server segfaults during the eviction process. This stack trace is from:
db version v3.1.2
git version: aa0066050f0a9db81aa47181d0fbd18c109ae991

2015-05-13T13:16:18.010-0400 F -        Got signal: 11 (Segmentation fault).
 
 0xf4dd56 0xf4d142 0xf4d47e 0x7f9c9bf53340 0x13380f1 0x1339433 0x133b283 0x130f1e6 0x130ce4d 0x130ed9e 0x7f9c9bf4b182 0x7f9c9bc7847d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B4DD56"},{"b":"400000","o":"B4D142"},{"b":"400000","o":"B4D47E"},{"b":"7F9C9BF43000","o":"10340"},{"b":"400000","o":"F380F1"},{"b":"400000","o":"F39433"},{"b":"400000","o":"F3B283"},{"b":"400000","o":"F0F1E6"},{"b":"400000","o":"F0CE4D"},{"b":"400000","o":"F0ED9E"},{"b":"7F9C9BF43000","o":"8182"},{"b":"7F9C9BB7E000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.1.2", "gitVersion" : "aa0066050f0a9db81aa47181d0fbd18c109ae991", "uname" : { "sysname" : "Linux", "release" : "4.0.0", "version" : "#5 SMP Fri Apr 17 10:38:38 EDT 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFE717FB000", "elfType" : 3 }, { "b" : "7F9C9CB85000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F9C9C981000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F9C9C67D000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F9C9C377000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F9C9C161000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F9C9BF43000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F9C9BB7E000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F9C9CD8D000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod-3.1.2(_ZN5mongo15printStackTraceERSo+0x26) [0xf4dd56]
 mongod-3.1.2(+0xB4D142) [0xf4d142]
 mongod-3.1.2(+0xB4D47E) [0xf4d47e]
 libpthread.so.0(+0x10340) [0x7f9c9bf53340]
 mongod-3.1.2(+0xF380F1) [0x13380f1]
 mongod-3.1.2(+0xF39433) [0x1339433]
 mongod-3.1.2(__wt_reconcile+0x1A3) [0x133b283]
 mongod-3.1.2(__wt_evict+0x116) [0x130f1e6]
 mongod-3.1.2(__wt_evict_page+0x2D) [0x130ce4d]
 mongod-3.1.2(+0xF0ED9E) [0x130ed9e]
 libpthread.so.0(+0x8182) [0x7f9c9bf4b182]
 libc.so.6(clone+0x6D) [0x7f9c9bc7847d]
-----  END BACKTRACE  -----
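
The raw addresses on the first line of the trace can be cross-checked against the JSON block. Each frame's "b" appears to be the load base of the module and "o" the offset within it, so the printed address should be their sum. A minimal sketch of that check in Python, using three frames copied from the trace above (the helper name runtime_address is made up for illustration):

```python
# Frames copied from the JSON backtrace above.
# "b" is taken to be the module load base and "o" the offset, both in hex.
frames = [
    {"b": "400000", "o": "B4DD56"},        # mongod, printStackTrace+0x26
    {"b": "7F9C9BF43000", "o": "10340"},   # libpthread.so.0 frame
    {"b": "400000", "o": "F3B283"},        # mongod, __wt_reconcile+0x1A3
]


def runtime_address(frame):
    """Hypothetical helper: base + offset should equal the address in the trace."""
    return int(frame["b"], 16) + int(frame["o"], 16)


addresses = [hex(runtime_address(f)) for f in frames]
print(addresses)
```

The three results can be compared with the bracketed addresses on the printStackTrace, libpthread.so.0(+0x10340) and __wt_reconcile lines of the trace.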



 Comments   
Comment by Githook User [ 10/Jun/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Disable eviction in a tree while it is being marked dead.

refs SERVER-18460

Conflicts:
src/conn/conn_sweep.c
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/1d080559065fdd6533b81827e7102b8d64a1d584

Comment by Michael Cahill (Inactive) [ 26/May/15 ]

Cherry picked here: https://github.com/wiredtiger/wiredtiger/commit/a1ec3b331a83b84b716ed3477d1f1fe3e4d6fdfd

Comment by Michael Cahill (Inactive) [ 18/May/15 ]

Should be resolved by https://evergreen.mongodb.com/version/mongodb_mongo_master_4f0e70b66182cbb872c4e5eefda23f1c58bdaab7

Comment by Githook User [ 15/May/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Disable eviction in a tree while it is being marked dead.

refs SERVER-18460
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/d0b856a419aefdf45863eb65a60095a0d3241bd1

Comment by Githook User [ 15/May/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Never evict during an LRU walk, including when clearing the walk point.

Found via a stack trace in the referenced issue, but not the cause of the crash.

refs SERVER-18460
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/2ccc09935149f8dcadcaf266dd6c215f524766f4

Comment by Michael Grundy [ 14/May/15 ]

On a recent master build of mongod, additional information is logged before the backtrace:

2015-05-13T22:54:08.358-0400 F -        Invalid access at address: 0xc8
2015-05-13T22:54:08.361-0400 F -        Got signal: 11 (Segmentation fault).

The code at the segfault address shows that bm is NULL: the call loads its target from 0xc8(%rcx), which matches the invalid access at address 0xc8, so %rcx (bm) must be 0. bm is also copied to %rdi because it is the first argument:

    WT_RET(bm->write_size(bm, session, &corrected_page_size));
 137e036:   48 8d 54 24 18          lea    0x18(%rsp),%rdx
 137e03b:   4c 89 fe                mov    %r15,%rsi
 137e03e:   48 89 cf                mov    %rcx,%rdi
 137e041:   ff 91 c8 00 00 00       callq  *0xc8(%rcx)
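
The link between the faulting address and a NULL bm can be checked with a little arithmetic: the call reads its target from %rcx + 0xc8 and the reported invalid access is at 0xc8, so %rcx must have been 0. The sketch below models this in Python with ctypes. FakeBM is a hypothetical stand-in, not the real WiredTiger block manager structure; its padding is chosen only so that write_size lands at offset 0xc8, as in the disassembly.

```python
import ctypes


class FakeBM(ctypes.Structure):
    # Hypothetical layout: 25 eight-byte slots (200 bytes) precede write_size,
    # which places the member at offset 0xc8, as in `callq *0xc8(%rcx)`.
    _fields_ = [
        ("earlier_members", ctypes.c_uint64 * 25),
        ("write_size", ctypes.c_void_p),
    ]


FAULT_ADDRESS = 0xC8  # from "Invalid access at address: 0xc8"
member_offset = FakeBM.write_size.offset

# The CPU reads the function pointer from (bm + member_offset); solve for bm.
bm_pointer = FAULT_ADDRESS - member_offset
print(hex(member_offset), bm_pointer)
```

A result of 0 for bm_pointer is consistent with the reading that the fault comes from fetching a member through a NULL structure pointer rather than from a bad function pointer value.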

Comment by Michael Grundy [ 13/May/15 ]

Also able to repro on db version 3.1.3-pre- (9f0ceef0b37df2525cdebb172e6b05e2db8a2b20).

The problem looks like it happens in __rec_split_init at mongo/src/third_party/wiredtiger/src/reconcile/rec_write.c:1635

Here's an analyzed backtrace from recent master:

0xf8d0a6:
mongo::printStackTrace(std::ostream&)
/home/grund/MongoDB/mongo/src/mongo/util/stacktrace_posix.cpp:105
0xf8c7c2:
printSignalAndBacktrace
/home/grund/MongoDB/mongo/src/mongo/util/signal_handlers_synchronous.cpp:129
0xf8cafe:
abruptQuitWithAddrSignal
/home/grund/MongoDB/mongo/src/mongo/util/signal_handlers_synchronous.cpp:244
0x7fee11256340:
??
??:0
0x137e041:
__rec_split_init
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/reconcile/rec_write.c:1635
0x137fa03:
__rec_row_leaf
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/reconcile/rec_write.c:4320 (discriminator 3)
0x1381949:
__wt_reconcile
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/reconcile/rec_write.c:413
0x1354c20:
__evict_review
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_page.c:356
__wt_evict
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_page.c:76
0x13524d2:
__wt_evict_page
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_lru.c:697
0x135445c:
__wt_page_release_evict
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/include/btree.i:1148
__wt_page_release
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/include/btree.i:1212
__evict_walk_file
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_lru.c:1290
__evict_walk
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_lru.c:1055
__evict_lru_walk
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_lru.c:812
__evict_pass
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_lru.c:526
__evict_server
/home/grund/MongoDB/mongo/src/third_party/wiredtiger/src/evict/evict_lru.c:169
0x7fee1124e182:
??
??:0
0x7fee10f7b47d:
??
??:0
