[SERVER-19989] Segfault in evict_lru.c Created: 17/Aug/15  Updated: 09/Sep/15  Resolved: 28/Aug/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.0.6

Type: Bug
Priority: Critical - P2
Reporter: Ramon Fernandez Marina
Assignee: Michael Cahill (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by WT-1973 MongoDB changes for WiredTiger 2.7.0 Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

2015-08-17T03:25:32.776+0000 I NETWORK  [initandlisten] connection accepted from 10.20.0.164:45684 #56812 (20 connections now open)
2015-08-17T03:25:40.707+0000 I NETWORK  [conn56799] end connection 10.200.1.61:38502 (19 connections now open)
2015-08-17T03:25:40.741+0000 I NETWORK  [initandlisten] connection accepted from 10.200.1.61:38525 #56813 (20 connections now open)
2015-08-17T03:25:45.821+0000 I NETWORK  [conn56800] end connection 10.20.0.170:37363 (19 connections now open)
2015-08-17T03:25:45.824+0000 I NETWORK  [initandlisten] connection accepted from 10.20.0.170:37389 #56814 (20 connections now open)
2015-08-17T03:25:58.477+0000 I NETWORK  [conn56802] end connection 10.10.0.169:45149 (19 connections now open)
2015-08-17T03:25:58.478+0000 I NETWORK  [initandlisten] connection accepted from 10.10.0.169:45598 #56815 (20 connections now open)
2015-08-17T03:26:02.793+0000 I NETWORK  [conn56812] end connection 10.20.0.164:45684 (19 connections now open)
2015-08-17T03:26:02.794+0000 I NETWORK  [initandlisten] connection accepted from 10.20.0.164:45913 #56816 (20 connections now open)
2015-08-17T03:26:10.901+0000 I NETWORK  [conn56813] end connection 10.200.1.61:38525 (19 connections now open)
2015-08-17T03:26:10.912+0000 I NETWORK  [initandlisten] connection accepted from 10.200.1.61:38551 #56817 (20 connections now open)
2015-08-17T03:26:12.839+0000 F -        Invalid access at address: 0
2015-08-17T03:26:12.848+0000 F -        Got signal: 11 (Segmentation fault).
 
 0xf75569 0xf74be2 0xf74f3e 0x369ce0f710 0x136104c 0x369ce079d1 0x369cae88fd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B75569"},{"b":"400000","o":"B74BE2"},{"b":"400000","o":"B74F3E"},{"b":"369CE00000","o":"F710"},{"b":"400000","o":"F6104C"},{"b":"369CE00000","o":"79D1"},{"b":"369CA00000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.0.6-rc1-pre-", "gitVersion" : "7397477c45986e2a9ed195f912afee3ed9ffcbc7", "uname" : { "sysname" : "Linux", "release" : "2.6.32-504.23.4.el6.x86_64", "version" : "#1 SMP Tue Jun 9 20:57:37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "77066BF13744F5BE02D2FD9E7C3776B7508C72BF" }, { "b" : "7FFFC42F3000", "elfType" : 3, "buildId" : "FF4CBAAE51A93124ED31C2B1386CE92FF24AEBC3" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "40BEA6554E64FC0C3D5C7D0CD91362730515102F" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "FC4EFD7502ACB3B9D213D28272D15A165857AD5A" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "F07F2E7CF4BFB393CC9BBE8CDC6463652E14DB07" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "C9A87F6A29ED1D3CB18F539845A45FE3A9877FF1" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "246C3BAB0AB093AFD59D34C8CBF29E786DE4BE97" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "1425CB3B4C2F49C8101ED9B8F1D289053B4DFA77" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "54BA6B78A9220344E77463947215E42F0EABCC62" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6797403AA5F8FAD8ADFF683478B45F528CE4FB0E" }, { "path" : 
"/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "8CE28F280150E62296240E70ECAC64E4A57AB826" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "05733977F4E41652B86070B27A0CFC2C1EA7719D" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "E3FA235F3BA3F776A01A18ECA737C9890F445923" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "58B696478044E028A5970D48A4ED50E164B43B36" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "E6798A06BEE17CF102BBA44FD512FF8B805CEAF1" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf75569]
 mongod(+0xB74BE2) [0xf74be2]
 mongod(+0xB74F3E) [0xf74f3e]
 libpthread.so.0(+0xF710) [0x369ce0f710]
 mongod(+0xF6104C) [0x136104c]
 libpthread.so.0(+0x79D1) [0x369ce079d1]
 libc.so.6(clone+0x6D) [0x369cae88fd]
-----  END BACKTRACE  -----
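
The JSON block and the frame list above describe the same seven frames: each "b" value is a module load base, each "o" value is an offset into that module, and their sum is the absolute address printed in brackets. A minimal C sketch of that arithmetic follows; the function name frame_address is hypothetical and is not part of MongoDB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: combine the hex "b" (load base) and "o" (offset)
 * strings from one backtrace entry into the absolute frame address.
 */
uint64_t frame_address(const char *base_hex, const char *offset_hex)
{
    assert(base_hex != NULL && offset_hex != NULL);

    /* Both fields are hexadecimal strings without a 0x prefix. */
    uint64_t base = strtoull(base_hex, NULL, 16);
    uint64_t offset = strtoull(offset_hex, NULL, 16);

    return base + offset;
}
```

For example, frame_address("400000", "F6104C") gives 0x136104c, which is the unsymbolized mongod frame listed directly after the libpthread frame at 0x369ce0f710.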



 Comments   
Comment by Michael Cahill (Inactive) [ 19/Aug/15 ]

Update: this could have been caused by not backporting https://github.com/wiredtiger/wiredtiger/commit/393344d5d4b436fe3519cb8ab541bab22663553d to 3.0. I have corrected that now by backporting it to the WT mongodb-3.0 branch ready for a new drop into MongoDB 3.0.

I will try to trigger that condition in RC1 to verify that this change fixes it.

Comment by Michael Cahill (Inactive) [ 19/Aug/15 ]

This issue has been hit again in testing 3.0.6-rc1. That version doesn't include the write barrier change here, but we have no reason to believe that fixes it. We need a repro.

Comment by Githook User [ 19/Aug/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Merge pull request #2124 from wiredtiger/dhandle-barrier

SERVER-19989 Add a write barrier before handles become public
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/b1850e8108e9c7a815f2c8dbda52b1fa60d93f82

Comment by Githook User [ 19/Aug/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Merge pull request #2126 from wiredtiger/dhandle-barrier-all

SERVER-19989 Add a write barrier before data handles are added to shared lists
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/65abd20a5ec285483b43c16deed4b6e6561af71f

Comment by Githook User [ 19/Aug/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: SERVER-19989 There are readers of the handle list that don't acquire the mutex: add a write barrier before handles are inserted.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/85483b97b9517c552667e232e08396755a1e8d04

Comment by Keith Bostic (Inactive) [ 18/Aug/15 ]

michael.cahill, I've looked and I don't see anything else in/around this path.

Comment by Michael Cahill (Inactive) [ 18/Aug/15 ]

For hours today I've been running workloads that keep eviction busy while constantly creating and discarding handles, without triggering this.

Given the timing, my assumption was that it was introduced by WT-2038, but AFAICT, we are holding the handle list mutex at the point where this crash happened, and we are also holding that mutex at the point where we remove a handle from the list and free its name.

The only other thought I've had is that we might somehow be seeing a new handle before the name field has been set (which would make more sense in the sweep server than here), so I've opened https://github.com/wiredtiger/wiredtiger/pull/2124 to add memory barriers before handles are put in shared lists.

keith.bostic, any other ideas?
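
The idea in the comment above is to publish a handle only after its fields are fully written, so that readers who skip the mutex never see a handle whose name is still unset. It can be sketched with C11 atomics. This is an illustration only, not WiredTiger's code: struct handle, handle_publish, and handle_find are hypothetical names, and the release/acquire pair stands in for the write barrier described in the pull request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a data handle; not WiredTiger's real struct. */
struct handle {
    char *name;
    struct handle *next;
};

/* Head of a shared list that some readers walk without taking a lock. */
static _Atomic(struct handle *) handle_list_head;

/*
 * Writer side. Assumption: callers serialize writers with a mutex (not
 * shown), and handles are never removed in this sketch.
 */
int handle_publish(const char *name)
{
    assert(name != NULL);

    struct handle *h = malloc(sizeof(*h));
    if (h == NULL)
        return -1;

    size_t len = strlen(name) + 1;
    h->name = malloc(len);
    if (h->name == NULL) {
        free(h);
        return -1;
    }
    memcpy(h->name, name, len);
    h->next = atomic_load_explicit(&handle_list_head, memory_order_relaxed);

    /*
     * Release store: every write to *h above becomes visible to a reader
     * before the reader can observe the new list head.
     */
    atomic_store_explicit(&handle_list_head, h, memory_order_release);
    return 0;
}

/* Reader side: walks the list without the mutex. */
const struct handle *handle_find(const char *name)
{
    /* Acquire load pairs with the release store in handle_publish. */
    const struct handle *h =
        atomic_load_explicit(&handle_list_head, memory_order_acquire);

    for (; h != NULL; h = h->next)
        if (strcmp(h->name, name) == 0)
            return h;
    return NULL;
}
```

Without the release ordering, a reader could in principle observe the new list head before the store to h->name, and the strcmp call would then dereference a null or uninitialized pointer, which is the kind of failure this comment hypothesizes.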
