[SERVER-18758] Segmentation fault on heavy unacknowledged bulk inserts with WiredTiger
Created: 31/May/15  Updated: 04/Aug/15  Resolved: 03/Aug/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Yan
Assignee: Sam Kleinman (Inactive)
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

40 cores, 120 GB RAM


Attachments: HTML File mongod    
Issue Links:
Related: is related to SERVER-17673, "Segfault in 3.0.1 Insert Only Workload" (Closed)
Operating System: Linux
Steps To Reproduce:

Create a database, then start issuing heavy unacknowledged bulk writes and updates.

Participants:

 Description   

Got a segmentation fault during heavy multithreaded unacknowledged bulk writes to a WiredTiger database.

Acknowledged writes complete without problems.

2015-05-31T17:17:12.649+0300 I WRITE    [conn391] insert sm.product ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 174ms
2015-05-31T17:17:13.068+0300 F -        Invalid access at address: 0xc8
2015-05-31T17:17:13.091+0300 F -        Got signal: 11 (Segmentation fault).
 
 0xf6a889 0xf69f02 0xf6a25e 0x7f16a67c0710 0x137fad1 0x13810e3 0x1383215 0x1354424 0x1351e2d 0x1353f3c 0x7f16a67b89d1 0x7f16a530e8fd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B6A889"},{"b":"400000","o":"B69F02"},{"b":"400000","o":"B6A25E"},{"b":"7F16A67B1000","o":"F710"},{"b":"400000","o":"F7FAD1"},{"b":"400000","o":"F810E3"},{"b":"400000","o":"F83215"},{"b":"400000","o":"F54424"},{"b":"400000","o":"F51E2D"},{"b":"400000","o":"F53F3C"},{"b":"7F16A67B1000","o":"79D1"},{"b":"7F16A5226000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.5.1.el6.x86_64", "version" : "#1 SMP Wed Feb 12 00:41:43 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "108A63CA14A4BD5E599BAC10885DBD3A85DA5439" }, { "b" : "7FFF924FF000", "elfType" : 3, "buildId" : "F795EFBE6950D1523C5748594C166CEDD4254C33" }, { "b" : "7F16A67B1000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "b" : "7F16A6546000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "DB1785F0187DE5A9307FC2A79E8B1BE953C5562C" }, { "b" : "7F16A6166000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "58F274C5E71DE44F93ACA77E8F959601545A053C" }, { "b" : "7F16A5F5E000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "b" : "7F16A5D5A000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "b" : "7F16A5A54000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "ED99110E629209C5CA6C0ED704F2C5CE3171513A" }, { "b" : "7F16A57D0000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "7D8E9374F4A4EA38A7C1E763F32240EA113E4208" }, { "b" : "7F16A55BA000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A44499D29B114A5366CD72DD4883958495AC1C1D" }, { "b" : "7F16A5226000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "E4EAB3C200B7D8444FF95AB01F6466924A6A5F5F" }, { "b" : 
"7F16A69CE000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "b" : "7F16A4FE2000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "EFF68B7DE77D081BC4A0CB38FE9DCBC60541BF92" }, { "b" : "7F16A4CFC000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "95EBB74C2C0A1E1714344036145A0239FFA4892D" }, { "b" : "7F16A4AF8000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "b" : "7F16A48CC000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "D02E7D3149950118009A81997434E28B7D9EC9B2" }, { "b" : "7F16A46B6000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "b" : "7F16A44AB000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "5AFCBEA0D62EE0335714CCBAB7BA808E2A16028C" }, { "b" : "7F16A42A8000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "b" : "7F16A408E000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F8B68F301C19BF06AF56B4B06E0A69F89D2C1F8D" }, { "b" : "7F16A3E6F000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6a889]
 mongod(+0xB69F02) [0xf69f02]
 mongod(+0xB6A25E) [0xf6a25e]
 libpthread.so.0(+0xF710) [0x7f16a67c0710]
 mongod(+0xF7FAD1) [0x137fad1]
 mongod(+0xF810E3) [0x13810e3]
 mongod(__wt_reconcile+0x1B5) [0x1383215]
 mongod(__wt_evict+0x104) [0x1354424]
 mongod(__wt_evict_page+0x2D) [0x1351e2d]
 mongod(+0xF53F3C) [0x1353f3c]
 libpthread.so.0(+0x79D1) [0x7f16a67b89d1]
 libc.so.6(clone+0x6D) [0x7f16a530e8fd]
-----  END BACKTRACE  -----
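
For reference, the raw addresses printed before the backtrace can be cross-checked against the JSON: each frame carries a load base "b" and an offset "o" in hexadecimal, and the absolute address is their sum. The sketch below is plain JavaScript; the helper name frameAddresses is hypothetical, and only three of the twelve frames are copied in.

```javascript
// Recompute absolute frame addresses from a JSON backtrace:
// address = load base ("b") + offset ("o"), both hexadecimal strings.
// The values used here are small enough to be exact as JavaScript numbers.
function frameAddresses(backtrace) {
    return backtrace.map(function (frame) {
        var address = parseInt(frame.b, 16) + parseInt(frame.o, 16);
        return "0x" + address.toString(16);
    });
}

// Three frames copied from the backtrace above.
var frames = [
    { b: "400000", o: "B6A889" },       // mongod, printStackTrace
    { b: "7F16A67B1000", o: "F710" },   // libpthread.so.0
    { b: "400000", o: "F83215" }        // mongod, __wt_reconcile+0x1B5
];

var addresses = frameAddresses(frames);
// addresses is ["0xf6a889", "0x7f16a67c0710", "0x1383215"]
```

These values match the first, fourth and seventh entries on the raw address line. The base of 400000 is the load address of the mongod executable itself, so the "o" values of the unsymbolized frames are the offsets to look up in a binary that has debug symbols.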



 Comments   
Comment by Sam Kleinman (Inactive) [ 03/Aug/15 ]

I'm going to go ahead and close this ticket, because you mentioned the issue seems to have resolved itself and there haven't been any updates in a while. If you encounter this issue again, feel free to update this ticket and we can continue investigating.

Regards,
sam

Comment by Sam Kleinman (Inactive) [ 02/Jul/15 ]

Thanks for your feedback. I'm glad to hear that things are working better for you right now. I'm going to set this ticket back to the waiting state. If you run into this issue again, we look forward to getting the output with the additional debug symbol data.

Regards,
sam

Comment by Yan [ 02/Jul/15 ]

OK, I will switch, but since 3.0.4 things have been running smoothly so far.

Comment by Sam Kleinman (Inactive) [ 02/Jul/15 ]

Could you attempt to reproduce this issue using binaries with debug symbols? You can find these binaries for your Linux release at https://www.mongodb.org/dl/linux.

Thanks so much.

Regards,
sam

Comment by Ramon Fernandez Marina [ 01/Jul/15 ]

Hi lightket, apologies for the long delay, and thanks for uploading the binary you're using. Unfortunately this binary does not include debugging information, but it should help us track the debugging information on our end. Further attempts to reproduce have been unsuccessful, so we continue to investigate this issue – thank you for your patience.

Regards,
Ramón.

Comment by Yan [ 11/Jun/15 ]

Ramon, I'm attaching the binary file.

I'm sorry, but I cannot give you our code at this point. It would be irrelevant anyway without the source database (we are developing a migration tool).

Let's just hope someone else runs into this bug and is able to provide a reproducer.

Comment by Ramon Fernandez Marina [ 09/Jun/15 ]

lightket, I tried reproducing this on my end using the code below, with 100 threads doing both unordered and ordered bulk inserts of different sizes, but unfortunately I wasn't able to trigger the segfault you're observing. Therefore I'd like to emphasize the need for more details, and preferably a reproducer so we can find the root cause of this bug.

Thanks in advance for your help,
Ramón.

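load('jstests/libs/parallelTester.js');

// Worker run by each thread: 100 bulk operations of growing size,
// each executed with an unacknowledged write concern (w: 0).
function main(threadId, numDocs, flag) {
    for (var c = 0; c < 100; c++) {
        // flag 0 selects an ordered bulk operation, anything else an unordered one
        var bulk;
        if (flag == 0) {
            bulk = db.foo.initializeOrderedBulkOp();
        } else {
            bulk = db.foo.initializeUnorderedBulkOp();
        }

        // The batch grows with the iteration: numDocs * c documents
        for (var i = 0; i < numDocs * c; ++i) {
            bulk.insert({t: threadId, iter: c, x: i});
        }
        var res = bulk.execute({ w: 0 });
    }
}

// Start from a clean collection
db.foo.drop();

// Even-numbered threads use ordered bulk operations, odd-numbered ones unordered
var threads = [];
var numThreads = 100;
for (var i = 0; i < numThreads; i++) {
    var t = new ScopedThread(main, i, 1000, i % 2);
    threads.push(t);
    t.start();
}

// Wait for all threads to finish
for (var j = 0; j < threads.length; j++) {
    threads[j].join();
}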
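
To give a sense of the load this script submits, the batch sizes can be added up: on iteration c each thread queues numDocs * c documents. The sketch below is plain JavaScript arithmetic only; the helper name docsPerThread is hypothetical, and because the writes use w: 0 the totals describe what is sent, not what is confirmed as stored.

```javascript
// Sum the batch sizes one thread submits: numDocs * c for c = 0 .. iterations - 1.
function docsPerThread(numDocs, iterations) {
    var total = 0;
    for (var c = 0; c < iterations; c++) {
        total += numDocs * c; // batch size on iteration c (empty when c is 0)
    }
    return total;
}

var largestBatch = 1000 * 99;              // 99000 documents in the last iteration
var perThread = docsPerThread(1000, 100);  // 4950000 documents per thread
var allThreads = perThread * 100;          // 495000000 documents across 100 threads
```

Note that the first iteration (c equal to 0) queues no documents at all, so the first bulk operation of every thread is empty.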

Comment by Ramon Fernandez Marina [ 08/Jun/15 ]

Hi lightket,

unfortunately the link you provided does not allow me to retrieve the exact binary you're using, which I need to decipher the stack trace. But knowing the version of RedHat you're using should help – what's the output of running

cat /etc/*-release

?

Note that this may not be sufficient to find the root cause of this issue, and that ideally we'd like to reproduce this ourselves. Can you please provide more information on how you're triggering this? Is this restoring a large dataset or are you generating data? How large are these bulk inserts? Are you using YCSB or perhaps a program of your own to do these bulk inserts? If the latter it would be of great help if you could share that with us as requested above.

Thanks,
Ramón.

Comment by Yan [ 08/Jun/15 ]

Ramon, what will be our next action?

Comment by Yan [ 08/Jun/15 ]

Fault again:

2015-06-08T12:12:43.107+0300 I COMMAND  [conn2427] command sm2.$cmd command: insert { insert: "account", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { w: 9 } }, Database: { acquireCount: { w: 9 } }, Collection: { acquireCount: { w: 9 } } } 165ms
2015-06-08T12:12:43.112+0300 F -        Got signal: 11 (Segmentation fault).
 
 0xf6a889 0xf69f02 0xf6a25e 0x7f46eed3e710 0x137fad1 0x13810e3 0x1383215 0x1354424 0x1351e2d 0x1353f3c 0x7f46eed369d1 0x7f46ed88c8fd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B6A889"},{"b":"400000","o":"B69F02"},{"b":"400000","o":"B6A25E"},{"b":"7F46EED2F000","o":"F710"},{"b":"400000","o":"F7FAD1"},{"b":"400000","o":"F810E3"},{"b":"400000","o":"F83215"},{"b":"400000","o":"F54424"},{"b":"400000","o":"F51E2D"},{"b":"400000","o":"F53F3C"},{"b":"7F46EED2F000","o":"79D1"},{"b":"7F46ED7A4000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.5.1.el6.x86_64", "version" : "#1 SMP Wed Feb 12 00:41:43 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "108A63CA14A4BD5E599BAC10885DBD3A85DA5439" }, { "b" : "7FFF764FF000", "elfType" : 3, "buildId" : "F795EFBE6950D1523C5748594C166CEDD4254C33" }, { "b" : "7F46EED2F000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "b" : "7F46EEAC4000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "DB1785F0187DE5A9307FC2A79E8B1BE953C5562C" }, { "b" : "7F46EE6E4000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "58F274C5E71DE44F93ACA77E8F959601545A053C" }, { "b" : "7F46EE4DC000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "b" : "7F46EE2D8000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "b" : "7F46EDFD2000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "ED99110E629209C5CA6C0ED704F2C5CE3171513A" }, { "b" : "7F46EDD4E000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "7D8E9374F4A4EA38A7C1E763F32240EA113E4208" }, { "b" : "7F46EDB38000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A44499D29B114A5366CD72DD4883958495AC1C1D" }, { "b" : "7F46ED7A4000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "E4EAB3C200B7D8444FF95AB01F6466924A6A5F5F" }, { "b" : 
"7F46EEF4C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "b" : "7F46ED560000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "EFF68B7DE77D081BC4A0CB38FE9DCBC60541BF92" }, { "b" : "7F46ED27A000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "95EBB74C2C0A1E1714344036145A0239FFA4892D" }, { "b" : "7F46ED076000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "b" : "7F46ECE4A000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "D02E7D3149950118009A81997434E28B7D9EC9B2" }, { "b" : "7F46ECC34000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "b" : "7F46ECA29000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "5AFCBEA0D62EE0335714CCBAB7BA808E2A16028C" }, { "b" : "7F46EC826000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "b" : "7F46EC60C000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F8B68F301C19BF06AF56B4B06E0A69F89D2C1F8D" }, { "b" : "7F46EC3ED000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6a889]
 mongod(+0xB69F02) [0xf69f02]
 mongod(+0xB6A25E) [0xf6a25e]
 libpthread.so.0(+0xF710) [0x7f46eed3e710]
 mongod(+0xF7FAD1) [0x137fad1]
 mongod(+0xF810E3) [0x13810e3]
 mongod(__wt_reconcile+0x1B5) [0x1383215]
 mongod(__wt_evict+0x104) [0x1354424]
 mongod(__wt_evict_page+0x2D) [0x1351e2d]
 mongod(+0xF53F3C) [0x1353f3c]
 libpthread.so.0(+0x79D1) [0x7f46eed369d1]
 libc.so.6(clone+0x6D) [0x7f46ed88c8fd]
-----  END BACKTRACE  -----

Comment by Yan [ 31/May/15 ]

Ramon, I downloaded MongoDB for Linux from the following yum repository:
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.0/x86_64/

> db.serverBuildInfo()
{
	"version" : "3.0.3",
	"gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105",
	"OpenSSLVersion" : "OpenSSL 1.0.1e-fips 11 Feb 2013",
	"sysInfo" : "Linux ip-10-182-86-231 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 BOOST_LIB_VERSION=1_49",
	"loaderFlags" : "-fPIC -pthread -Wl,-z,now -rdynamic",
	"compilerFlags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11 -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -pipe -Werror -O3 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -Wno-missing-braces -fno-builtin-memcmp -std=c99",
	"allocator" : "tcmalloc",
	"versionArray" : [
		3,
		0,
		3,
		0
	],
	"javascriptEngine" : "V8",
	"bits" : 64,
	"debug" : false,
	"maxBsonObjectSize" : 16777216,
	"ok" : 1
}

Comment by Ramon Fernandez Marina [ 31/May/15 ]

lightket, can you also please provide more details about the binary you're using? Did you build it yourself? If the answer is no, where did you download it from? Are you using the OS' package manager? If you could send us the output of db.serverVersion() as well it would help us diagnose your problem.

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 31/May/15 ]

lightket, can you please share with us the code you're using to trigger this behavior? You can upload it privately (only accessible to MongoDB staff) and securely via scp as follows:

scp -P 722 -r <filename> SERVER-18758@www.mongodb.com:

where <filename> is the file or directory to upload. When prompted for a password just press enter.

Thanks,
Ramón.
