[SERVER-21709] Map Reduce causes segfault in 3.2 RC4 Created: 01/Dec/15  Updated: 08/Jan/24  Resolved: 08/Dec/15

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 3.2.0-rc4
Fix Version/s: 3.2.0-rc6

Type: Bug Priority: Critical - P2
Reporter: Stuart Hall Assignee: Mira Carey
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagmetrics.tgz     File map_reduce_anon.js     PNG File mr-oom.png    
Issue Links:
Depends
depends on SERVER-21711 make currentJSExceptionToStatus more ... Closed
depends on SERVER-21716 SpiderMonkey doesn't appear capable o... Closed
depends on SERVER-21728 Disable extra threads for JSRuntimes Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

See below.

Sprint: Platform D (12/11/15)
Participants:

 Description   

Description
We've been testing some of our routing mapReduces in 3.2 RC4 and have a reproducible segfault that occurs on both Windows and CentOS 7. The attached script works fine in 3.0 with WiredTiger (WT) but crashes consistently in 3.2 RC4 with WT.

For reference, this script collapses documents into daily groups and is a relatively simple MapReduce job.
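As a rough illustration of the daily grouping described above (this is not the attached map_reduce_anon.js), the sketch below groups documents by UTC calendar day and sums a value per day. The field names ts and distanceKm, the sample documents, and the runMapReduce helper are all hypothetical; the helper is only an in-memory stand-in so the example runs without a server. In the mongo shell, the map function would read fields from `this` and call the global emit() instead.

```javascript
// Hypothetical document shape: { routeId: string, ts: Date, distanceKm: number }.
// These names are assumptions for illustration, not taken from the attached script.

// Map: emit one value per document, keyed by the UTC calendar day.
function mapDaily(doc, emit) {
    const day = doc.ts.toISOString().slice(0, 10); // "YYYY-MM-DD"
    emit(day, { count: 1, distanceKm: doc.distanceKm });
}

// Reduce: returns the same shape it receives, so it can safely be
// re-applied to partial results.
function reduceDaily(key, values) {
    const out = { count: 0, distanceKm: 0 };
    for (const v of values) {
        out.count += v.count;
        out.distanceKm += v.distanceKm;
    }
    return out;
}

// In-memory stand-in for the mapReduce command (hypothetical helper).
function runMapReduce(docs, mapFn, reduceFn) {
    const groups = new Map();
    for (const doc of docs) {
        mapFn(doc, (k, v) => {
            if (!groups.has(k)) groups.set(k, []);
            groups.get(k).push(v);
        });
    }
    const result = {};
    for (const [k, vs] of groups) {
        // Like the server, skip reduce when a key has a single value.
        result[k] = vs.length === 1 ? vs[0] : reduceFn(k, vs);
    }
    return result;
}

const sampleDocs = [
    { routeId: "a", ts: new Date("2015-11-30T08:00:00Z"), distanceKm: 12 },
    { routeId: "b", ts: new Date("2015-11-30T17:30:00Z"), distanceKm: 8 },
    { routeId: "c", ts: new Date("2015-12-01T09:15:00Z"), distanceKm: 5 },
];
const dailyTotals = runMapReduce(sampleDocs, mapDaily, reduceDaily);
console.log(dailyTotals);
```

Keeping the reduce output in the same shape as the emitted values matters because reduce may be invoked more than once for the same key.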

Steps to reproduce

  1. Restore the data from here: https://dl.dropboxusercontent.com/u/map_reduce_anon.js6076108/routeResult_anonymised.bson.gz
  2. Run the attached script: (mongo <dbname> map_reduce_anon.js)
  3. mongod will crash after a few seconds

The Windows build does not write the crash details into the log, but the CentOS version does:

----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EE6892"},{"b":"400000","o":"EE59F9"},{"b":"400000","o":"EE5D78"},{"b":"7F89308C3000","o":"F130"},{"b":"400000","o":"14CAF94"},{"b":"400000","o":"140225E"},{"b":"400000","o":"E3F7EE"},{"b":"400000","o":"E3F950"},{"b":"400000","o":"E60299"},{"b":"400000","o":"E61CD0"},{"b":"400000","o":"E62EB3"},{"b":"400000","o":"E5C22C"},{"b":"400000","o":"E4BA29"},{"b":"400000","o":"114025F"},{"b":"400000","o":"1140D0C"},{"b":"400000","o":"13ADC5B"},{"b":"0","o":"7F8931E537BE"}],"processInfo":{ "mongodbVersion" : "3.2.0-rc4", "gitVersion" : "3b3ef4253a6c5d5d3f18127ac2272a9696488aec", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-123.el7.x86_64", "version" : "#1 SMP Mon Jun 30 12:09:22 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "5D73AF6287E7408CABABC5501344AEE6A2C86929" }, { "b" : "7FFF50A97000", "elfType" : 3, "buildId" : "D7952DC468957C2B14B6BB79E613D48BA1224706" }, { "b" : "7F8931AEF000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "58FEDFFED1A388AD9E495F9A6C91A851B9537765" }, { "b" : "7F893170A000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F214E8640FDA5097E7A90CE7974B3FF76C6C42D9" }, { "b" : "7F8931502000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "8832E3070AB0758762836EEC8FCDDEDEF8235340" }, { "b" : "7F89312FE000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "B7C4BC0854BF5DE16B535353B38235CA42349C1E" }, { "b" : "7F8930FF7000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "63C62D6263FF98E6DD6896CB3E716E499744A4C9" }, { "b" : "7F8930CF5000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "D70EAB176DDA46DE292FEB8208A0E8A6718BAF3B" }, { "b" : "7F8930ADF000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "07120A9AC1BF3BCDD4A3EA1E0C47234A4A5C84F9" }, { "b" : "7F89308C3000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "18562EE0363BC9BD7101610BD86469AA426D0C44" }, { "b" : "7F8930502000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "78186287BBA77069A056A5CCBEB14B7FD2CA3A4B" }, { "b" : "7F8931D5B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "4EADCA6CB82E0A85EDB87C15B5E3980742514501" }, { "b" : "7F89302B8000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "641A441AB91715A7E3AF8AD9AF38EE07F17866FE" }, { "b" : "7F892FFD8000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "08E8BA638E79EC07F98198ED40F90FA87D5EEEB5" }, { "b" : "7F892FDD4000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "D2678F5F391BF2877E1BD6FAD16DBC589ED0BBF3" }, { "b" : "7F892FB9F000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "8269E77C68B707158D2B1BEA356EE0FC2A1C0024" }, { "b" : "7F892F989000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "E45643F27F3B3E960F3691AFC6EC27A98EF7B46B" }, { "b" : "7F892F77B000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "577A21CDAA3D662B87D53AFAA12A1E7B34AD513F" }, { "b" : "7F892F577000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F892F35D000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "519F19CF966514EAC9B25BE6FE953750E466D3C1" }, { "b" : "7F892F138000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "82FF6B18E1E42825CC2D060F969479AD4AF2F62C" }, { "b" : "7F892EED7000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "B19961A753FDFF85BD071340139A7F024BAEFFCA" }, { "b" : "7F892ECB2000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "218D03D1F6CF1A099A4D467B5E8ECF4F2BF45750" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12e6892]
 mongod(+0xEE59F9) [0x12e59f9]
 mongod(+0xEE5D78) [0x12e5d78]
 libpthread.so.0(+0xF130) [0x7f89308d2130]
 mongod(_ZN2js15UncheckedUnwrapEP8JSObjectbPj+0x34) [0x18caf94]
 mongod(_Z21js_ErrorFromExceptionP9JSContextN2JS6HandleIP8JSObjectEE+0x1E) [0x180225e]
 mongod(_ZN5mongo5mozjs26currentJSExceptionToStatusEP9JSContextNS_10ErrorCodes5ErrorENS_10StringDataE+0xEE) [0x123f7ee]
 mongod(_ZN5mongo5mozjs23throwCurrentJSExceptionEP9JSContextNS_10ErrorCodes5ErrorENS_10StringDataE+0x20) [0x123f950]
 mongod(_ZN5mongo5mozjs13ObjectWrapper3Key3getEP9JSContextN2JS6HandleIP8JSObjectEENS5_13MutableHandleINS5_5ValueEEE+0xD9) [0x1260299]
 mongod(_ZN5mongo5mozjs13ObjectWrapper11_writeFieldEPNS_14BSONObjBuilderENS1_3KeyEPNS0_13LifetimeStackINS1_24WriteFieldRecursionFrameELm150EEEPNS_7BSONObjE+0xA0) [0x1261cd0]
 mongod(_ZN5mongo5mozjs13ObjectWrapper6toBSONEv+0x933) [0x1262eb3]
 mongod(_ZN5mongo5mozjs18NativeFunctionInfo4callEP9JSContextN2JS8CallArgsE+0xBC) [0x125c22c]
 mongod(_ZN5mongo5mozjs7smUtils4callINS0_18NativeFunctionInfoEEEbP9JSContextjPN2JS5ValueE+0x19) [0x124ba29]
 mongod(_ZN2js6InvokeEP9JSContextN2JS8CallArgsENS_14MaybeConstructE+0x34F) [0x154025f]
 mongod(_ZN2js6InvokeEP9JSContextRKN2JS5ValueES5_jPS4_NS2_13MutableHandleIS3_EE+0x20C) [0x1540d0c]
 mongod(_ZN2js3jit14InvokeFunctionEP9JSContextN2JS6HandleIP8JSObjectEEjPNS3_5ValueES9_+0xBB) [0x17adc5b]
 ??? [0x7f8931e537be]
-----  END BACKTRACE  -----
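
For readers matching the JSON frames to the symbolized lines in the backtrace above: each frame appears to carry a load base ("b") and an offset ("o"), and their sum is the bracketed absolute address. That reading is an assumption based on the numbers lining up, sketched below for the first and fourth frames.

```javascript
// Two frames copied from the "backtrace" array above.
const frames = [
    { b: "400000", o: "EE6892" },      // mongod
    { b: "7F89308C3000", o: "F130" },  // libpthread.so.0
];

// Absolute address = base + offset, both hexadecimal.
// BigInt avoids any precision concerns with 64-bit addresses.
function absoluteAddress(frame) {
    return (BigInt("0x" + frame.b) + BigInt("0x" + frame.o)).toString(16);
}

const addresses = frames.map(absoluteAddress);
console.log(addresses); // [ '12e6892', '7f89308d2130' ]
```

These match the addresses printed next to printStackTrace and the libpthread.so.0 frame in the symbolized list.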



 Comments   
Comment by Stuart Hall [ 08/Dec/15 ]

Thanks Jason. I'll re-test when I have 5 mins this week.

SH

Comment by Mira Carey [ 08/Dec/15 ]

Stuart,

After some analysis, it seems that the segfault you saw was related to several tickets:

  • SERVER-21717 - Our out-of-memory handling was overly brittle.
  • SERVER-21728 - Background threads in SpiderMonkey were causing memory ballooning, which led to out-of-memory conditions.

Both of these are fixed in the GA release, and your repro, which I was able to run locally, no longer reproduces for me with those fixes.

Thanks,
Jason

Comment by Stuart Hall [ 01/Dec/15 ]

Thanks Jason. Feel free to get back to me if you need any more information.

SH

Comment by Mira Carey [ 01/Dec/15 ]

Stuart,

Thanks for updating that link. I've got your repro up and running locally, so it looks like I've got all I need to start in on a fix.

-Jason

Comment by Stuart Hall [ 01/Dec/15 ]

Hi Jason,

Apologies, it got mucked up as I submitted the ticket. You can get it here:
https://dl.dropboxusercontent.com/u/6076108/routeResult_anonymised.bson.gz
(It's too big for your Jira system)

Thanks,

Stuart H.

Comment by Mira Carey [ 01/Dec/15 ]

Stuart,

Unfortunately the dropbox link you've attached fails to resolve for me (returns a generic 404). Could you verify the link, or if the file isn't unreasonably large, attach it directly?

Thanks,
Jason
