[SERVER-22927] mongo dies with Segmentation fault in Solaris 11.3 / Illumos February 2016 Created: 02/Mar/16  Updated: 08/Jan/24  Resolved: 12/Jul/16

Status: Closed
Project: Core Server
Component/s: JavaScript
Affects Version/s: 3.1.6, 3.2.0
Fix Version/s: 3.3.10

Type: Bug Priority: Major - P3
Reporter: Andreas Grüninger Assignee: Jonathan Reams
Resolution: Done Votes: 0
Labels: bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-25036 Solaris 11.3 x86 Failed to inizialize... Closed
Related
related to SERVER-24400 Backport SpiderMonkey fix for aarch64 Closed
related to SERVER-23629 SpiderMonkey compromising port to AIX Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.2
Steps To Reproduce:

1. Download any source release after 3.1.6.
2. Build the binaries with:
   scons core CCFLAGS=-m64 LINKFLAGS=-m64 -j 16 --prefix=$PREFIX --js-engine=none
3. Start mongo in the debugger:
   gdb build/opt/mongo/mongo
   > run
   CRASH
   > where
Sprint: Platforms 13 (04/22/16), Platforms 14 (05/13/16), Platforms 15 (06/03/16), Platforms 16 (06/24/16), Platforms 17 (07/15/16)
Participants:

 Description   

I downloaded the Solaris binaries for 3.3.2 and used them in Solaris 11.3 and OpenIndiana (Illumos kernel from February 2016).

MongoDB starts and works without problems, and so does mongostat.

The mongo shell dies with a segmentation fault.
The last working version of the mongo shell is 3.1.6.

To debug, I downloaded the source for 3.2.3 and compiled it with the following command:
scons core CCFLAGS=-m64 LINKFLAGS=-m64 -j 16 --prefix=$PREFIX --js-engine=none

The debugger shows where mongo dies.
Apparently the crash is in the JS engine, mozjs-38.
Firefox 43 is available on Illumos, so it should be possible to compile the JavaScript engine in a way that works on this platform.

GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.11".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /export/builds/mongodb-src-r3.2.3/build/opt/mongo/mongo...done.
(gdb) run
Starting program: /export/builds/mongodb-src-r3.2.3/build/opt/mongo/mongo
[Thread debugging using libthread_db enabled]
MongoDB shell version: 3.2.3
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New Thread 3 (LWP 3)]
 
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 3 (LWP 3)]
0x0000000000c8b75a in js::NativeObject::setSlot(unsigned int, JS::Value const&) ()
(gdb) where
#0  0x0000000000c8b75a in js::NativeObject::setSlot(unsigned int, JS::Value const&) ()
#1  0x000000000105d77e in setSlotWithType (overwriting=false, value=..., shape=0xfffffd7ffda21100, cx=0x157c370, this=0xfffffd7ffda22060)
    at src/third_party/mozjs-38/extract/js/src/vm/NativeObject-inl.h:314
#2  UpdateShapeTypeAndValue (cx=cx@entry=0x157c370, obj=0xfffffd7ffda22060, shape=0xfffffd7ffda21100, value=...)
    at src/third_party/mozjs-38/extract/js/src/vm/NativeObject.cpp:1113
#3  0x000000000107ea3a in DefinePropertyOrElement (cx=cx@entry=0x157c370, obj=obj@entry=..., id=..., id@entry=..., getter=0x0, setter=0x0, attrs=<optimized out>,
    value=..., callSetterAfterwards=false, setterIsStrict=false) at src/third_party/mozjs-38/extract/js/src/vm/NativeObject.cpp:1200
#4  0x000000000107f344 in js::NativeDefineProperty (cx=0x157c370, obj=..., id=..., value=..., getter=0x0, setter=0x0, attrs=<optimized out>)
    at src/third_party/mozjs-38/extract/js/src/vm/NativeObject.cpp:1487
#5  0x0000000000f9b503 in js::DefineProperty (cx=<optimized out>, obj=..., id=..., value=..., getter=<optimized out>, setter=<optimized out>, attrs=6)
    at src/third_party/mozjs-38/extract/js/src/jsobj.cpp:3212
#6  0x0000000000f48db2 in DefinePropertyById (cx=cx@entry=0x157c370, obj=obj@entry=..., id=id@entry=..., value=..., value@entry=..., get=..., set=..., attrs=6,
    flags=0) at src/third_party/mozjs-38/extract/js/src/jsapi.cpp:2155
#7  0x0000000000f49d5f in DefineProperty (cx=cx@entry=0x157c370, obj=..., name=name@entry=0x1291e41 "std_iterator", value=..., getter=..., setter=..., attrs=6,
    flags=0) at src/third_party/mozjs-38/extract/js/src/jsapi.cpp:2298
#8  0x0000000000f49ee3 in JS_DefineProperty (cx=cx@entry=0x157c370, obj=..., obj@entry=..., name=name@entry=0x1291e41 "std_iterator", value=..., value@entry=...,
    attrs=attrs@entry=6, getter=getter@entry=0x0, setter=0x0) at src/third_party/mozjs-38/extract/js/src/jsapi.cpp:2346
#9  0x000000000105d4a6 in js::GlobalObject::initSelfHostingBuiltins (cx=cx@entry=0x157c370, global=global@entry=...,
    builtins=builtins@entry=0x137b780 <intrinsic_functions>) at src/third_party/mozjs-38/extract/js/src/vm/GlobalObject.cpp:381
#10 0x0000000000d6be90 in JSRuntime::createSelfHostingGlobal (cx=cx@entry=0x157c370) at src/third_party/mozjs-38/extract/js/src/vm/SelfHosting.cpp:1041
#11 0x0000000000d6bfec in JSRuntime::initSelfHosting (this=this@entry=0x1583420, cx=cx@entry=0x157c370)
    at src/third_party/mozjs-38/extract/js/src/vm/SelfHosting.cpp:1065
#12 0x0000000000f61843 in js::NewContext (rt=0x1583420, stackChunkSize=stackChunkSize@entry=8192) at src/third_party/mozjs-38/extract/js/src/jscntxt.cpp:126
#13 0x0000000000f61895 in JS_NewContext (rt=<optimized out>, stackChunkSize=stackChunkSize@entry=8192) at src/third_party/mozjs-38/extract/js/src/jsapi.cpp:569
#14 0x0000000000a8e23d in mongo::mozjs::MozJSImplScope::MozRuntime::MozRuntime (this=0x1580af8, engine=<optimized out>)
    at src/mongo/scripting/mozjs/implscope.cpp:268
#15 0x0000000000a8e4ed in mongo::mozjs::MozJSImplScope::MozJSImplScope (this=0x1580a70, engine=0x14ec8a0) at src/mongo/scripting/mozjs/implscope.cpp:325
#16 0x0000000000ab2fa1 in mongo::mozjs::MozJSProxyScope::implThread (arg=0x14e0670) at src/mongo/scripting/mozjs/proxyscope.cpp:330
#17 0x0000000000a73346 in nspr::Thread::ThreadRoutine (arg=0x14d9c50) at src/mongo/scripting/mozjs/PosixNSPR.cpp:56
#18 0xfffffd7fe909a201 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /jenkins/jobs/oi-userland/workspace/components/gcc49/gcc-4.9.3/libstdc++-v3/src/c++11/thread.cc:84
#19 0xfffffd7ff83a6f5a in _thrp_setup () from /lib/64/libc.so.1
#20 0xfffffd7ff83a7270 in ?? () from /lib/64/libc.so.1
#21 0x0000000000000000 in ?? ()



 Comments   
Comment by Githook User [ 29/Jun/16 ]

Author: Jonathan Reams (jbreams) <jbreams@mongodb.com>

Message: SERVER-22927 Fix spidermonkey mapped memory on Solaris
Branch: master
https://github.com/mongodb/mongo/commit/4398dfaceec83ce195f2c27847535dc3b10b0ee4

Comment by Mark Benvenuto [ 09/Jun/16 ]

jonathan.reams Can you handle the backport of the ARM64 port?

For Solaris on Sparc64, the high constant is 0x80100000000. For x64 Solaris, it is the same as in the ARM64 port.

Comment by Filip Hajny [ 24/May/16 ]

Not sure if the prototype fix should apply to the 3.2 branch as well, but it didn't fix the problem for me there.

Comment by Mark Benvenuto [ 20/May/16 ]

I have prototyped a fix here: https://github.com/markbenvenuto/mongo/commit/33cd16b77c3682f224517b1bc44e1ce8fe646819. It still needs polish and testing before it is ready to commit. The idea is to choose a random point in the virtual address space of the process that the SpiderMonkey GC is happy with. I take advantage of the fact that, unlike MAP_FIXED, the default mmap address hint mode will not overwrite an existing virtual address allocation.
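
The idea described above can be sketched roughly as follows. This is not the code from the prototype commit: it is a small Python simulation in which `mmap_fn` and `munmap_fn` are hypothetical stand-ins for the real system calls, and `CHUNK`, `LIMIT`, the `low` bound, and the retry count are assumptions chosen for illustration. The point is only the shape of the loop: pick a random aligned hint in the usable range, ask for memory there without MAP_FIXED, and keep the mapping only if the kernel actually placed it below the limit.

```python
import random

CHUNK = 1 << 20   # assumed allocation granule (1 MiB), for illustration only
LIMIT = 1 << 47   # assumed: addresses at or above this are unusable for the GC


def pick_hint(rng, size, low=1 << 32, limit=LIMIT, align=CHUNK):
    """Pick a random aligned hint so that [hint, hint + size) ends at or below limit."""
    slots = (limit - size - low) // align
    return low + rng.randrange(slots + 1) * align


def map_below_limit(mmap_fn, munmap_fn, size, rng=None, attempts=16):
    """Request memory near random hints; keep a mapping only if it is usable.

    mmap_fn(hint, size) returns the address the kernel chose (or None on
    failure). Because the hint is not MAP_FIXED, the kernel may ignore it,
    so every returned address is checked before it is accepted.
    """
    rng = rng or random.Random()
    for _ in range(attempts):
        addr = mmap_fn(pick_hint(rng, size), size)
        if addr is None:
            continue
        if addr + size <= LIMIT:
            return addr
        munmap_fn(addr, size)  # hint was ignored; release and try again
    return None
```

A caller would treat a `None` result as an initialization failure rather than continuing with an address the engine cannot represent.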

Comment by Andrew Stormont [ 17/May/16 ]

The proposed fix for the mmap issue doesn't solve this problem. Tested with GCC 5.1.

Comment by Mark Benvenuto [ 16/May/16 ]

grueni, Firefox on OpenIndiana is compiled for 32-bit, not 64-bit.

I found this issue: "mmap doesn't respect address hints", which shows that Solaris does not respect mmap hints. I was able to confirm that on an Intel x64 machine with i86pc illumos-f83b46b and "OpenIndiana Hipster 2016.04 Live USB (32/64-bit x86)". This problem did not reproduce on a Solaris VM running under Linux KVM.

Comment by Andreas Grüninger [ 10/May/16 ]

May I suggest having a look at https://github.com/OpenIndiana/oi-userland/tree/oi/hipster/components/web/firefox/patches.
There you will find the patches for compiling Firefox, including the JS engine, mainly provided by Martin Bochnig.

Comment by Daniel Heitepriem [ 26/Apr/16 ]

Same problem here which is pretty annoying because I recently updated from Solaris 10 to Solaris 11.3 (as seen in SERVER-23565). Is there any way to fix this so we can at least use a version of 3.2.X?

Edit: Kernel is SunOS 5.11 11.3 i86pc i386 i86pc (Oracle Solaris 11.3 X86 Assembled 06 October 2015)

Comment by Alexander Pyhalov [ 23/Apr/16 ]

OpenIndiana, illumos-e7e978b; this is the kernel from 22 December 2015.

Comment by Filip Hajny [ 22/Apr/16 ]

Getting the same exception as Alexander on SmartOS (joyent_20141030T081701Z i86pc i386 i86pc) after applying the patch to 3.2.4. Building with GCC 4.9.3 under pkgsrc.

Comment by Mark Benvenuto [ 22/Apr/16 ]

I experimentally backported the change to 3.2.5, and it is working on the following kernel: SunOS omni 5.11 omnios-master-0832725 i86pc i386 i86pc. Which kernel did you use?

"Failed to initialize JSContext" means that the Solaris mmap call only used the hint on the first call, ignored it on the second call, and chose a random address. I saw this error on Oracle Solaris 11.2 on Sparc.

Comment by Alexander Pyhalov [ 22/Apr/16 ]

After applying this patch to mongo 3.2.5, the mongo shell doesn't crash, but exits with "exception: Failed to initialize JSContext".

Comment by Githook User [ 20/Apr/16 ]

Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com>

Message: SERVER-22927 SpiderMonkey mmap on Solaris needs hint to avoid allocating
memory in high memory regions that SpiderMonkey needs
Branch: master
https://github.com/mongodb/mongo/commit/faa12fbdda5f4933d5952102fd35252928afd749

Comment by Mark Benvenuto [ 15/Apr/16 ]

After I did more digging, I came across this code in the Mozilla JS code base: https://github.com/mongodb/mongo/blob/master/src/third_party/mozjs-38/extract/js/src/gc/Memory.cpp#L405-L419
See https://bugzilla.mozilla.org/show_bug.cgi?id=589735 for details

Since Mozilla hit this problem on ia64, they came up with a workaround: giving mmap a hint about where to allocate memory. Using the mmap hints appears to work fine in my testing on the i86pc kernel on OmniOS Bloody under KVM/QEMU. They picked 0x70000000000 (~2^42.8) as their hint. This appears to work fine on x64, since there are 48 bits of address space to use and x64 Solaris allows addresses up to 0x800000000000 (2^47) before the reserved address space hole. On Sparc64 (an unsupported platform), using 0x70000000000 will not work, since Sparc64 has 44 bits of address space and the address space hole starts at 0x80100000000 (~2^43).

I am planning to work on this change next week, and also to test it against the i86xpv kernel.
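
The magnitudes quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the three constants are copied from this comment, the names are made up for illustration, and "headroom" here is simply the distance from the hint to the start of the hole, which does not by itself settle whether a given hint works on a platform.

```python
import math

MOZILLA_HINT = 0x70000000000         # mmap hint quoted above
X64_HOLE_START = 0x800000000000      # x64 Solaris limit quoted above
SPARC64_HOLE_START = 0x80100000000   # Sparc64 hole start quoted above


def describe(addr):
    """Render an address together with its base-2 magnitude."""
    return f"{addr:#x} = 2^{math.log2(addr):.2f}"


def headroom_gib(hint, hole_start):
    """Distance from the hint to the start of the hole, in GiB."""
    return (hole_start - hint) / 2**30


if __name__ == "__main__":
    for name, value in [("hint", MOZILLA_HINT),
                        ("x64 hole", X64_HOLE_START),
                        ("sparc64 hole", SPARC64_HOLE_START)]:
        print(name, describe(value))
    print("x64 headroom (GiB):", headroom_gib(MOZILLA_HINT, X64_HOLE_START))
    print("sparc64 headroom (GiB):", headroom_gib(MOZILLA_HINT, SPARC64_HOLE_START))
```

With these inputs the hint leaves about 121 TiB below the x64 limit but only about 1 TiB below the Sparc64 hole start.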

Address Layouts
AMD64
https://docs.oracle.com/cd/E18752_01/html/816-5138/fcowb.html

Sparc
https://docs.oracle.com/cd/E18752_01/html/816-5138/advanced-2.html

Comment by Jussi Sallinen [ 13/Apr/16 ]

In case it is needed now or in the future, I can provide a SmartOS zone instance for testing/development purposes; just email me if MongoDB would benefit from it.

Comment by Mark Benvenuto [ 13/Apr/16 ]

First, we do all our Solaris testing with OmniOS 5.11 r151006 April 2015 in AWS. Now, Illumos and Solaris have two different kernels: i86pc and i86xpv. The latter kernel is used in our Solaris image in AWS, and is designed to run as a Xen paravirtual kernel. The former is used in other environments, such as running Solaris on bare metal or in other VMs. One of the differences between these two kernels is the address ranges Solaris uses to mmap pages.

i86xpv:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/sys/machparam.h#228

i86pc:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/sys/machparam.h#230

The x64 architecture currently supports 48 bits of address space, and SpiderMonkey assumes that the unused bits are set to zero (http://lxr.mozilla.org/mozilla-esr45/source/js/public/Value.h#221) when it runs on x64. Because Solaris is setting the high bit in the i86pc kernel, this breaks SpiderMonkey's assumption and so it fails to start.
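
To make the assumption concrete, here is a minimal sketch of the check that the engine implicitly relies on. It assumes that pointers must fit in the low 47 bits (the 2^47 boundary of the lower half of the x64 address space); `PAYLOAD_BITS` and `fits_in_payload` are hypothetical names for this illustration, not SpiderMonkey identifiers. The two sample addresses are taken from the backtrace in the description: `cx` sits low in the address space, while `obj` has its high bits set.

```python
PAYLOAD_BITS = 47  # assumed number of low bits available for a pointer


def fits_in_payload(addr: int) -> bool:
    """Return True when every bit above the low PAYLOAD_BITS bits is zero."""
    return addr >> PAYLOAD_BITS == 0


if __name__ == "__main__":
    cx = 0x157c370             # context pointer from the backtrace
    obj = 0xfffffd7ffda22060   # object pointer from the backtrace
    print(hex(cx), fits_in_payload(cx))
    print(hex(obj), fits_in_payload(obj))
```

An allocator that hands the GC addresses like `obj` breaks the assumption, which is consistent with the crash in `setSlot` shown in the description.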

Possible workarounds include mmapping away all the high pages that do not work for SpiderMonkey, using a custom mmap allocation routine with MAP_FIXED to choose addresses, or some variation of these.

Comment by Jussi Sallinen [ 08/Apr/16 ]

We are also hitting this bug with SmartOS build 20160317T000621Z, MongoDB shell version: 3.2.3.
MongoDB was installed from pkgsrc 2015Q4: mongodb-3.2.3

Comment by Filip Hajny [ 30/Mar/16 ]

Do the Solaris binaries work for somebody (e.g. old Solaris)? We're seeing this issue as well on SmartOS with both official and custom-built binaries.
