[SERVER-22927] mongo dies with Segmentation fault in Solaris 11.3 / Illumos February 2016 Created: 02/Mar/16 Updated: 08/Jan/24 Resolved: 12/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript |
| Affects Version/s: | 3.1.6, 3.2.0 |
| Fix Version/s: | 3.3.10 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andreas Grüninger | Assignee: | Jonathan Reams |
| Resolution: | Done | Votes: | 0 |
| Labels: | bkp | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.2 |
| Steps To Reproduce: | Download any source after 3.1.6. |
| Sprint: | Platforms 13 (04/22/16), Platforms 14 (05/13/16), Platforms 15 (06/03/16), Platforms 16 (06/24/16), Platforms 17 (07/15/16) |
| Participants: | |
| Description |
|
I downloaded the Solaris binaries for 3.3.2 and used them on Solaris 11.3 and OpenIndiana (illumos kernel from February 2016). The MongoDB server (mongod) starts and works without problems, as does mongostat, but the mongo shell dies with a segmentation fault. To debug, I downloaded the source for 3.2.3 and compiled it with the following statement: The debugger shows where mongo dies.
|
| Comments |
| Comment by Githook User [ 29/Jun/16 ] |
|
Author: Jonathan Reams (jbreams) <jbreams@mongodb.com> Message: |
| Comment by Mark Benvenuto [ 09/Jun/16 ] |
|
jonathan.reams, can you handle the backport of the ARM64 port? For Solaris SPARC64, the high constant is 0x80100000000. For x64 Solaris, it is the same as for the ARM64 port. |
| Comment by Filip Hajny [ 24/May/16 ] |
|
Not sure if the prototype fix should apply to the 3.2 branch as well, but it didn't fix the problem for me there. |
| Comment by Mark Benvenuto [ 20/May/16 ] |
|
I have prototyped a fix here: https://github.com/markbenvenuto/mongo/commit/33cd16b77c3682f224517b1bc44e1ce8fe646819. It still needs polish and testing before it is ready to commit. The idea is to choose a random point in the process's virtual address space that the SpiderMonkey GC is happy with. I take advantage of the fact that, unlike with MAP_FIXED, the default mmap address-hint mode will not overwrite an existing virtual address allocation. |
| Comment by Andrew Stormont [ 17/May/16 ] |
|
The proposed fix for the mmap issue doesn't solve this problem. Tested with GCC 5.1. |
| Comment by Mark Benvenuto [ 16/May/16 ] |
|
grueni, Firefox on OpenIndiana is compiled for 32-bit, not 64-bit. I found this issue: "mmap doesn't respect address hints", which shows that Solaris does not respect mmap hints. I was able to confirm that on an Intel x64 machine with i86pc illumos-f83b46b and on "OpenIndiana Hipster 2016.04 Live USB (32/64-bit x86)". The problem did not reproduce on a Solaris VM running under Linux KVM. |
| Comment by Andreas Grüninger [ 10/May/16 ] |
|
May I suggest having a look at https://github.com/OpenIndiana/oi-userland/tree/oi/hipster/components/web/firefox/patches. |
| Comment by Daniel Heitepriem [ 26/Apr/16 ] |
|
Same problem here, which is pretty annoying because I recently updated from Solaris 10 to Solaris 11.3 (as seen in Edit: Kernel is SunOS 5.11 11.3 i86pc i386 i86pc (Oracle Solaris 11.3 X86 Assembled 06 October 2015) |
| Comment by Alexander Pyhalov [ 23/Apr/16 ] |
|
OpenIndiana, illumos-e7e978b, this is kernel for 22 December 2015. |
| Comment by Filip Hajny [ 22/Apr/16 ] |
|
Getting the same exception as Alexander on SmartOS (joyent_20141030T081701Z i86pc i386 i86pc) after applying the patch to 3.2.4. Building with GCC 4.9.3 under pkgsrc. |
| Comment by Mark Benvenuto [ 22/Apr/16 ] |
|
I experimentally backported the change to 3.2.5, and it works on the following kernel: SunOS omni 5.11 omnios-master-0832725 i86pc i386 i86pc. Which kernel did you use? "Failed to initialize JSContext" means that the Solaris mmap call used the hint only on the first call, ignored it on the second call, and chose a random address. I saw this error on Oracle Solaris 11.2 on SPARC. |
| Comment by Alexander Pyhalov [ 22/Apr/16 ] |
|
After applying this patch to mongo 3.2.5, the mongo shell no longer crashes, but it exits with "exception: Failed to initialize JSContext" |
| Comment by Githook User [ 20/Apr/16 ] |
|
Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com> Message: |
| Comment by Mark Benvenuto [ 15/Apr/16 ] |
|
After more digging, I came across this code in the Mozilla JS code base: https://github.com/mongodb/mongo/blob/master/src/third_party/mozjs-38/extract/js/src/gc/Memory.cpp#L405-L419 Since Mozilla hit this problem on ia64, they came up with a workaround that gives mmap a hint about where to allocate memory. Using mmap hints appears to work fine in my testing on the i86pc kernel on OmniOS Bloody on KVM/QEMU. They picked 0x70000000000 (~2^42) as their hint. This appears to work fine on x64, since there are 48 bits of address space to use, and x64 Solaris allows addresses up to 0x800000000000 (2^47) before the reserved address-space hole. On SPARC64 (an unsupported platform), using 0x70000000000 will not work, since SPARC64 has 44 bits of address space and the address-space hole starts at 0x80100000000 (~2^43). I am planning to work on this change next week, and also to test it against the i86xpv kernel. Address Layouts Sparc |
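As a worked check of the x64 figures quoted above (this covers only the x64 numbers, not the SPARC64 layout), here are the hint and the limit as powers of two, and the headroom between them:

$$
\texttt{0x70000000000} = 7 \cdot 2^{40} \approx 2^{42.81},
\qquad
\texttt{0x800000000000} = 2^{47} = 128 \cdot 2^{40},
$$

$$
2^{47} - 7 \cdot 2^{40} = (128 - 7) \cdot 2^{40} = 121 \cdot 2^{40}\ \text{bytes} = 121\ \text{TiB}.
$$

So on x64 the hint leaves about 121 TiB of address space below the start of the hole for allocations that grow upward from it.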
| Comment by Jussi Sallinen [ 13/Apr/16 ] |
|
In case it is needed now or in the future, I can provide a SmartOS zone instance for testing/development purposes; just email me if MongoDB would benefit from it. |
| Comment by Mark Benvenuto [ 13/Apr/16 ] |
|
First, we do all our Solaris testing with OmniOS 5.11 r151006 (April 2015) in AWS. illumos and Solaris have two different kernels: i86pc and i86xpv. The latter is used in our Solaris image in AWS and is designed to run as a Xen paravirtual kernel. The former is used in other environments, such as running Solaris on bare metal or in other VMs. One of the differences between these two kernels is the address ranges Solaris uses to mmap pages. i86xpv: i86pc: The x64 architecture currently supports 48 bits of address space, and SpiderMonkey assumes that the unused high bits are set to zero (http://lxr.mozilla.org/mozilla-esr45/source/js/public/Value.h#221) when it runs on x64. Because Solaris sets the high bits in the i86pc kernel, this breaks SpiderMonkey's assumption, and so it fails to start. Possible workarounds include reserving (mmap-ing) all the high pages that do not work for SpiderMonkey, using a custom mmap allocation routine with MAP_FIXED to choose addresses, or some other variation. |
| Comment by Jussi Sallinen [ 08/Apr/16 ] |
|
We are also hitting this same bug with SmartOS built 20160317T000621Z, MongoDB shell version: 3.2.3 |
| Comment by Filip Hajny [ 30/Mar/16 ] |
|
Do the Solaris binaries work for anybody (e.g., on older Solaris releases)? We're seeing this issue as well on SmartOS with both official and custom-built binaries. |