[SERVER-3131] JS Error: out of memory leading to segfaults Created: 23/May/11 Updated: 12/Jul/16 Resolved: 29/Jun/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | 1.9.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Paul Harvey | Assignee: | Antoine Girbal |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 10.04.2 LTS, 4GB RAM VMware instance with one CPU core (Intel(R) Xeon(R) CPU X5680 @ 3.33GHz). Three-member replica set, running as configured in /etc/mongodb/seta.conf. We were crashing with 1.8.1 from the 10gen deb repo, and we just got another segfault today running mongod --version. Members are running only MongoDB and nothing else. This is the same setup which was discussed at http://groups.google.com/group/mongodb-user/browse_thread/thread/35efda30f3aeff35 |
||
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
Full details are at http://foswiki.org/Tasks/Item10672?section=mongodb-user2#Crash_5 (and even more details at http://foswiki.org/Tasks/Item10672). We tested a new version of our application (Foswiki), which uses MongoDB 1.8.1 as a query cache/accelerator, for two weeks before putting it into production, and there were no unexplained instabilities. In production, however, we can only run the site for a couple of days: at least twice a week we get a segfault in whichever mongod happens to be the primary. It's not a sudden thing - we see spurious "JS Error: out of memory" and "Assertion: 10432:JS_NewObject failed for global" warnings in the log for an hour or two before the mongod process segfaults. The problem is extremely hard to reproduce; we have not been able to reproduce it ourselves in our test environment using artificial loads (our production site is public to the Internet). seta.log.022c is a snippet leading up to the latest segfault. seta.log.022b is a snippet leading up to and including the first few minutes of problems after running fine for a couple of days. Both were captured with -vvvvv verbosity, which generates ~10GB/day of log files (extremely burdensome), so I've filtered them using grep -v ^checking |
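For context on how the two log messages relate, here is a minimal sketch against the old SpiderMonkey C API (the engine embedded by mongod 1.8). It is an illustration rather than MongoDB's actual code: once the runtime's garbage-collection byte budget (the maxbytes argument to JS_NewRuntime) is exhausted, allocations such as JS_NewObject start returning NULL, which mongod surfaces as the 10432 assertion.

```cpp
// Hedged illustration only: shows how a SpiderMonkey runtime that has hit its
// GC byte budget makes JS_NewObject fail, which is what mongod reports as
// "Assertion: 10432:JS_NewObject failed for global". This is NOT the code in
// engine_spidermonkey.cpp; it is a standalone sketch against the old C API.
#include <stdio.h>
#include "jsapi.h"

// Minimal global class, as required by the old SpiderMonkey C API.
static JSClass global_class = {
    "global", JSCLASS_GLOBAL_FLAGS,
    JS_PropertyStub, JS_PropertyStub, JS_PropertyStub, JS_PropertyStub,
    JS_EnumerateStub, JS_ResolveStub, JS_ConvertStub, JS_FinalizeStub,
    JSCLASS_NO_OPTIONAL_MEMBERS
};

int main() {
    // maxbytes is the per-runtime GC budget; every scope created from this
    // runtime shares it. A leak anywhere in the runtime eventually exhausts it.
    JSRuntime* rt = JS_NewRuntime(8L * 1024L * 1024L /* 8MB, illustrative */);
    if (!rt) return 1;

    JSContext* cx = JS_NewContext(rt, 8192);
    if (!cx) return 1;

    // When the runtime is out of budget, this returns NULL and SpiderMonkey
    // reports "out of memory"; mongod then asserts with error 10432.
    JSObject* global = JS_NewObject(cx, &global_class, NULL, NULL);
    if (!global) {
        fprintf(stderr, "JS_NewObject failed for global\n");
    }

    JS_DestroyContext(cx);
    JS_DestroyRuntime(rt);
    JS_ShutDown();
    return 0;
}
```

If mongod shares a single runtime budget across all server-side JS, a leak in any one scope would eventually starve every other JS operation, which matches the hour or two of warnings that precede each segfault.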
| Comments |
| Comment by Jason R. Coombs [ 09/Jul/11 ] |
|
We started getting these "out of memory" errors today. We just moved to a new datacenter, so we have a few extra variables to contend with. We're also running MongoDB 1.8.1 (same as in the original datacenter). We did not encounter the segfault, even after several hours of the OOM errors, but we did restart the process to restore service. The OOM errors started after running for about 8 hours. We're using the same applications, though we're running more nodes. We're under moderate load. We have turned on journaling for our master DB (which we did not do in our old DC). Should this ticket be marked as resolved when all that was done was to allow for an increase in JS memory? Is that the prescribed fix (wait until you get these errors, then bump up your JS memory)? Does the resolution indicate that the leak is caused by a client's JS code itself? Is there any value in upgrading to 1.8.2 with respect to this issue? Should we consider limiting the number of application nodes connecting? Is there any reason to think that journaling would have any impact? For now, we're watching for the OOM errors, expecting they'll crop up again in a few hours. |
| Comment by Antoine Girbal [ 29/Jun/11 ] |
|
Moving this issue to version 1.9.1, since the increase in JS memory is available in the 1.9 line. |
| Comment by Antoine Girbal [ 25/May/11 ] |
|
OK, let us know how it goes. |
| Comment by Paul Harvey [ 25/May/11 ] |
|
We are now running a build from b6f07e2b6db67ce4ef5812af4b912e3851f388f0 with the limit changed to 128M (resulting in 14fa0ca45328097b38d4d9dcf39081302079ecc6). For now I'm going to keep the cron job which restarts everything running for a while longer, and I'll come back in a week or two with my findings. Still, I feel that we need extra debug info to really get to the bottom of this leak. |
| Comment by Paul Harvey [ 24/May/11 ] |
|
Thank you! I'll give it a shot tonight. As I mentioned on |
| Comment by Antoine Girbal [ 24/May/11 ] |
|
The fact that these errors appear after running the app for some time, and then don't go away, points to a memory leak. In engine_spidermonkey.cpp, look for: |
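Whatever exact line Antoine has in mind, the general knob in the old SpiderMonkey C API is the runtime's memory budget. The following is a hedged sketch only (the helper name is hypothetical, and the 128MB figure simply mirrors the value Paul mentions above), not the actual engine_spidermonkey.cpp code:

```cpp
// Hedged sketch, not the actual engine_spidermonkey.cpp code: two places the
// old SpiderMonkey C API exposes the memory budget that, when exhausted,
// produces "JS Error: out of memory".
#include "jsapi.h"

JSRuntime* createRuntimeWithBiggerBudget() {  // hypothetical helper name
    // 1) The budget is set at runtime creation via the maxbytes argument.
    //    128MB here mirrors the limit Paul says he raised it to.
    JSRuntime* rt = JS_NewRuntime(128L * 1024L * 1024L);
    if (!rt) return NULL;

    // 2) It can also be adjusted after creation through the GC parameters.
    JS_SetGCParameter(rt, JSGC_MAX_BYTES, 128 * 1024 * 1024);
    JS_SetGCParameter(rt, JSGC_MAX_MALLOC_BYTES, 128 * 1024 * 1024);
    return rt;
}
```

Raising the budget only buys time if the underlying leak remains, which is why Paul's follow-up still asks for extra debug info to get to the bottom of it.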
| Comment by Paul Harvey [ 24/May/11 ] |
|
|
| Comment by Paul Harvey [ 23/May/11 ] |
|
I should clarify that we have three replica members, each on its own separate VM (4GB RAM). They are in the same DC. Also, the 022b log, although 270KiB, covers only 40 seconds of elapsed time, not "several minutes" as I said in the initial description. -vvvvv is extreme logging! |