[SERVER-2918] Server shuts down once a day, without being able to reproduce the crash Created: 08/Apr/11 Updated: 12/Jul/16 Resolved: 08/Apr/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability |
| Affects Version/s: | 1.6.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Catalin Trandafir | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
We've been having problems with mongodb since updating to 1.6.5 (about 1 month ago). It is a single server configuration - we have 7 GB in (data + indexes) and mongodb usually stays at around 7-8 GB RAM, but the machine has 15 GB RAM in total and 4 GB swap. The database is crashing every day with two different patterns: Fri Apr 8 10:37:14 [conn58463] run command jbm.$cmd { mapreduce: "mongocvs", map: CodeWScope( function() { var x; var folders = this.employers[96012].folders; for(x in folders) {emit(x,1);}}, {}), reduce: CodeWScope( function(k, vals) { var sum = 0; for (var i in vals) { sum += vals[i]; }return sum; }, {}), query: { empids: 96012, $or: [ { folders: 815493 }, { folders: 814868 }, { folders: 813880 }, { folders: 811648 }, { folders: 811647 }, { folders: 811644 }, { folders: 811643 }, { folders: 811642 }, { folders: 811640 }, { folders: 811639 }, { folders: 811638 }, { folders: 811637 }, { folders: 806672 }, { folders: 806671 }, { folders: 806670 }, { folders: 806583 }, { folders: 806493 }, { folders: 805036 }, { folders: 805035 }, { folders: 801933 }, { folders: 801932 }, { folders: 801929 }, { folders: 801648 }, { folders: 800608 }, { folders: 798096 }, { folders: 797344 }, { folders: 797306 }, { folders: 796627 }, { folders: 796626 }, { folders: 796625 } ] }, out: "mr_count_folder_cvs" } } } reslen:64 0ms Fri Apr 8 10:37:14 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440 reslen:78 0ms Fri Apr 8 10:37:14 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440_inc reslen:78 0ms Fri Apr 8 10:55:32 [conn58463] mr failed, removing collection Fri Apr 8 10:57:01 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440 reslen:78 88937ms Fri Apr 8 10:57:03 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440_inc reslen:105 2710ms Our plans for the immediate future are to upgrade to 1.8 version and start another machine, transforming the system into a replica set - and that is what we are working at right now. But I still want to investigate this problem and be sure that it won't be present after the upgrade. Notes:
|
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 08/Apr/11 ] |
|
There were a couple of memory leaks in JS that can cause issues with spidermonkey, which is why its happening in 1.6.5 |
| Comment by Catalin Trandafir [ 08/Apr/11 ] |
|
Thank you for the loop idea - at least I was able to reproduce the crash. I still don't understand why it is crashing on 1.6.5, but being able to reproduce it makes it a lot less frustrating. |
| Comment by Catalin Trandafir [ 08/Apr/11 ] |
|
I tried to run it in a loop: m = function() { var x; var folders = this.employers[96012].folders; for(x in folders) {emit(x,1);}}; r = function(k, vals) { var sum = 0; for (var i in vals) { sum += vals[i]; }return sum; }; for ( i=1; i<1000; i++ ){ , { folders: 814868 }, { folders: 813880 }, { folders: 811648 }, { folders: 811647 }, { folders: 811644 }, { folders: 811643 }, { folders: 811642 }, { folders: 811640 }, { folders: 811639 }, { folders: 811638 }, { folders: 811637 }, { folders: 806672 }, { folders: 806671 }, { folders: 806670 }, { folders: 806583 }, { folders: 806493 }, { folders: 805036 }, { folders: 805035 }, { folders: 801933 }, { folders: 801932 }, { folders: 801929 }, { folders: 801648 }, { folders: 800608 }, { folders: 798096 }, { folders: 797344 }, { folders: 797306 }, { folders: 796627 }, { folders: 796626 }, { folders: 796625 } ] }, I ran this in a test environment, on 1.6.5 version - it crashed the server (tried it twice); then I tried this on another server, with 1.8 version, same data, replicaset environment, but only with the primary started (and 2 arbiters) - on 1.8 version the server doesn't crash. |
| Comment by Eliot Horowitz (Inactive) [ 08/Apr/11 ] |
|
Looks like a memory leak, possible in map/reduce. If you run that same command in a loop, does it crash? |