Details
-
Bug
-
Resolution: Done
-
Major - P3
-
None
-
1.6.5
-
None
-
ALL
Description
We've been having problems with mongodb since updating to 1.6.5 (about 1 month ago). It is a single server configuration - we have 7 GB in (data + indexes) and mongodb usually stays at around 7-8 GB RAM, but the machine has 15 GB RAM in total and 4 GB swap.
The database is crashing every day with two different patterns:
1) Mongodb fills the whole memory on the server (all the 15 GB and the whole 4 GB swap) very fast (so it goes from 7-8 GB to 15 + 4 in about a minute); after that, nothing seems to be able to shut the server down (we tried db.shutdownServer(), kill -15, kill -9) - it's blocked, it accepts new connections, but doesn't return anything; the only shut down procedure that seems to work is shutting the whole machine down. After restarting the machine, the lock file is present so mongodb won't start, but after removing the lock file, the server starts and it's ok again (for a day or two).
2) The second pattern is not present every day, more like once every 5 days - the database just shuts down, freeing all the memory. Since I just observed this problem, I can isolate the connection that seems to have caused the shutdown. Here is how the log looks like, for this connection:
Fri Apr 8 10:37:14 [conn58463] run command jbm.$cmd { mapreduce: "mongocvs", map: CodeWScope( function() { var x; var folders = this.employers[96012].folders; for(x in folders)
{emit(x,1);}}, {}), reduce: CodeWScope( function(k, vals) { var sum = 0; for (var i in vals)
{ sum += vals[i]; }return sum; }, {}), query: { empids: 96012, $or: [
{ folders: 815493 },
{ folders: 814868 },
{ folders: 813880 },
{ folders: 811648 },
{ folders: 811647 },
{ folders: 811644 },
{ folders: 811643 },
{ folders: 811642 },
{ folders: 811640 },
{ folders: 811639 },
{ folders: 811638 },
{ folders: 811637 },
{ folders: 806672 },
{ folders: 806671 },
{ folders: 806670 },
{ folders: 806583 },
{ folders: 806493 },
{ folders: 805036 },
{ folders: 805035 },
{ folders: 801933 },
{ folders: 801932 },
{ folders: 801929 },
{ folders: 801648 },
{ folders: 800608 },
{ folders: 798096 },
{ folders: 797344 },
{ folders: 797306 },
{ folders: 796627 },
{ folders: 796626 },
{ folders: 796625 } ] }, out: "mr_count_folder_cvs" }
Fri Apr 8 10:37:14 [conn58463] mr ns: jbm.mongocvs
Fri Apr 8 10:37:14 [conn58463] run command jbm.$cmd { count: "system.namespaces", query:
}
Fri Apr 8 10:37:14 [conn58463] query jbm.$cmd ntoreturn:1 command: { count: "system.namespaces", query:
} reslen:64 0ms
Fri Apr 8 10:37:14 [conn58463] query jbm.system.js reslen:36 nreturned:0 0ms
Fri Apr 8 10:37:14 [conn58463] run command jbm.$cmd
Fri Apr 8 10:37:14 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440
Fri Apr 8 10:37:14 [conn58463] query jbm.$cmd ntoreturn:1 command:
reslen:78 0ms
Fri Apr 8 10:37:14 [conn58463] run command jbm.$cmd
Fri Apr 8 10:37:14 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440_inc
Fri Apr 8 10:37:14 [conn58463] query jbm.$cmd ntoreturn:1 command:
reslen:78 0ms
Fri Apr 8 10:37:14 [conn58463] create collection jbm.tmp.mr.mapreduce_1302248234_47440_inc
Fri Apr 8 10:55:32 [conn58463] mr failed, removing collection
Fri Apr 8 10:55:32 [conn58463] run command jbm.$cmd
Fri Apr 8 10:57:01 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440
Fri Apr 8 10:57:01 [conn58463] query jbm.$cmd ntoreturn:1 command:
reslen:78 88937ms
Fri Apr 8 10:57:01 [conn58463] run command jbm.$cmd
Fri Apr 8 10:57:03 [conn58463] CMD: drop jbm.tmp.mr.mapreduce_1302248234_47440_inc
Fri Apr 8 10:57:03 [conn58463] dropCollection: jbm.tmp.mr.mapreduce_1302248234_47440_inc
Fri Apr 8 10:57:03 [conn58463] dropIndexes done
Fri Apr 8 10:57:03 [conn58463] query jbm.$cmd ntoreturn:1 command:
reslen:105 2710ms
Fri Apr 8 10:57:04 [conn58463] Uncaught std::exception: St9bad_alloc, terminating
Fri Apr 8 10:57:04 [conn58463] shutdown: going to close listening sockets...
Fri Apr 8 10:57:04 [conn58463] closing listening socket: 5
Fri Apr 8 10:57:04 [conn58463] closing listening socket: 6
Fri Apr 8 10:57:04 [conn58463] closing listening socket: 7
Fri Apr 8 10:57:04 [conn58463] closing listening socket: 8
Fri Apr 8 10:57:04 [conn58463] shutdown: going to flush oplog...
Fri Apr 8 10:57:04 [conn58463] shutdown: going to close sockets...
Fri Apr 8 10:57:04 [conn58463] shutdown: waiting for fs preallocator...
Fri Apr 8 10:57:04 [conn58463] shutdown: closing all files...
Fri Apr 8 10:57:04 [conn58463] shutdown: removing fs lock...
Our plans for the immediate future are to upgrade to 1.8 version and start another machine, transforming the system into a replica set - and that is what we are working at right now. But I still want to investigate this problem and be sure that it won't be present after the upgrade.
Notes:
- unfortunately the shutdown I just described is not reproducible - I tried the same sequence of commands and it didn't crash
- as you can see from the log, we use map-reduce too