[SERVER-5471] Crash during map reduce with lots of queued reads

Created: 01/Apr/12  Updated: 15/Aug/12  Resolved: 20/Apr/12

| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.0.4 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Nic Cottrell (Personal) |
| Assignee: | Tad Marshall |
| Resolution: | Cannot Reproduce |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | RHEL5 |
| Attachments: | mongod-crash.txt |
| Operating System: | Linux |
| Participants: | Nic Cottrell (Personal), Tad Marshall |
Description

mongod crashed near the end of a slow map/reduce job.
Comments
Comment by Nic Cottrell (Personal) [ 06/Apr/12 ]

Hi Tad, In fact I don't know exactly which params were used in that M/R job, but my cron runs very similar queries every hour and I haven't had any similar crashes since then, even with the profiler on. So this may in fact be impossible to reproduce.

Nic
Comment by Tad Marshall [ 06/Apr/12 ]

Hi Nic, Would it be possible to turn off profiling temporarily and try the MapReduce again? This would help us narrow down where the crash is coming from. If the MapReduce produces the desired results and there is no crash, that would be very informative. Two lines from the stack trace suggest that it is the profiler that caused the crash, not the MapReduce.

Once we know for sure that it is the profiler that is crashing (if that turns out to be the case), we will know where to focus debugging. Thanks! Tad
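(For reference, profiling can be toggled from the mongo shell with the standard helper; levels are 0 = off, 1 = slow operations only, 2 = all operations. A minimal sketch:)

```javascript
// Disable profiling for the current database while re-running the MapReduce
db.setProfilingLevel(0);

// ... re-run the MapReduce job here ...

// Restore profiling afterwards (level 1 logs only slow operations)
db.setProfilingLevel(1);
```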
Comment by Nic Cottrell (Personal) [ 03/Apr/12 ]

Hi Tad, Yeah, I use INLINE as output for almost everything since it seems much faster and I'm usually well within the 16MB limit. I install from your official yum repo for Linux, and I'm pretty sure it was 2.0.4 at the time of the crash. Yeah, I have profiling enabled; in fact my config has:

    quiet = true
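(For context: on a 2.0-era mongod, profiling is typically enabled in the config file alongside options like the one quoted above. The values below are illustrative assumptions, not taken from this ticket:)

```ini
# mongod.conf fragment (hypothetical values)
quiet = true
profile = 1      # 0 = off, 1 = profile slow operations, 2 = profile all operations
slowms = 100     # threshold in milliseconds for what counts as "slow"
```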
Comment by Tad Marshall [ 03/Apr/12 ]

Hi Nic, Thanks for the info.

I'm used to seeing MapReduce expressed in JavaScript, in the mongo shell, for example. The syntax for issuing a MapReduce command from the shell is described at http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Overview and this is a kind of view into the way that MapReduce is seen by the MongoDB server. MapReduce is a server command, which is received as a query against a pseudo-collection named "$cmd". None of this matters to your issue, but for the server to have executed a MapReduce operation for you it had to have received a command telling it to do that. This command could have had options, perhaps including an optional "out" parameter that controls what happens to the output of the MapReduce operation.

Can you post the Java code that is invoking the MapReduce operation? That might give us an idea of how it could be structured to avoid the memory limit. You mentioned "inline"; as documented at http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions, "Note that this option is possible only when the result set fits within the 16MB limit of a single document."

Regarding the crash, are you running a build downloaded from mongodb.org or one from another source? Also, I see mention of "profile" and "OpDebug" in the stack trace, and I'm wondering if you had profiling enabled when the crash occurred. It is possible (if profiling was active) that the actual crash wasn't in the MapReduce operation but in the profiler attempting to log the long-running operation.

The stack traces in your mongod-crash.txt log show no indication of JavaScript. If we know that the version you were running is exactly our 2.0.4 Linux version from mongodb.org, then we can use the addresses from the stack to see exactly where we were when we crashed. Thanks! Tad
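(A minimal sketch of the command shape Tad describes, as issued from the mongo shell; the collection name and functions below are illustrative, not from this ticket:)

```javascript
// Hypothetical mapReduce command; the server receives it as a query against "$cmd"
db.runCommand({
    mapreduce: "myCollection",                  // source collection (illustrative)
    map: function () { emit(this.group, 1); }, // runs once per matching document
    reduce: function (key, values) {
        var total = 0;
        values.forEach(function (v) { total += v; });
        return total;
    },
    query: { lc: "eng" },                       // optional pre-filter on the source
    out: { inline: 1 }                          // inline output must fit in one 16MB document
    // out: "resultCollection"                  // alternative: write results to a collection
});
```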
Comment by Nic Cottrell (Personal) [ 03/Apr/12 ]

Hi Tad! I'm not sure what you mean by the MapReduce command. I'm using the Java library via Morphia, which uses the MapReduceCommand class, I believe. I don't know exactly what was going on during the crash; it was some batch process running at the time. I believe the query would have been something like:

    {"indices.text": {$in: [.....]}, lc: "eng", group: "all"}

and the scope field params would have been:

    {"targetTerms": [...]}

where the [...] array would have been 500-1000 short strings, probably all sorts of high Unicode. Here's an example object from that collection:

Hope this helps!
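(For readers unfamiliar with the "scope" option: it injects variables that become globals inside the map and reduce functions. A hedged sketch of how the pieces quoted above would combine; everything except the query and scope shapes from the comment is illustrative:)

```javascript
db.runCommand({
    mapreduce: "articles",                        // hypothetical collection name
    map: function () {
        // targetTerms is injected via "scope" and is visible here as a global
        for (var i = 0; i < targetTerms.length; i++) {
            // ... hypothetical per-term matching against this.indices.text ...
        }
    },
    reduce: function (key, values) { return values[0]; },  // placeholder reduce
    query: { "indices.text": { $in: [ /* 500-1000 short strings */ ] },
             lc: "eng", group: "all" },
    scope: { targetTerms: [ /* the same strings */ ] },
    out: { inline: 1 }
});
```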
Comment by Tad Marshall [ 02/Apr/12 ]

Can you post the mapReduce command itself, including all arguments, and also post a sample of one of the documents being found by your query (and used in the map function)? There is a separate problem in that the server crashed after reporting the BSON size assertion twice: trying to generate a BSON object that is too large should not crash the server, and does not do this in other cases. So we'd like to help you get a working map/reduce and also find out what caused the crash. Thanks!
Comment by Nic Cottrell (Personal) [ 02/Apr/12 ]

Yeah, it's quite likely that I hit some limit for an inline map reduce, but each object in the result set should have been really small. I think this was the map reduce code running at the crash:

    map = function () { ... }

    reduce = function (prev, curr) {
        ...
        return result;
    }

The candidate set from the query was about 200k objects; the average object size in that collection is about 2487 bytes.
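(Rough scale, my arithmetic rather than figures from the ticket: 200,000 documents at roughly 2,487 bytes each is about 497 MB of input scanned, while inline output must fit in a single 16 MB document and the BufBuilder assertion in the log fires at 64 MB, so even a modest per-key accumulation across that many inputs can blow past these limits.)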
Comment by Tad Marshall [ 01/Apr/12 ]

It looks like the map/reduce exceeded the maximum size of a BSON object:

    Sun Apr 1 00:56:56 [conn920] Assertion: 13548:BufBuilder grow() > 64MB

Is this a reasonable result of your map/reduce job (to produce a document of this size at some point)? If so, you have hit a built-in limit. If not, can you post your map/reduce code and/or a description of what you are doing so we could see where the problem might be?
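(For context: the 16 MB figure discussed elsewhere in this ticket is the per-document BSON limit, which a 2.0-era shell can confirm directly; the 64 MB in the assertion is an internal buffer limit, distinct from the per-document one. A minimal check:)

```javascript
// Ask the server for its maximum BSON document size (16 MB = 16777216 bytes on 2.0)
db.isMaster().maxBsonObjectSize;   // => 16777216
```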