[SERVER-2280] Several exceptions running JavaScript Map/Reduce job Created: 22/Dec/10 Updated: 12/Jul/16 Resolved: 27/Dec/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript, Stability |
| Affects Version/s: | 1.6.5 |
| Fix Version/s: | 1.7.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jim Powers | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux qa-mongo1 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 GNU/Linux |
||
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
Version data:

Trying to run the following map/reduce job from the mongo console:

    );
    r = function(k,vals) {
        tweets.sort(createdAtComparer);
        return {"tweets": tweets};
    options = {
    res = db.tweets.mapReduce(m,r,options);

As you can guess, we are processing geo-located tweets, where each tweet has a list of "region ids" associated with it. The map/reduce job basically inverts the relationship to group tweets (up to 50) by the region they appear in. Also note that I'm using a DBRef rather than including the tweet directly; the only fields accessed are _id and createdAt, via a fetch().

The errors I have gotten are (and the errors are not always the same):

    Wed Dec 22 15:00:30 uncaught exception: map reduce failed: {

and

    Wed Dec 22 14:36:22 uncaught exception: map reduce failed: {

One seems to indicate missing "properties" and the other some kind of character-encoding issue. Here's an example record:

    { , ,\"attributes\":{},\"full_name\":\"Okanagan-Similkameen, British Columbia\",\"country\":\"Canada\",\"name\":\"Okanagan-Similkameen\",\"id\":\"89436cc68723693a\",\"place_type\":\"city\"},\"in_reply_to_user_id\":null,\"text\":\"Hey Guy skaha is dead calm http:\\/\\/myloc.me\\/fxtc1\",\"contributors\":null,\"coordinates\": {\"type\":\"Point\",\"coordinates\":[-119.61187,49.44703]},\"retweet_count\":0,\"in_reply_to_user_id_str\":null,\"id_str\":\"17320712503037953\",\"retweeted\":false,\"in_reply_to_status_id\":null,\"source\":\" ,\"favorited\":false,\"user\": {\"profile_link_color\":\"0084B4\",\"location\":\"\\u00dcT: 49.40707,-119.60601\",\"verified\":false,\"favourites_count\":0,\"profile_sidebar_border_color\":\"C0DEED\",\"id_str\":\"214251894\",\"friends_count\":5,\"is_translator\":false,\"show_all_inline_media\":false,\"geo_enabled\":true,\"profile_use_background_image\":true,\"description\":\"kayak and bicycle enthusiast\",\"contributors_enabled\":false,\"profile_background_color\":\"C0DEED\",\"url\":\"http:\\/\\/rockinrobin.posterous.com\",\"profile_image_url\":\"http:\\/\\/a2.twimg.com\\/profile_images\\/1164994911\\/madbee_normal.jpg\",\"profile_background_image_url\":\"http:\\/\\/a3.twimg.com\\/a\\/1292883740\\/images\\/themes\\/theme1\\/bg.png\",\"created_at\":\"Wed Nov 10 23:24:16 +0000 2010\",\"followers_count\":6,\"follow_request_sent\":null,\"screen_name\":\"BicycleRobin\",\"profile_text_color\":\"333333\",\"protected\":false,\"lang\":\"en\",\"statuses_count\":37,\"notifications\":null,\"profile_sidebar_fill_color\":\"DDEEF6\",\"name\":\"Robin Dunham\",\"following\":null,\"profile_background_tile\":false,\"time_zone\":\"Pacific Time (US & Canada)\",\"id\":214251894,\"listed_count\":5,\"utc_offset\":-28800},\"id\":17320712503037953,\"entities\":{\"urls\":[ {\"indices\":[27,48],\"expanded_url\":null,\"url\":\"http:\\/\\/myloc.me\\/fxtc1\"}],\"hashtags\":[],\"user_mentions\":[]},\"in_reply_to_screen_name\":null}", ]

The number of records being operated on is about 65K (not a lot, really). The collection, 'tweets', is being actively written to while the M/R job runs.

Help. |
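For reference, a minimal sketch of a map/reduce job of the shape described above. This is a hypothetical reconstruction, not the original (truncated) script: the field name regionIds, the output collection name tweets_by_region, the 50-tweet cap, and the inline {_id, createdAt} stubs (standing in for the DBRef/fetch() indirection) are all assumptions.

    // Hypothetical sketch -- not the original script; field names, the output
    // collection, and the 50-tweet cap are assumptions.
    m = function() {
        var stub = { _id: this._id, createdAt: this.createdAt };
        // Invert the relationship: emit one (regionId -> tweet) pair per region.
        this.regionIds.forEach(function(regionId) {
            emit(regionId, { tweets: [ stub ] });
        });
    };

    createdAtComparer = function(a, b) {
        // Newest first.
        return b.createdAt - a.createdAt;
    };

    r = function(k, vals) {
        var tweets = [];
        // Concatenate every incoming list for this region ...
        vals.forEach(function(v) { tweets = tweets.concat(v.tweets); });
        // ... then sort and keep only the 50 most recent.
        tweets.sort(createdAtComparer);
        return { tweets: tweets.slice(0, 50) };
    };

    options = {
        out: "tweets_by_region",
        scope: { createdAtComparer: createdAtComparer }
    };
    res = db.tweets.mapReduce(m, r, options);

Note that the reduce in this sketch still concatenates everything before trimming, which is the pattern discussed in the comments below. |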
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 27/Dec/10 ] |
|
If, instead of concatenating the lists, then sorting, then trimming, you do an incremental merge of the incoming lists, you'll never have a large temporary result. An inefficient version would be: r = function(k,vals) { }; |
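The body of that reduce did not survive in this ticket; the following is only a guess at the kind of incremental merge being suggested, assuming the same createdAtComparer comparator and a cap of 50 tweets per region:

    // Sketch of an incremental merge (assumed comparator and cap).
    // The working array is re-trimmed after every incoming value,
    // so it never grows much past the cap.
    r = function(k, vals) {
        var MAX = 50;
        var merged = [];
        vals.forEach(function(v) {
            v.tweets.forEach(function(t) { merged.push(t); });
            merged.sort(createdAtComparer);
            if (merged.length > MAX) {
                merged = merged.slice(0, MAX);
            }
        });
        return { tweets: merged };
    };

Because the trim happens inside the loop, the largest temporary array is bounded by the cap plus the size of one incoming list, rather than by the total number of tweets seen for a region. |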
| Comment by Jim Powers [ 27/Dec/10 ] |
|
Yes, the map phase unrolls the region mapping and can produce potentially large arrays. However, I thought that the idempotent characteristic of the reduce function could be used to deal with this. Assume, for instance, that there is an internal cap on "reduce object size" of, say, N bytes. After that point the "reduce object" is split into multiple parts with the same key, each part no larger than N bytes. At least this way the map/reduce can continue, assuming that the reduce function does not produce large objects as a result. Oh well. I have started to write this in Scala anyway. However, it appears that the "straight JavaScript" version runs much more slowly than the map/reduce version. CPU is very high. The Scala version also runs with a lot of CPU, while Mongo runs with almost none. |
| Comment by Eliot Horowitz (Inactive) [ 27/Dec/10 ] |
|
In the reduce above, it seems that you first concatenate all inputs into one array and then limit to 50. If you can send a script that fully reproduces the issue, we can see if something else is going on. |
| Comment by Jim Powers [ 27/Dec/10 ] |
|
Attached is a straight JavaScript script that does what the map/reduce program does. The array sizes are capped at a maximum of 50, which should not be considered huge IMHO. |
| Comment by auto [ 27/Dec/10 ] |
|
Author: {u'login': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}
Message: better js error message on out of memory |
| Comment by Eliot Horowitz (Inactive) [ 27/Dec/10 ] |
|
In your map/reduce you're building up a huge array. Resolving as I made the error message better. |