[SERVER-2280] Several exceptions running JavaScript Map/Reduce job Created: 22/Dec/10  Updated: 12/Jul/16  Resolved: 27/Dec/10

Status: Closed
Project: Core Server
Component/s: JavaScript, Stability
Affects Version/s: 1.6.5
Fix Version/s: 1.7.5

Type: Bug Priority: Major - P3
Reporter: Jim Powers Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux qa-mongo1 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 GNU/Linux
Amazon EC2 VM


Attachments: File tweets_by_region2.js    
Operating System: Linux

 Description   

Version data:
db version v1.6.6-pre-, pdfile version 4.5
git hash: nogitversion
sys info: Linux bobek-a0 2.6.32-5-amd64 #1 SMP Thu Nov 25 18:02:11 UTC 2010 x86_64 BOOST_LIB_VERSION=1_42
uptime: 1011 seconds

Trying to run the following map/reduce job from the mongo console:
m = function() {
    var self = this;
    this.regionIds.forEach(function(rid) {
        emit(rid, { tweets: [ new DBRef('tweets', self._id) ] });
    });
};

r = function(k, vals) {
    var tweets = [];
    for (var i in vals) {
        tweets = tweets.concat(vals[i].tweets);
    }
    tweets.sort(createdAtComparer);
    if (tweets.length > maxTweets) {
        tweets.length = maxTweets;
    }
    return { "tweets": tweets };
};

options = {
    out: "tweetsByRegion",
    scope: {
        maxTweets: 50,
        createdAtComparer: function(a, b) {
            return b.fetch().createdAt - a.fetch().createdAt;
        }
    }
};

res = db.tweets.mapReduce(m, r, options);

As you can guess, we are processing geo-located tweets where each tweet has a list of "region ids" associated with it. The map/reduce job basically inverts the relationship to group tweets (up to 50) by the regions they appear in. Also, note that I'm using a DBRef rather than including the tweet directly; the only fields accessed are _id and createdAt (the latter via a fetch()).
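For illustration (my own sketch, using the example record shown below), the map function emits one key/value pair per region id, each carrying a single-element tweets array:

// What m() emits for the example record below (seven regionIds)
emit(100888,  { tweets: [ new DBRef('tweets', NumberLong("17320712503037953")) ] });
emit(2635458, { tweets: [ new DBRef('tweets', NumberLong("17320712503037953")) ] });
// ...and likewise for 113373, 2635448, 113299, 17641, 113322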

The errors I have gotten (and they are not always the same) are:

Wed Dec 22 15:00:30 uncaught exception: map reduce failed: {
"assertion" : "assertion scripting/engine_spidermonkey.cpp:286",
"errmsg" : "db assertion failure",
"ok" : 0
}

And

Wed Dec 22 14:36:22 uncaught exception: map reduce failed: {
"assertion" : "assertion scripting/engine_spidermonkey.cpp:512",
"errmsg" : "db assertion failure",
"ok" : 0
}

One seems to indicate missing "properties" and the other some kind of character encoding issue.

Here's an example record:

{
    "_id" : NumberLong("17320712503037953"),
    "user" : { "id" : NumberLong(214251894), "screenName" : "BicycleRobin", "profileImageUrl" : "http://a2.twimg.com/profile_images/1164994911/madbee_normal.jpg", "createdAt" : "Wed Nov 10 2010 18:24:16 GMT-0500 (EST)" },
    "regionIds" : [
        100888,
        2635458,
        113373,
        2635448,
        113299,
        17641,
        113322
    ],
    "text" : "Hey Guy skaha is dead calm http://myloc.me/fxtc1",
    "original" : "{\"in_reply_to_status_id_str\":null,\"place\":{\"country_code\":\"\",\"url\":\"http:\\/\\/api.twitter.com\\/1\\/geo\\/id\\/89436cc68723693a.json\",\"bounding_box\":{\"type\":\"Polygon\",\"coordinates\":[[[-121.097983,48.999419],[-119.17567,48.999419],[-119.17567,49.912],[-121.097983,49.912]]]},\"attributes\":{},\"full_name\":\"Okanagan-Similkameen, British Columbia\",\"country\":\"Canada\",\"name\":\"Okanagan-Similkameen\",\"id\":\"89436cc68723693a\",\"place_type\":\"city\"},\"in_reply_to_user_id\":null,\"text\":\"Hey Guy skaha is dead calm http:\\/\\/myloc.me\\/fxtc1\",\"contributors\":null,\"coordinates\":{\"type\":\"Point\",\"coordinates\":[-119.61187,49.44703]},\"retweet_count\":0,\"in_reply_to_user_id_str\":null,\"id_str\":\"17320712503037953\",\"retweeted\":false,\"in_reply_to_status_id\":null,\"source\":\"\\u003Ca href=\\\"http:\\/\\/www.ubertwitter.com\\/bb\\/download.php\\\" rel=\\\"nofollow\\\"\\u003E\\u00dcberTwitter\\u003C\\/a\\u003E\",\"created_at\":\"Tue Dec 21 20:49:14 +0000 2010\",\"truncated\":false,\"geo\":{\"type\":\"Point\",\"coordinates\":[49.44703,-119.61187]},\"favorited\":false,\"user\":{\"profile_link_color\":\"0084B4\",\"location\":\"\\u00dcT: 49.40707,-119.60601\",\"verified\":false,\"favourites_count\":0,\"profile_sidebar_border_color\":\"C0DEED\",\"id_str\":\"214251894\",\"friends_count\":5,\"is_translator\":false,\"show_all_inline_media\":false,\"geo_enabled\":true,\"profile_use_background_image\":true,\"description\":\"kayak and bicycle enthusiast\",\"contributors_enabled\":false,\"profile_background_color\":\"C0DEED\",\"url\":\"http:\\/\\/rockinrobin.posterous.com\",\"profile_image_url\":\"http:\\/\\/a2.twimg.com\\/profile_images\\/1164994911\\/madbee_normal.jpg\",\"profile_background_image_url\":\"http:\\/\\/a3.twimg.com\\/a\\/1292883740\\/images\\/themes\\/theme1\\/bg.png\",\"created_at\":\"Wed Nov 10 23:24:16 +0000 2010\",\"followers_count\":6,\"follow_request_sent\":null,\"screen_name\":\"BicycleRobin\",\"profile_text_color\":\"333333\",\"protected\":false,\"lang\":\"en\",\"statuses_count\":37,\"notifications\":null,\"profile_sidebar_fill_color\":\"DDEEF6\",\"name\":\"Robin Dunham\",\"following\":null,\"profile_background_tile\":false,\"time_zone\":\"Pacific Time (US & Canada)\",\"id\":214251894,\"listed_count\":5,\"utc_offset\":-28800},\"id\":17320712503037953,\"entities\":{\"urls\":[{\"indices\":[27,48],\"expanded_url\":null,\"url\":\"http:\\/\\/myloc.me\\/fxtc1\"}],\"hashtags\":[],\"user_mentions\":[]},\"in_reply_to_screen_name\":null}",
    "createdAt" : "Tue Dec 21 2010 15:49:14 GMT-0500 (EST)",
    "source" : "<a href=\"http://www.ubertwitter.com/bb/download.php\" rel=\"nofollow\">ÜberTwitter</a>",
    "entities" : {
        "urls" : [
            { "url" : "http://myloc.me/fxtc1", "start" : 27, "end" : 48 }
        ]
    }
}

The number of records being operated on is about 65K (not a lot really).

The collection, 'tweets', is being actively written to while the M/R job runs.

Help.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 27/Dec/10 ]

If, instead of concatenating the lists, then sorting, then trimming, you do an incremental merge of the incoming lists, you'll never have a large temporary result.

An inefficient version would be:

r = function(k, vals) {
    var tweets = [];
    for (var i = 0; i < vals.length; i++) {
        for (var j = 0; j < vals[i].tweets.length; j++) {
            tweets.push(vals[i].tweets[j]);
            tweets.sort(createdAtComparer);
            if (tweets.length > maxTweets) {
                tweets.pop();
            }
        }
    }
    return { "tweets": tweets };
};
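A slightly cheaper variant (my sketch, not from the ticket, with the same assumptions about the value shape and the scope variables maxTweets and createdAtComparer) inserts each tweet at its sorted position instead of re-sorting the whole list after every push; with the list capped at 50 entries, a linear scan for the insertion point is fine:

r = function(k, vals) {
    var tweets = [];
    for (var i = 0; i < vals.length; i++) {
        for (var j = 0; j < vals[i].tweets.length; j++) {
            var t = vals[i].tweets[j];
            // find the insertion point that keeps the list sorted newest-first
            var pos = 0;
            while (pos < tweets.length && createdAtComparer(tweets[pos], t) <= 0) {
                pos++;
            }
            tweets.splice(pos, 0, t);
            if (tweets.length > maxTweets) {
                tweets.pop(); // drop the oldest tweet
            }
        }
    }
    return { "tweets": tweets };
};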

Comment by Jim Powers [ 27/Dec/10 ]

Yes, the map phase unrolls the region mapping and can produce potentially large arrays. However, I thought that the idempotent characteristic of the reduce function could be used to deal with this.

Assume, for instance, that there is an internal cap on "reduce object size" of, say, N bytes. Past that point the reduce object would be split into multiple parts with the same key, each part no larger than N bytes. At least that way the map/reduce could continue, assuming the reduce function does not produce large objects as a result.
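For what it's worth, the contract being leaned on here is that reduce may be re-invoked on its own partial outputs, so reduce(k, [reduce(k, A), reduce(k, B)]) must equal reduce(k, A.concat(B)). A minimal shell check of that property, using a simplified concat-only reduce (my own example, not from the ticket):

// Simplified reduce: concatenate only, no sorting or capping
var reduceFn = function(k, vals) {
    var tweets = [];
    for (var i = 0; i < vals.length; i++) {
        tweets = tweets.concat(vals[i].tweets);
    }
    return { tweets: tweets };
};

var A = [ { tweets: ["t1"] }, { tweets: ["t2"] } ];
var B = [ { tweets: ["t3"] } ];

printjson(reduceFn("k", A.concat(B)));                            // { "tweets" : [ "t1", "t2", "t3" ] }
printjson(reduceFn("k", [ reduceFn("k", A), reduceFn("k", B) ])); // same result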

Oh well. I have started to write this in Scala anyway. However, it appears that the "straight JavaScript" version runs much more slowly than the map/reduce version. CPU is very high. The Scala version also runs with a lot of CPU while Mongo runs with almost none.

Comment by Eliot Horowitz (Inactive) [ 27/Dec/10 ]

In the reduce above, it seems that you first concatenate all inputs into one array, and then limit it to 50.
The input list can be large, so the resulting list might be very large before being trimmed.

If you can send a script that fully reproduces the issue, we can see if something else is going on.

Comment by Jim Powers [ 27/Dec/10 ]

Attached is a straight JavaScript script that does what the map/reduce program does. The array sizes are capped at a maximum of 50, which should not be considered huge IMHO.

Comment by auto [ 27/Dec/10 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: better js error message on out of memory SERVER-2280
https://github.com/mongodb/mongo/commit/57be437dd82e21844d0ed5248a4b6f8f9ac5b61e

Comment by Eliot Horowitz (Inactive) [ 27/Dec/10 ]

In your map/reduce you're building up a huge array.
You're running out of memory doing so.
You can't use map/reduce in that way.
You need to build large lists like that client side, so I would make your keys compound so you can easily query for groups.
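One possible reading of the compound-key suggestion (a sketch of mine, not Eliot's exact code): emit a small document per (region, tweet) pair so no reduce-side list ever grows, then query and limit per region on the client:

m = function() {
    var self = this;
    this.regionIds.forEach(function(rid) {
        // assumes createdAt is stored as a date, as in the example record
        emit({ region: rid, tweet: self._id }, { createdAt: self.createdAt });
    });
};
// keys are unique per (region, tweet), so reduce is effectively a no-op
r = function(k, vals) { return vals[0]; };
db.tweets.mapReduce(m, r, { out: "tweetsByRegion" });

// client side: newest 50 tweets for a given region
db.tweetsByRegion.find({ "_id.region": 100888 }).sort({ "value.createdAt": -1 }).limit(50);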

Resolving as I made the error message better.
