[SERVER-8287] Large array result causes v8 shell to run out of memory
Created: 23/Jan/13  Updated: 11/Jul/16  Resolved: 04/Feb/13

Status: Closed
Project: Core Server
Component/s: JavaScript, Shell
Affects Version/s: 2.3.2
Fix Version/s: 2.4.0-rc1

Type: Bug
Priority: Critical - P2
Reporter: Ben Becker
Assignee: Ben Becker
Resolution: Done
Votes: 0
Labels: javascript, shell
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diff    
Issue Links:
Related
is related to SERVER-8125 Text search with large result set rai... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

The following script causes v8 to run out of memory under the current resource constraints. With the constraints removed, the shell uses ~750 MB of resident memory, and subsequent GC runs free only ~640 MB of it.

> db.foo.ensureIndex({a:"text"})
> for (i=0; i<100000; i++) {db.foo.insert({a:"word",b:"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"})}
> db.foo.runCommand("text", {search:"word", limit:100000})
Wed Jan  9 14:30:43 uncaught exception: error {
	"$err" : "BSONObj size: 16872191 (0xFF720101) is invalid. Size must be between 0 and 16793600(16MB) First element: queryDebugString: \"word||||||\"",
	"code" : 10334
}



 Comments   
Comment by auto [ 04/Feb/13 ]

Author: Ben Becker <ben.becker@10gen.com>
Date: 2013-02-04T01:14:22Z

Message: SERVER-8287: reduce excessive memory consumption by the shell's tojson() helper
Branch: master
https://github.com/mongodb/mongo/commit/cf8c09559ddc07c6d70d75f128d867f4b4786d80

Comment by Ben Becker [ 23/Jan/13 ]

Patch fixes the issue, even with our current resource constraints. Memory usage in the shell is now nearly identical to NodeJS: 138 MB.

It may be worth converting these to native functions at some point to further reduce memory consumption.

Comment by Aaron Heckmann [ 23/Jan/13 ]

Attached is the diff used to produce better results - res mem sat around 130MB consistently.

Summary: string concatenation in v8 is very bad. Use an array and .join() it instead.
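
A minimal sketch of the pattern described above, for illustration only. Both function names are hypothetical and neither is the shell's actual tojson() code; JSON.stringify stands in for whatever per-value formatting the real helper does. The first version grows one string with +=, the second collects the pieces in an array and joins them once at the end.

```javascript
// Hypothetical helpers for illustration; not the shell's tojson() implementation.

// Builds the output by repeated concatenation. Each += may create a new
// intermediate string, which is the pattern this ticket identifies as costly.
function serializeByConcat(values) {
    var out = "[";
    for (var i = 0; i < values.length; i++) {
        if (i > 0) {
            out += ", ";
        }
        out += JSON.stringify(values[i]);
    }
    return out + "]";
}

// Collects the pieces in an array and joins them once at the end.
function serializeByJoin(values) {
    var parts = [];
    for (var i = 0; i < values.length; i++) {
        parts.push(JSON.stringify(values[i]));
    }
    return "[" + parts.join(", ") + "]";
}
```

Both functions return the same string for the same input, so the join-based version can replace the concatenating one without changing output. How much memory this saves depends on the engine and its version; the figures in this ticket apply to the v8 build used by the shell at the time.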

Comment by Ben Becker [ 23/Jan/13 ]

As aheckmann pointed out, string concatenation (+=) appears to be very expensive in terms of memory and CPU time. Commenting out the loop that replaces escaped characters in tojson() decreased total memory consumption from 750 MB to 200 MB.

Comment by Ben Becker [ 23/Jan/13 ]

The source of memory consumption is our tojson() and tojsonObject() JS functions (called from shellPrintHelper(_lastres_) in dbshell.cpp).

Retrieving the results without displaying them uses only ~60 MB of resident memory. Resident memory ramps up to ~750 MB while executing the tojson/tojsonObject functions, and the GC starts thrashing, moving objects from young to old space while compacting.

In debug mode, printing the results takes 32 seconds (vs. ~1.5 seconds to retrieve results). Replacing the shellPrintHelper with a much simpler function completes in less than a second with no noticeable increase in memory consumption. This function doesn't do all the required work though.

A cursory scan of the tojson() functions revealed several expensive O(n) operations which are likely to produce thousands of 'young' objects for each result in this test. For example, every string is iterated over to replace escaped characters. Another example is counting an Object's properties before displaying them.
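
As a rough sketch of how the two operations mentioned above might be made cheaper (an assumption for illustration, not the change made in the linked commit): escape each string with a single regex replace and a lookup table instead of a per-character loop, and gather an object's own keys in one pass so the count comes from the array length. ESCAPES, escapeString and ownKeys are hypothetical names, and the set of escaped characters here is illustrative rather than the shell's exact list.

```javascript
// Hypothetical helpers for illustration; not the shell's tojson() implementation.

// Lookup table of characters to escape (illustrative subset).
var ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\n": "\\n",
    "\t": "\\t",
    "\r": "\\r"
};

// Escapes a string in a single replace() call and wraps it in double quotes.
function escapeString(s) {
    var escaped = s.replace(/["\\\n\t\r]/g, function (c) {
        return ESCAPES[c];
    });
    return '"' + escaped + '"';
}

// Collects an object's own property names in one pass; the caller can use
// keys.length as the count and then iterate the same array for display.
function ownKeys(obj) {
    var keys = [];
    for (var k in obj) {
        if (Object.prototype.hasOwnProperty.call(obj, k)) {
            keys.push(k);
        }
    }
    return keys;
}
```

Whether this actually reduces the number of short-lived objects would need to be measured in the shell itself; no measurement is claimed here.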

Comment by Ben Becker [ 23/Jan/13 ]

One work-around is to remove (or significantly increase) the resource limits we set for each isolate. In a local test, the mongo client uses ~750 MB of resident memory when resource constraints are lifted. gc() reduces consumption to 112 MB (much higher than the 10.1 MB we start with). Seems like we're allocating too much memory, and GC isn't freeing everything it can, even after multiple runs.
