[COMPASS-1663] Investigate schema analysis consumes a lot of memory Created: 04/Aug/17  Updated: 10/Jan/24  Resolved: 09/Jan/20

Status: Closed
Project: Compass
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Satyendra Sinha Assignee: Unassigned
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File MongoDB.COMPASS.Task.Manager.20170727.1551.JPG     JPEG File MongoDB.COMPASS.Task.Manager.20170727.1558.JPG     PNG File compass-1663-memory.png     File memory_issue.json    
Issue Links:
Related
Epic Link: COMPASS-2234

 Description   

Certain collections consume a lot of memory during schema analysis, as HELP-4695 indicates.

This ticket tracks the investigation into the cause of the higher memory consumption on some collections and into any potential solutions.

Reproduction steps use the mgenerate template file attached to this ticket (memory_issue.json).



 Comments   
Comment by Brian Blevins [ 16/Aug/17 ]

Hi sam.weaver,

What is the consumption like when you double the number to 50,000 documents?

I have not investigated memory utilization against the number of documents in the collection. My expectation is that document complexity drives memory utilization far more than document count. This is one difficulty in troubleshooting, because in most cases we do not know the precise layout of the customer's documents.

Schema analysis time is also affected by network latency; you can see some results in my internal-only comment in 00448175: Compass is Unusable. Timing is further affected by whether the sample size is greater or less than 5% of the collection size, because that determines whether Compass uses $sample or reservoir sampling.
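For illustration, here is a minimal sketch of that threshold heuristic, written against the modern Node.js driver. The function name and which strategy lands on which side of the 5% threshold are assumptions for the sketch, not Compass's actual code:

// Hypothetical sketch of the 5% sampling heuristic described above --
// not Compass's actual implementation.
async function sampleDocuments(collection, sampleSize) {
  const count = await collection.estimatedDocumentCount();
  if (sampleSize < count * 0.05) {
    // Small sample relative to the collection: let the server pick
    // random documents with the $sample aggregation stage.
    return collection.aggregate([{ $sample: { size: sampleSize } }]).toArray();
  }
  // Large sample relative to the collection: stream the documents and
  // keep a uniform random subset client-side (reservoir sampling).
  const reservoir = [];
  let seen = 0;
  for await (const doc of collection.find()) {
    seen += 1;
    if (reservoir.length < sampleSize) {
      reservoir.push(doc);
    } else {
      // Replace a random slot with probability sampleSize / seen.
      const j = Math.floor(Math.random() * seen);
      if (j < sampleSize) reservoir[j] = doc;
    }
  }
  return reservoir;
}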

Regards,
Brian

Comment by Brian Blevins [ 15/Aug/17 ]

Hi thomasr,

The customer in 00432392: Using too much memory for large collection is Memorial Sloan-Kettering Cancer Center.

The complaint is about the total Compass memory usage on a Windows 7 desktop.

However, that customer is not able to provide any sample data due to HIPAA rules. So, our testing is not apples-to-apples with the customer's dataset.

Additionally, the customer appears to be running Compass on an 8 GB RAM desktop.

I have asked the customer to reduce the sample size and that request has been ignored.

The sample data provided actually came from complex sample documents provided by a different customer where I noticed memory usage was also high. See 00448175: Compass is Unusable.

What is the impact on the customers? Or are they unable to use Compass because of this?

The customer at "Memorial Sloan-Kettering Cancer Center" opened the support case with:

Secondly, from the user's workstation perspective, COMPASS seems to be a very memory intensive application. Is there any plan to address this in the future? We see the benefit of COMPASS. However, due to the limitations outlined above, our developers are not using it as much.

The latest update from the customer regarding Compass schema analysis:

COMPASS came back after ~10 minutes but it did not hang this time.

Comment by Sam Weaver [ 15/Aug/17 ]

What is the consumption like when you double the number to 50,000 documents? Do 50,000 docs consume 880MB? Is the growth exponential? Does this only happen during schema analysis, or on other screens as well? What about just showing the docs in the Documents tab?

Comment by Thomas Rueckstiess [ 15/Aug/17 ]

brian.blevins we haven't made any significant progress on this issue yet. What is the impact on the customers? Are they just curious why it's using that much memory? Or are they unable to use Compass because of this?

Comment by Thomas Rueckstiess [ 15/Aug/17 ]

I recreated the collection with 25,000 documents using the template memory_issue.json:

mgeneratejs memory_issue.json -n 25000 | mongoimport -d perf -c compass_1663_memory

Some more diagnostic info from mongodb-schema:

mongodb-schema localhost:27017 perf.compass_1663_memory -n 1000 --no-output --stats --repeat 10
execution count: 10
mean time: 23717.50ms (individual results: 23002,24164,24283,22122,23590,24701,23264,23509,23711,24829)
stdev time: 777.49ms
toplevel fields: 50
branching factors: [83,50,9,3,3]
schema width: 148
schema depth: 3

A single run with mongodb-schema consumed about 440MB of memory in the process, as observed in Activity Monitor (see the attached compass-1663-memory.png).
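For anyone reproducing this, heap growth can also be observed from inside Node rather than Activity Monitor. The sketch below assumes a promise-based parseSchema export from the mongodb-schema package (the exact API has varied across versions) and the standard Node driver:

// Sketch: measure heap growth across one schema analysis run.
const { MongoClient } = require('mongodb');
const { parseSchema } = require('mongodb-schema');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const docs = await client
    .db('perf')
    .collection('compass_1663_memory')
    .find()
    .limit(1000)
    .toArray();

  const before = process.memoryUsage().heapUsed;
  await parseSchema(docs);
  const after = process.memoryUsage().heapUsed;
  console.log(`heap delta: ${((after - before) / 1024 / 1024).toFixed(1)} MB`);

  await client.close();
}

main().catch(console.error);

Note that heapUsed only captures V8 heap growth in this process; Compass's total footprint also includes Electron/renderer overhead, which is what the customer's Task Manager screenshots show.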
