[COMPASS-1663] Investigate schema analysis consumes a lot of memory Created: 04/Aug/17 Updated: 10/Jan/24 Resolved: 09/Jan/20
| Status: | Closed |
| Project: | Compass |
| Component/s: | Performance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Satyendra Sinha | Assignee: | Unassigned |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Epic Link: | COMPASS-2234 |
| Description |
Certain collections consume a lot of memory during schema analysis, as HELP-4695 indicates. This ticket tracks the investigation into why memory consumption is higher for some collections and whether there are any potential solutions. Reproduction steps and an mgenerate template file are attached to this ticket.
| Comments |
| Comment by Brian Blevins [ 16/Aug/17 ] |
Hi sam.weaver,
I have not investigated memory utilization against the number of documents in the collection. My expectation is that the complexity of the documents is a much larger factor than the document count in driving memory utilization. This is one difficulty in troubleshooting: in most cases we do not know the precise layout of the customer's documents. Schema analysis time is also affected by network latency. You can see some results in my internal-only comment in 00448175: Compass is Unusable. Time/delay is further affected by whether the sample size is greater or less than 5% of the collection size, because that determines whether Compass uses $sample versus reservoir sampling (a sketch of that strategy split follows below). Regards,
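To make the strategy split concrete, here is a minimal sketch of how a client might choose between server-side $sample and client-side reservoir sampling around a 5% threshold. This is an illustration only, not Compass's actual code; the threshold direction follows MongoDB's documented $sample behavior, and the connection string and collection names are placeholders.

```js
const { MongoClient } = require('mongodb');

// Illustration only (not Compass's actual implementation): choose a
// sampling strategy based on the requested sample fraction.
async function sampleCollection(coll, sampleSize) {
  const count = await coll.estimatedDocumentCount();
  if (sampleSize < 0.05 * count) {
    // Small fraction: let the server pick documents with $sample.
    return coll.aggregate([{ $sample: { size: sampleSize } }]).toArray();
  }
  // Large fraction: stream the collection and keep a uniform reservoir
  // of `sampleSize` documents on the client (Algorithm R).
  const reservoir = [];
  let seen = 0;
  for await (const doc of coll.find()) {
    seen += 1;
    if (reservoir.length < sampleSize) {
      reservoir.push(doc);
    } else {
      const j = Math.floor(Math.random() * seen);
      if (j < sampleSize) reservoir[j] = doc;
    }
  }
  return reservoir;
}

// Example usage (names are placeholders):
async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const docs = await sampleCollection(
    client.db('test').collection('memory_issue'), 1000);
  console.log('sampled', docs.length, 'documents');
  await client.close();
}

main().catch(console.error);
```

The reservoir branch is the expensive one for memory investigations: it streams every document over the network and holds the full sample in client memory, which is consistent with latency and document complexity dominating the cost.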
| Comment by Brian Blevins [ 15/Aug/17 ] |
Hi thomasr, The customer in 00432392: Using too much memory for large collection is Memorial Sloan-Kettering Cancer Center. The complaint is about the total Compass memory usage on a Windows 7 desktop. However, that customer is not able to provide any sample data due to HIPAA rules, so our testing is not apples-to-apples with the customer's dataset. Additionally, the customer appears to be running Compass on an 8 GB RAM desktop. I have asked the customer to reduce the sample size, but that request has been ignored. The sample data used here actually came from complex sample documents provided by a different customer, where I also noticed high memory usage. See 00448175: Compass is Unusable.
The customer at "Memorial Sloan-Kettering Cancer Center" opened the support case with:
The latest update from the customer regarding Compass schema analysis:
| Comment by Sam Weaver [ 15/Aug/17 ] |
What is the consumption like when you double the number to 50,000 documents? Do 50,000 docs consume 880 MB? Is the growth exponential? Does this only happen during schema analysis, or on other screens as well? What about just showing the docs in the Documents tab?
| Comment by Thomas Rueckstiess [ 15/Aug/17 ] |
brian.blevins, we haven't made any significant progress on this issue yet. What is the impact on the customers? Are they just curious about why it is using that much memory, or are they unable to use Compass because of it?
| Comment by Thomas Rueckstiess [ 15/Aug/17 ] |
I recreated the collection with 25,000 documents using the template memory_issue.json; a sketch of the reproduction command is below.
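Since the exact command isn't recorded in this ticket, here is a minimal reproduction sketch that pipes the attached template through the mgeneratejs CLI into mongoimport. The database and collection names are assumptions, and flags may differ between mgeneratejs versions.

```js
// Minimal reproduction sketch (assumed names/flags): generate 25,000
// documents from the attached template and import them into a local
// test collection.
const { execSync } = require('child_process');

execSync(
  'cat memory_issue.json | mgeneratejs -n 25000 | ' +
    'mongoimport --db test --collection memory_issue',
  { stdio: 'inherit' }
);
```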
Some more diagnostic info from mongodb-schema: a single run consumed about 440 MB of memory in the process (as observed in Activity Monitor). A sketch of how such a run can be measured is below.
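For reference, a minimal measurement sketch. It assumes the callback-based parseSchema API that mongodb-schema documented around this time, plus placeholder connection and collection names; Activity Monitor reports whole-process resident memory, which process.memoryUsage().rss approximates.

```js
// Hedged measurement sketch (assumed names; 2017-era callback API of
// mongodb-schema): analyze the reproduction collection and report the
// process's memory footprint afterwards.
const { MongoClient } = require('mongodb');
const parseSchema = require('mongodb-schema');

MongoClient.connect('mongodb://localhost:27017', (err, client) => {
  if (err) throw err;
  const cursor = client.db('test').collection('memory_issue').find();
  parseSchema(cursor, (err, schema) => {
    if (err) throw err;
    const { rss, heapUsed } = process.memoryUsage();
    console.log('fields analyzed:', schema.fields.length);
    console.log('rss:', (rss / 1e6).toFixed(0), 'MB,',
      'heapUsed:', (heapUsed / 1e6).toFixed(0), 'MB');
    client.close();
  });
});
```

Running this in a fresh process per sample size would also answer the doubling question above (25,000 vs 50,000 documents) without one run's heap polluting the next measurement.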