[SERVER-26534] Text search uses excessive memory Created: 08/Oct/16 Updated: 27/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 3.2.1, 3.2.10, 3.4.0-rc0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 11 |
| Labels: | qi-text-search, query-44-grooming, storch |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Query Integration |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
As in
Total memory allocated excluding WT cache is roughly the size of the collection. The top four allocating stacks, accounting for most of the excess:
Experiments suggest that the amount of memory used is proportional to the number of documents returned, and possibly roughly equal to their total size. |
| Comments |
| Comment by Josef Sábl [ 24/Jul/23 ] |
|
Is this possibly related? |
| Comment by David Storch [ 11/Nov/16 ] |
|
All four stacks that Bruce pasted above are allocations made in order to set up the ScoreMap data structure maintained by the TextOrStage: https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/db/exec/text_or.h#L151-L152

This data structure maps from each matching document's RecordId to a pair containing a copy of the corresponding document and its text score. We have to keep a copy of the document because, during query yields, the storage engine is allowed to free the memory housing the storage subsystem's copy. So it is indeed the case that text queries currently require memory proportional to the size of the result set. This behavior is baked into the current implementation of text search execution, and fixing it in all cases would require a significant overhaul.

The good news is that we only need to maintain the ScoreMap structure in order to support computation of text search relevance scores. We hold onto information about documents seen so far so that we can adjust the relevance score when we find a new index key for a document we've already seen. This means that if the query does not request the text score, there is no need to maintain the ScoreMap. This is part of the feature request tracked in related ticket |
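
To illustrate why memory grows with the result set, here is a minimal Python sketch of the idea described in the comment above. It is not the server's C++ implementation: the names `ScoreEntry`, `ScoreMapSketch`, `add_index_key`, and `fetch_doc` are hypothetical, and summing per-key scores is an assumption standing in for whatever adjustment the server actually applies.

```python
import copy
from dataclasses import dataclass


@dataclass
class ScoreEntry:
    # Owned copy of the document, kept because the storage engine's
    # copy may be freed while the query yields.
    doc_copy: dict
    # Running relevance score for this document.
    score: float


class ScoreMapSketch:
    """Hypothetical stand-in for a map from record id to (document copy, score)."""

    def __init__(self):
        self.entries = {}  # record_id -> ScoreEntry

    def add_index_key(self, record_id, fetch_doc, key_score):
        """Process one matching index key for the given record id."""
        entry = self.entries.get(record_id)
        if entry is None:
            # First time this document is seen: copy it and start its score.
            self.entries[record_id] = ScoreEntry(
                copy.deepcopy(fetch_doc(record_id)), key_score
            )
        else:
            # Document already seen: adjust its score.
            # Assumption: scores are combined by simple addition.
            entry.score += key_score


# Usage: three index keys that refer to two distinct documents.
store = {1: {"_id": 1, "text": "red apple"}, 2: {"_id": 2, "text": "red car"}}
score_map = ScoreMapSketch()
score_map.add_index_key(1, store.__getitem__, 0.5)
score_map.add_index_key(2, store.__getitem__, 1.0)
score_map.add_index_key(1, store.__getitem__, 0.25)
print(len(score_map.entries))
```

The map holds one document copy per distinct matching document until all index keys have been processed, which is why the memory held scales with the result set in this sketch.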