[SERVER-26534] Text search uses excessive memory Created: 08/Oct/16  Updated: 27/Dec/23

Status: Backlog
Project: Core Server
Component/s: Text Search
Affects Version/s: 3.2.1, 3.2.10, 3.4.0-rc0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: Backlog - Query Integration
Resolution: Unresolved
Votes: 11
Labels: qi-text-search, query-44-grooming, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text_search.png (PNG), text.png (PNG)
Issue Links:
    Depends
    Duplicate
        is duplicated by SERVER-26616 mongo out of memory on textual query (Closed)
        is duplicated by SERVER-26923 OOM Killer Terminates All 3 Nodes in ... (Closed)
    Related
        related to SERVER-24375 Deduping in OR, SORT_MERGE, and IXSCA... (Backlog)
        related to SERVER-36087 Executing $text statements in conjunc... (Closed)
        related to SERVER-36794 Non-blocking $text plans with just on... (Closed)
        is related to SERVER-79244 Text search with relevance sort consu... (Waiting For User Input)
        is related to SERVER-18926 Full text search extremely slow and u... (Closed)
        is related to SERVER-18961 Avoid iterating the entire working se... (Closed)
        is related to SERVER-26833 Permit non-blocking $text queries whe... (Closed)
Assigned Teams: Query Integration
Operating System: ALL

Description

As in SERVER-18926 and SERVER-18961, create a collection containing 50 M documents totaling about 4 GB with a full-text index, then run a simple search for a single word on the full-text index that returns all of the documents.
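A scaled-down sketch of that reproduction, using pymongo against a local standalone mongod (the collection, field, and search-term names here are illustrative, not taken from the original repro):

    # Hypothetical, scaled-down reproduction; the original report used ~50 M documents (~4 GB).
    from pymongo import MongoClient, TEXT

    client = MongoClient("mongodb://localhost:27017")
    coll = client["test"]["text_repro"]
    coll.drop()

    # Every document contains the search word, so a single-word query matches all of them.
    for _ in range(100):
        coll.insert_many([{"body": "filler words that include the term target"}
                          for _ in range(10_000)])   # 1 M documents total; scale up as needed

    coll.create_index([("body", TEXT)])

    # The query that exhibits the memory growth: it returns every document.
    returned = sum(1 for _ in coll.find({"$text": {"$search": "target"}}))
    print("documents returned:", returned)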

Total memory allocated, excluding the WiredTiger cache, is roughly the size of the collection. The top four allocating stacks, which account for most of the excess:

heapProfile stack44: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::BSONObj::copy", 3: "mongo::BSONObj::getOwned", 4: "mongo::WorkingSetMember::makeObjOwnedIfNeeded", 5: "mongo::TextOrStage::addTerm", 6: "mongo::TextOrStage::readFromChildren", 7: "mongo::TextOrStage::work", 8: "mongo::TextMatchStage::work", 9: "mongo::TextStage::work", 10: "mongo::PlanExecutor::getNextImpl", 11: "mongo::PlanExecutor::getNext", 12: "mongo::FindCmd::run", 13: "mongo::Command::run", 14: "mongo::Command::execCommand", 15: "mongo::runCommands", 16: "mongo::assembleResponse", 17: "mongo::MyMessageHandler::process", 18: "mongo::PortMessageServer::handleIncomingMsg", 19: "0x7f89ec7466aa", 20: "clone" }
heapProfile stack41: { 0: "tc_new", 1: "mongo::WorkingSet::allocate", 2: "mongo::IndexScan::work", 3: "mongo::TextOrStage::readFromChildren", 4: "mongo::TextOrStage::work", 5: "mongo::TextMatchStage::work", 6: "mongo::TextStage::work", 7: "mongo::PlanExecutor::getNextImpl", 8: "mongo::PlanExecutor::getNext", 9: "mongo::FindCmd::run", 10: "mongo::Command::run", 11: "mongo::Command::execCommand", 12: "mongo::runCommands", 13: "mongo::assembleResponse", 14: "mongo::MyMessageHandler::process", 15: "mongo::PortMessageServer::handleIncomingMsg", 16: "0x7f89ec7466aa", 17: "clone" }
heapProfile stack46: { 0: "tc_new", 1: "void std::vector<mongo::IndexKeyDatum, std::allocator<mongo::IndexKeyDatum> >::_M_emplace_back_aux<mongo::IndexKeyDatum>", 2: "mongo::IndexScan::work", 3: "mongo::TextOrStage::readFromChildren", 4: "mongo::TextOrStage::work", 5: "mongo::TextMatchStage::work", 6: "mongo::TextStage::work", 7: "mongo::PlanExecutor::getNextImpl", 8: "mongo::PlanExecutor::getNext", 9: "mongo::FindCmd::run", 10: "mongo::Command::run", 11: "mongo::Command::execCommand", 12: "mongo::runCommands", 13: "mongo::assembleResponse", 14: "mongo::MyMessageHandler::process", 15: "mongo::PortMessageServer::handleIncomingMsg", 16: "0x7f89ec7466aa", 17: "clone" }
heapProfile stack45: { 0: "tc_new", 1: "mongo::TextOrStage::addTerm", 2: "mongo::TextOrStage::readFromChildren", 3: "mongo::TextOrStage::work", 4: "mongo::TextMatchStage::work", 5: "mongo::TextStage::work", 6: "mongo::PlanExecutor::getNextImpl", 7: "mongo::PlanExecutor::getNext", 8: "mongo::FindCmd::run", 9: "mongo::Command::run", 10: "mongo::Command::execCommand", 11: "mongo::runCommands", 12: "mongo::assembleResponse", 13: "mongo::MyMessageHandler::process", 14: "mongo::PortMessageServer::handleIncomingMsg", 15: "0x7f89ec7466aa", 16: "clone" }

Experimentally, the amount of memory used appears to be proportional to the number of documents returned, and possibly roughly equal to their total size.
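One rough way to watch this from the client, assuming a build that uses the tcmalloc allocator (whose statistics appear in serverStatus; the exact field layout can vary by version):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    coll = client["test"]["text_repro"]

    def allocated_bytes():
        # Present only on tcmalloc builds; treat the exact path as version-dependent.
        return client.admin.command("serverStatus")["tcmalloc"]["generic"]["current_allocated_bytes"]

    before = allocated_bytes()
    cursor = coll.find({"$text": {"$search": "target"}})
    next(cursor)   # by the first batch the buffering text stages should have done their work
    print("approx. bytes allocated while the query is open:", allocated_bytes() - before)
    cursor.close()

Repeating this against collections of different sizes, or with search terms that match different numbers of documents, is the kind of experiment the proportionality observation refers to.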



Comments
Comment by Josef Sábl [ 24/Jul/23 ]

Is this possibly related?

https://jira.mongodb.org/browse/SERVER-79244

Comment by David Storch [ 11/Nov/16 ]

All four stacks that Bruce pasted above are allocations made in order to set up the ScoreMap data structure maintained by the TextOrStage:

https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/db/exec/text_or.h#L151-L152

This data structure maps each matching document's RecordId to a pair containing a copy of the corresponding document and its text score. We have to keep a copy of the document because, during query yields, the storage engine is allowed to free the memory housing the storage subsystem's copy. So it is indeed the case that text queries currently require memory proportional to the size of the result set.
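Schematically, the bookkeeping looks like the following Python sketch (an illustration only, not the actual C++ code; the field names are made up):

    score_map = {}   # RecordId -> {"doc": owned copy of the document, "score": float}

    def add_term(record_id, document, term_weight):
        entry = score_map.get(record_id)
        if entry is None:
            # First sighting: keep an owned copy so it survives storage-engine yields.
            score_map[record_id] = {"doc": dict(document), "score": term_weight}
        else:
            # Another index key for a document we've already seen: only adjust the score.
            entry["score"] += term_weight

Because an entry is retained for every matching document until the query finishes, the structure grows with the result set, which is exactly the proportional memory use reported above.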

This behavior is baked into the current implementation of text search execution, and it would require a significant overhaul to fix it in all cases. The good news is that we only need to maintain the ScoreMap in order to support computation of text search relevance scores: we hold onto information about documents seen so far so that we can adjust a document's relevance score when we find a new index key for a document we've already seen. This means that if the query does not request the text score, there is no need to maintain the ScoreMap at all. That optimization is part of the feature request tracked in the related ticket SERVER-26833.
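For concreteness, the distinction is between a query that asks for the relevance score via $meta and one that never references it (illustrative pymongo calls; coll and the search term are placeholders):

    # Requests the text score (projects and sorts on it): the ScoreMap is required.
    with_score = coll.find(
        {"$text": {"$search": "target"}},
        {"score": {"$meta": "textScore"}},
    ).sort([("score", {"$meta": "textScore"})])

    # Never references the score: in principle the ScoreMap is unnecessary here, which
    # is the optimization requested in SERVER-26833 (not the behavior today).
    without_score = coll.find({"$text": {"$search": "target"}})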
