[SERVER-22224] $near query uses unbounded memory Created: 19/Jan/16  Updated: 13/Oct/23

Status: Backlog
Project: Core Server
Component/s: Geo, Querying
Affects Version/s: 3.0.8, 3.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 2
Labels: RF, WTmem, WTplaybook, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File heap-profile-calltree.png     PNG File heap-profile-timeseries.png     PNG File near.png     Text File server22224_explain.log    
Issue Links:
Duplicate
is duplicated by SERVER-32180 mongod oom with low connections Closed
Related
related to SERVER-82167 $regex may use unbounded cpu and memory Investigating
related to SERVER-20239 Built-in sampling heap profiler Closed
Assigned Teams:
Query Execution
Operating System: ALL
Sprint: Query 15 (06/03/16), QE 2022-10-17, QE 2022-10-31, QE 2022-11-14, QE 2022-11-28, QE 2022-12-12, QE 2022-12-26, QE 2023-01-09
Participants:

 Description   

With the following index:

    db.c.createIndex({p: "2dsphere", s: 1})

and 10M copies of the following document:

    // build ~1 KB of padding so each document is non-trivial in size
    x = ''
    for (i = 0; i < 1000; i++)
        x += 'x'
    doc = {p: [0, 0], s: 0, x: x}
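
A minimal loader for the 10M documents might look like the following (the batching and the insertMany call are illustrative, not part of the original report):

    // Illustrative: insert 10M copies of the document above in
    // batches of 1000 to keep client round trips down.
    for (var i = 0; i < 10000; i++) {
        var batch = []
        for (var j = 0; j < 1000; j++)
            batch.push({p: [0, 0], s: 0, x: x})
        db.c.insertMany(batch)
    }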

run the following query, which matches 0 documents:

    db.c.find({
        p: {
            $near: {
                $geometry: {
                    coordinates: [1, 0],
                    type: "Point"
                },
            },
        },
        s: {$ne: 0}
    }).itcount()
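
The chosen plan can be inspected with explain() (compare the attached server22224_explain.log; the verbosity choice below is illustrative):

    // Illustrative: capture the winning plan and execution stats
    // for the repro query (cf. the attached server22224_explain.log).
    db.c.find({
        p: {$near: {$geometry: {type: "Point", coordinates: [1, 0]}}},
        s: {$ne: 0}
    }).explain("executionStats")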

Results are as follows:

  • peak memory usage is about 13 GB in excess of the WT cache
  • the query scanned 20M documents to return 0 results, in spite of the index; changing the predicate to e.g. s: 1 instead of s: {$ne: 0} eliminates the document scan and the excess memory usage (a contrasting run is sketched below)
  • a less highly skewed point distribution reduces the problem, presumably because the skewed distribution requires accumulating all documents in memory for the distance sort
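
As a concrete contrast, here is a sketch of the s: 1 variant mentioned in the second bullet (illustrative of the reported behavior; the actual plan should be confirmed with explain()):

    // Sketch: same query with an equality predicate on s. With the
    // {p: "2dsphere", s: 1} index, the equality bound on s lets the
    // plan discard non-matching index entries without fetching
    // documents, so memory stays bounded.
    db.c.find({
        p: {$near: {$geometry: {type: "Point", coordinates: [1, 0]}}},
        s: 1
    }).itcount()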


 Comments   
Comment by Kyle Suarez [ 26/Sep/22 ]

Sending this back to the Query Execution triage queue for re-discussion, and to come up with a SWAG estimate on how much work this would be to implement (in the classic engine). After adding the estimate, we can pass this to the Product team for re-prioritization.

Comment by He Lei [ 24/Apr/18 ]

3.7.5 affected.

Comment by shawn [ 11/Jan/18 ]

Is 3.6 affected also?

Comment by Bruce Lucas (Inactive) [ 03/Jun/16 ]

max.hirschhorn, that stack corresponds to memory allocated by WT for the cache, whereas the memory issue in this case will be outside the cache. If you aren't seeing anything significant besides that stack, then I suspect that sample was taken after the query finished running, when the excess memory would already have been freed and all that was left was the WT cache.

It can be useful to see a timeseries of call trees, and that can be done using the experimental built-in heap profiler. Running the repro workload above and then visualizing with the timeseries tool:

  • stack222 accounts for the memory in cache. This is the same stack you report above.

    2016-06-03T10:38:25.011-0400 I -        [ftdc] heapProfile stack222: { 0: "realloc", 1: "0x1bd1ea6", 2: "__wt_buf_grow_worker", 3: "__wt_bt_read", 4: "__wt_page_in_func", 5: "__wt_row_search", 6: "__wt_btcur_search", 7: "0x1b99473", 8: "mongo::WiredTigerRecordStore::Cursor::seekExact", 9: "mongo::WorkingSetCommon::fetch", 10: "mongo::FetchStage::work", 11: "mongo::NearStage::bufferNext", 12: "mongo::NearStage::work", 13: "mongo::FetchStage::work", 14: "mongo::PlanExecutor::getNextImpl", 15: "mongo::PlanExecutor::getNext", 16: "mongo::FindCmd::run", 17: "mongo::Command::run", 18: "mongo::Command::execCommand", 19: "mongo::runCommands", 20: "mongo::assembleResponse", 21: "mongo::MyMessageHandler::process", 22: "mongo::PortMessageServer::handleIncomingMsg", 23: "0x7f32ed3346aa", 24: "clone" }
    

  • stack211 accounts for the excess memory outside cache, and it indeed includes makeObjOwnedIfNeeded:

    2016-06-03T10:38:24.801-0400 I -        [conn1] heapProfile stack211: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::BSONObj::copy", 3: "mongo::BSONObj::getOwned", 4: "mongo::WorkingSetMember::makeObjOwnedIfNeeded", 5: "mongo::NearStage::bufferNext", 6: "mongo::NearStage::work", 7: "mongo::FetchStage::work", 8: "mongo::PlanExecutor::getNextImpl", 9: "mongo::PlanExecutor::getNext", 10: "mongo::FindCmd::run", 11: "mongo::Command::run", 12: "mongo::Command::execCommand", 13: "mongo::runCommands", 14: "mongo::assembleResponse", 15: "mongo::MyMessageHandler::process", 16: "mongo::PortMessageServer::handleIncomingMsg", 17: "0x7f32ed3346aa", 18: "clone" }
    

An alternative view is provided by the calltree tool, which works from the same timeseries of call stacks collected by the heap profiler. This view makes the stack traces easier to read, although it doesn't put them in context with the rest of the metrics the way the timeseries tool does. The same two stacks appear there at 1 and 2 (see the attached heap-profile-calltree.png).
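
For anyone trying to reproduce the timeseries above: the profiler is enabled at startup. A sketch, assuming the parameter names introduced by SERVER-20239 (the profiler is experimental, so verify the names against your build):

    # Sketch: sample allocations and emit heapProfile stacks via FTDC.
    mongod --setParameter heapProfilingEnabled=true \
           --setParameter heapProfilingSampleIntervalBytes=262144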

Comment by Max Hirschhorn [ 02/Jun/16 ]

Just to give an update on this ticket. Per a suggestion from kamran.khan, I used malloc_history to see the memory allocation of the server when executing the $near query.

$ malloc_history $(pidof mongod) -callTree
malloc_history Report Version:  2.0
Process:         mongod [18035]
Path:            /Users/maxh/mongodb/3.2.1/mongod
Load Address:    0x10231f000
Identifier:      mongod
Version:         ???
Code Type:       X86-64
Parent Process:  bash [12846]
 
Date/Time:       2016-06-02 11:55:08.527 -0400
OS Version:      Mac OS X 10.9.5 (13F1808)
Report Version:  7
 
Call graph:
    163834 (1124M) << TOTAL >>
      156138 (1069M) Thread_8dbd001
      + 156138 (1069M) thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff88d5efc9]
      +   156138 (1069M) _pthread_start  (in libsystem_pthread.dylib) + 137  [0x7fff88d5a72a]
      +     156138 (1069M) _pthread_body  (in libsystem_pthread.dylib) + 138  [0x7fff88d5a899]
      +       156136 (1069M) void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(void*)  (in mongod) + 97  [0x102bba311]
      +       ! 156124 (1069M) mongo::PortMessageServer::handleIncomingMsg(void*)  (in mongod) + 638  [0x102bbab6e]
      +       ! : 156122 (1069M) mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)  (in mongod) + 255  [0x102328b5f]
      +       ! : | 156115 (1069M) mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)  (in mongod) + 5607  [0x102620eb7]
      +       ! : | + 156115 (1069M) mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)  (in mongod) + 714  [0x102487c1a]
      +       ! : | +   156115 (1069M) mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)  (in mongod) + 2469  [0x10252db35]
      +       ! : | +     156115 (1069M) mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)  (in mongod) + 2255  [0x10252eb3f]
      +       ! : | +       156115 (1069M) mongo::(anonymous namespace)::CmdExplain::run(mongo::OperationContext*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mongo::BSONObj&, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, mongo::BSONObjBuilder&)  (in mongod) + 551  [0x1024a19e7]
      +       ! : | +         156111 (1069M) mongo::FindCmd::explain(mongo::OperationContext*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mongo::BSONObj const&, mongo::ExplainCommon::Verbosity, mongo::rpc::ServerSelectionMetadata const&, mongo::BSONObjBuilder*) const  (in mongod) + 967  [0x1024a9677]
      +       ! : | +         ! 156111 (1069M) mongo::Explain::explainStages(mongo::PlanExecutor*, mongo::ExplainCommon::Verbosity, mongo::BSONObjBuilder*)  (in mongod) + 290  [0x10273e7f2]
      +       ! : | +         !   156111 (1069M) mongo::PlanExecutor::executePlan()  (in mongod) + 97  [0x102786121]
      +       ! : | +         !     156111 (1069M) mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)  (in mongod) + 57  [0x102785199]
      +       ! : | +         !       101706 (1067M) mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)  (in mongod) + 865  [0x102785591]
      +       ! : | +         !       : 101706 (1067M) mongo::FetchStage::work(unsigned long*)  (in mongod) + 421  [0x10255a775]
      +       ! : | +         !       :   101220 (1067M) mongo::NearStage::work(unsigned long*)  (in mongod) + 167  [0x10256ff77]
      +       ! : | +         !       :   | 101220 (1067M) mongo::NearStage::bufferNext(unsigned long*, mongo::Status*)  (in mongod) + 685  [0x10257038d]
      +       ! : | +         !       :   |   100376 (1054M) mongo::FetchStage::work(unsigned long*)  (in mongod) + 893  [0x10255a94d]
      +       ! : | +         !       :   |   + 100376 (1054M) mongo::WorkingSetCommon::fetch(mongo::OperationContext*, mongo::WorkingSet*, unsigned long, mongo::unowned_ptr<mongo::SeekableRecordCursor>)  (in mongod) + 180  [0x10259d804]
      +       ! : | +         !       :   |   +   100375 (1054M) mongo::WiredTigerRecordStore::Cursor::seekExact(mongo::RecordId const&)  (in mongod) + 64  [0x1029ce7f0]
      +       ! : | +         !       :   |   +   ! 100375 (1054M) __curfile_search  (in mongod) + 427  [0x1032e135b]
      +       ! : | +         !       :   |   +   !   100375 (1054M) __wt_btcur_search  (in mongod) + 647  [0x1032a07f7]
      +       ! : | +         !       :   |   +   !     100375 (1054M) __wt_row_search  (in mongod) + 2826  [0x1032c6dfa]
      +       ! : | +         !       :   |   +   !       33487 (1042M) __wt_page_in_func  (in mongod) + 1620  [0x1032ab894]
      +       ! : | +         !       :   |   +   !       : 33343 (1042M) __wt_bt_read  (in mongod) + 752  [0x1032a7ff0]
      +       ! : | +         !       :   |   +   !       : | 33343 (1042M) __wt_buf_grow_worker  (in mongod) + 91  [0x1033417db]
      +       ! : | +         !       :   |   +   !       : |   33343 (1042M) __wt_realloc  (in mongod) + 165  [0x103314225]
      +       ! : | +         !       :   |   +   !       : |     33343 (1042M) realloc  (in libsystem_malloc.dylib) + 209  [0x7fff85f223b6]
      +       ! : | +         !       :   |   +   !       : |       33343 (1042M) malloc_zone_malloc  (in libsystem_malloc.dylib) + 107  [0x7fff85f2188c]

I would have expected WorkingSetMember::makeObjOwnedIfNeeded() to appear in the above stack trace instead of WorkingSetCommon::fetch() and the memory allocations from WiredTiger, if making copies of documents we aren't going to return were the cause of this issue. Instead, looking at the explain output of the $near query, we can see that the filter {"$not": {"s": {"$eq": 0}}} is on the root FETCH stage and that the bounds for the "s" field are [MinKey, MaxKey].

This means that we scan the entire index subject to the geo bounds, fetch all of the documents, and then return none of them (in the case of the highly selective filter {"$not": {"s": {"$eq": 0}}}). The next step should be to determine why we aren't unioning the index bounds with the complement of the index bounds for the predicate {"s": {"$eq": 0}}, i.e. [[MinKey, 0), (0, MaxKey]].
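
For reference, this is roughly what the desired bounds would look like in explain() output (a hypothetical fragment in the usual indexBounds shape; as noted above, the server currently reports [MinKey, MaxKey] here):

    // Hypothetical explain() fragment: the unioned complement bounds
    // for {s: {$ne: 0}} instead of the full [MinKey, MaxKey] range.
    "indexBounds" : {
        "s" : [
            "[MinKey, 0.0)",
            "(0.0, MaxKey]"
        ]
    }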

Comment by Bruce Lucas (Inactive) [ 19/Jan/16 ]

Yes, also affects 3.0.8.

Comment by Eric Milkie [ 19/Jan/16 ]

Is 3.0 affected?
