[SERVER-16035] Leaks emanating from QueryPlannerAccess::makeCollectionScan Created: 07/Nov/14  Updated: 18/Jun/15  Resolved: 18/Jun/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kamran K. Assignee: J Rassi
Resolution: Duplicate Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-19040 WhereMatchExpression::shallowClone ca... Closed
Related
is related to SERVER-16889 Query subsystem public API should use... Closed
Operating System: ALL
Participants:

 Description   

I don't have repro steps (yet), but I noticed these leaks when running tests:

Valgrind trace:

==8878== 132 (80 direct, 52 indirect) bytes in 1 blocks are definitely lost in loss record 6,548 of 7,437
==8878==    at 0x4C2B2C0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8878==    by 0x146CD02: mongo::WhereMatchExpression::shallowClone() const (expression_where.cpp:67)
==8878==    by 0x157EC64: mongo::QueryPlannerAccess::makeCollectionScan(mongo::CanonicalQuery const&, bool, mongo::QueryPlannerParams const&) (planner_access.cpp:72)
==8878==    by 0x158EF76: mongo::buildCollscanSoln(mongo::CanonicalQuery const&, bool, mongo::QueryPlannerParams const&) (query_planner.cpp:221)
==8878==    by 0x15933E2: mongo::QueryPlanner::plan(mongo::CanonicalQuery const&, mongo::QueryPlannerParams const&, std::vector<mongo::QuerySolution*, std::allocator<mongo::QuerySolution*> >*) (query_planner.cpp:922)
==8878==    by 0x153FA85: mongo::(anonymous namespace)::prepareExecution(mongo::OperationContext*, mongo::Collection*, mongo::WorkingSet*, mongo::CanonicalQuery*, unsigned long, mongo::PlanStage**, mongo::QuerySolution**) (get_executor.cpp:311)
==8878==    by 0x15407B8: mongo::getExecutor(mongo::OperationContext*, mongo::Collection*, mongo::CanonicalQuery*, mongo::PlanExecutor::YieldPolicy, mongo::PlanExecutor**, unsigned long) (get_executor.cpp:400)
==8878==    by 0x12BE6FF: mongo::CmdFindAndModify::runNoDirectClient(mongo::OperationContext*, std::string const&, mongo::BSONObj const&, mongo::BSONObj const&, mongo::BSONObj const&, bool, bool, bool, mongo::BSONObjBuilder&, std::string&) (find_and_modify.cpp:206)
==8878==    by 0x12BDD6F: mongo::CmdFindAndModify::runNoDirectClient(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (find_and_modify.cpp:110)
==8878==    by 0x12BFB8D: mongo::CmdFindAndModify::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (find_and_modify.cpp:368)
==8878==    by 0x1337E26: mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1160)
==8878==    by 0x1338DDB: mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1374)
==8878==    by 0x13396E8: mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:1450)
==8878==    by 0x1556AB8: mongo::runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (new_find.cpp:131)
==8878==    by 0x1558745: mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&, bool) (new_find.cpp:552)
==8878==    by 0x14444FA: mongo::receivedQuery(mongo::OperationContext*, mongo::Client&, mongo::DbResponse&, mongo::Message&, bool) (instance.cpp:220)
==8878==    by 0x144561D: mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) (instance.cpp:393)
==8878==    by 0x114CBCB: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:185)
==8878==    by 0x18FB872: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:234)
==8878==    by 0x4E3F181: start_thread (pthread_create.c:312)
==8878==    by 0x5D7BFBC: clone (clone.S:111)
==8878== 
==8878== 161 (120 direct, 41 indirect) bytes in 1 blocks are definitely lost in loss record 6,603 of 7,437
==8878==    at 0x4C2B2C0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8878==    by 0x157EC1B: mongo::QueryPlannerAccess::makeCollectionScan(mongo::CanonicalQuery const&, bool, mongo::QueryPlannerParams const&) (planner_access.cpp:70)
==8878==    by 0x158EF76: mongo::buildCollscanSoln(mongo::CanonicalQuery const&, bool, mongo::QueryPlannerParams const&) (query_planner.cpp:221)
==8878==    by 0x15933E2: mongo::QueryPlanner::plan(mongo::CanonicalQuery const&, mongo::QueryPlannerParams const&, std::vector<mongo::QuerySolution*, std::allocator<mongo::QuerySolution*> >*) (query_planner.cpp:922)
==8878==    by 0x153FA85: mongo::(anonymous namespace)::prepareExecution(mongo::OperationContext*, mongo::Collection*, mongo::WorkingSet*, mongo::CanonicalQuery*, unsigned long, mongo::PlanStage**, mongo::QuerySolution**) (get_executor.cpp:311)
==8878==    by 0x15407B8: mongo::getExecutor(mongo::OperationContext*, mongo::Collection*, mongo::CanonicalQuery*, mongo::PlanExecutor::YieldPolicy, mongo::PlanExecutor**, unsigned long) (get_executor.cpp:400)
==8878==    by 0x12BE6FF: mongo::CmdFindAndModify::runNoDirectClient(mongo::OperationContext*, std::string const&, mongo::BSONObj const&, mongo::BSONObj const&, mongo::BSONObj const&, bool, bool, bool, mongo::BSONObjBuilder&, std::string&) (find_and_modify.cpp:206)
==8878==    by 0x12BDD6F: mongo::CmdFindAndModify::runNoDirectClient(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (find_and_modify.cpp:110)
==8878==    by 0x12BFB8D: mongo::CmdFindAndModify::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (find_and_modify.cpp:368)
==8878==    by 0x1337E26: mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1160)
==8878==    by 0x1338DDB: mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1374)
==8878==    by 0x13396E8: mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:1450)
==8878==    by 0x1556AB8: mongo::runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (new_find.cpp:131)
==8878==    by 0x1558745: mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&, bool) (new_find.cpp:552)
==8878==    by 0x14444FA: mongo::receivedQuery(mongo::OperationContext*, mongo::Client&, mongo::DbResponse&, mongo::Message&, bool) (instance.cpp:220)
==8878==    by 0x144561D: mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) (instance.cpp:393)
==8878==    by 0x114CBCB: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:185)
==8878==    by 0x18FB872: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:234)
==8878==    by 0x4E3F181: start_thread (pthread_create.c:312)
==8878==    by 0x5D7BFBC: clone (clone.S:111)

david.storch / rassi@10gen.com: Is this enough info to diagnose the leak(s)?


Version: 491a9728d381



 Comments   
Comment by J Rassi [ 18/Jun/15 ]

Resolving as a dup of SERVER-19040, then. Thanks again, Kamran.

Comment by Kamran K. [ 18/Jun/15 ]

The WhereMatchExpression::shallowClone leak appears to be the one Andy tracked down in SERVER-19040.

Comment by J Rassi [ 20/Nov/14 ]

Yes, StageBuilder::build() returning false would indeed cause a leak. That said, that verify statement (and others like it) should really be an invariant, since StageBuilder::build() isn't expected to return false (though I wouldn't be all that surprised if a bug could cause it to do so).
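
To make the distinction concrete, here is a minimal sketch with hypothetical stand-in helpers (these are not the actual mongo verify()/invariant() macros, just an assumption that the former reports failure by throwing and the latter by aborting): a throwing check unwinds the stack and leaks anything held only by raw pointers in the unwound frames, while an aborting check terminates the process outright, so the surrounding code is never expected to clean up after it.

#include <cstdlib>
#include <stdexcept>

struct QuerySolution {};  // stand-in for mongo::QuerySolution

// Hypothetical stand-ins; not the actual mongo verify()/invariant() macros.
inline void throwingCheck(bool ok) {        // "verify"-style: failure unwinds the stack
    if (!ok) throw std::runtime_error("check failed");
}
inline void abortingCheck(bool ok) {        // "invariant"-style: failure ends the process
    if (!ok) std::abort();
}

void prepareLikeFrame(bool built) {
    QuerySolution* soln = new QuerySolution();  // raw-owned by this frame
    throwingCheck(built);   // a throw here leaks 'soln': unwinding runs no destructor
                            // that would delete a raw pointer
    // abortingCheck(built) would instead end the process with no unwinding at all,
    // so nothing downstream is expected to clean up after a failure.
    delete soln;
}

int main() { prepareLikeFrame(true); }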

Do you think you'd be able to reproduce the original run that exposed this leak?

Comment by Kamran K. [ 10/Nov/14 ]

rassi@10gen.com, the prepareExecution() function in get_executor.cpp seems to have a few branches where QuerySolution objects will leak if verify(StageBuilder::build()) calls fail.

For example:

328   // See if one of our solutions is a fast count hack in disguise.
329   if (plannerParams.options & QueryPlannerParams::PRIVATE_IS_COUNT) {
330       for (size_t i = 0; i < solutions.size(); ++i) {
331           if (turnIxscanIntoCount(solutions[i])) {
332               // Great, we can use solutions[i].  Clean up the other QuerySolution(s).
333               for (size_t j = 0; j < solutions.size(); ++j) {
334                   if (j != i) {
335                       delete solutions[j];
336                   }
337               }
338   
339               // We're not going to cache anything that's fast count.
340               verify(StageBuilder::build(opCtx, collection, *solutions[i], ws, rootOut));
 
------------>         verify(false) seems like it would cause a leak here
 
341   
342               LOG(2) << "Using fast count: " << canonicalQuery->toStringShort()
343                      << ", planSummary: " << Explain::getPlanSummary(*rootOut);
344   
345               *querySolutionOut = solutions[i];
346               return Status::OK();
347           }
348       }
349   }

I don't think the above branch is responsible for the CollectionScanNode leak, but one of the others may be.
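
For what it's worth, a branch like that can be made leak-proof by holding the candidate solutions in an owning container. The sketch below uses hypothetical, simplified stand-ins (isFastCount() and buildStage() are placeholders for turnIxscanIntoCount() and StageBuilder::build(); this is not a proposed patch against the real get_executor.cpp): an early return or a throw then cannot strand the remaining QuerySolutions.

#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not the real query types or StageBuilder API.
struct QuerySolution {};

bool isFastCount(const QuerySolution&) { return true; }  // stands in for turnIxscanIntoCount()
bool buildStage(const QuerySolution&) { return true; }   // stands in for StageBuilder::build()

// Because the vector owns its elements, an early return or a throw cannot strand
// the candidate solutions; only the winner's ownership is handed to the caller.
std::unique_ptr<QuerySolution> pickFastCount(
        std::vector<std::unique_ptr<QuerySolution>>& solutions) {
    for (auto& soln : solutions) {
        if (isFastCount(*soln)) {
            if (!buildStage(*soln))
                throw std::runtime_error("stage build failed");  // no leak: the vector
                                                                 // still owns everything
            return std::move(soln);  // the losers are freed when the vector goes away
        }
    }
    return nullptr;
}

int main() {
    std::vector<std::unique_ptr<QuerySolution>> solutions;
    solutions.push_back(std::make_unique<QuerySolution>());
    solutions.push_back(std::make_unique<QuerySolution>());
    auto winner = pickFastCount(solutions);
}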

Comment by J Rassi [ 08/Nov/14 ]

No, the leak is not obvious just from the valgrind report. It appears that a CollectionScanNode object is being leaked during execution of a findAndModify command (where the query condition is generating a collection scan). The CollectionScanNode should end up being owned by the query's QuerySolution, which should be owned by the query's PlanExecutor, which should get destroyed at the end of the block in which it's allocated in CmdFindAndModify::runNoDirectClient(). My guess is that the repro is exposing a mishandling of an error condition somewhere in that call stack, but it'd be difficult to dive much further without the actual repro. Do you think you could narrow one down?
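
To make that ownership chain concrete, here is a minimal sketch with hypothetical, heavily simplified types (the real classes own these objects through their own members; this is only an illustration of the intended lifetimes): the scan node is owned by the solution, the solution by the executor, and destroying the executor at the end of the findAndModify block should release everything, so a CollectionScanNode that survives suggests an error path dropped it before that ownership was established.

#include <memory>

// Hypothetical, heavily simplified stand-ins for the real query types.
struct CollectionScanNode {};                  // leaf of the solution tree

struct QuerySolution {
    std::unique_ptr<CollectionScanNode> root;  // solution owns its node tree
};

struct PlanExecutor {
    std::unique_ptr<QuerySolution> solution;   // executor owns the solution
};

int main() {
    {
        PlanExecutor exec;
        exec.solution = std::make_unique<QuerySolution>();
        exec.solution->root = std::make_unique<CollectionScanNode>();
    }  // executor destroyed here: the solution and scan node are freed with it,
       // which is why a leaked CollectionScanNode points at a dropped error path
}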
