[SERVER-20763] Performance regression in find command (versus legacy OP_QUERY) due to additional contended calls into CursorManager Created: 05/Oct/15  Updated: 15/Oct/15  Resolved: 09/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 3.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Martin Bligh Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-20596 Performance regression in new mongos ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Run this script:

#!/bin/bash
 
THREADS=$1
CMD=$2
if [ "$3" == "" ]; then
    BENCH_SECONDS=10
else
    BENCH_SECONDS="$3"
fi
 
mongo --quiet -eval 'db.foo.insert({_id:1})'
 
if [ $CMD == "false" ]; then
    mongo --quiet -eval '
    ops = [{op:"find", ns:"test.foo", query: {_id: 1}}]
    results = benchRun({ops:ops, parallel:'$THREADS', seconds:'$BENCH_SECONDS'})
    print(Math.round(results["totalOps/s"]))
    ' 
else
    mongo --quiet -eval '
    ops = [{op:"command", command: {"find": "foo", "filter": {"_id" : 1}}, ns:"test"}]
    results = benchRun({ops:ops, parallel:'$THREADS', seconds:'$BENCH_SECONDS'})
    print(Math.round(results["totalOps/s"]))
    ' 
fi

You get much more lock contention with the command path vs not:

benchRun.find 32 false 30
233511 ops/s

-   6.32%  mongod  [kernel.kallsyms]    [k] _raw_spin_lock                                       
   - _raw_spin_lock                                                                              
      - 48.54% futex_wake                                                                        
           do_futex                                                                              
           sys_futex                                                                             
           system_call_fastpath                                                                  
         - __lll_unlock_wake                                                                     
            + 41.15% mongo::PlanExecutor::registerExec()                                         
            + 35.88% mongo::PlanExecutor::deregisterExec()                                       
            + 5.40% mongo::runQuery(mongo::OperationContext*, mongo::QueryMessage&, mongo::Namesp
            + 5.26% mongo::AutoGetCollection::AutoGetCollection(mongo::OperationContext*, mongo::
            + 4.98% mongo::AutoGetCollectionForRead::~AutoGetCollectionForRead()                 
            + 2.61% mongo::getExecutorFind(mongo::OperationContext*, mongo::Collection*, mongo::N
            + 1.92% mongo::WiredTigerCursor::WiredTigerCursor(std::string const&, unsigned long, 
            + 1.72% mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit()                     
            + 1.09% mongo::fillOutPlannerParams(mongo::OperationContext*, mongo::Collection*, mon
      + 30.15% futex_wait_setup                                                                  
      + 18.51% try_to_wake_up      

vs:
benchRun.find 32 true 30
119839 ops/s

 -  43.11%  mongod  [kernel.kallsyms]    [k] _raw_spin_lock
  - _raw_spin_lock                                                                              
      - 51.98% futex_wake                                                                        
           do_futex                                                                              
           sys_futex                                                                            
           system_call_fastpath                                                                  
         - __lll_unlock_wake                                                                    
            + 23.98% mongo::ClientCursor::ClientCursor(mongo::CursorManager*, mongo::PlanExecutor
            + 20.36% mongo::ClientCursorPin::deleteUnderlying()                                  
            + 19.67% mongo::PlanExecutor::registerExec()                                        
            + 18.84% mongo::PlanExecutor::deregisterExec()                                      
            + 16.71% mongo::ClientCursorPin::ClientCursorPin(mongo::CursorManager*, long long)  
      - 45.15% futex_wait_setup                                                                  
           futex_wait                                                                            
           do_futex                                                                              
           sys_futex                                                                            
           system_call_fastpath                                                                  
         - __lll_lock_wait                                                                      
            + 22.64% mongo::ClientCursor::ClientCursor(mongo::CursorManager*, mongo::PlanExecutor
            + 22.05% mongo::PlanExecutor::deregisterExec()                                      
            + 19.39% mongo::PlanExecutor::registerExec()                                        
            + 18.94% mongo::ClientCursorPin::deleteUnderlying()                                  
            + 16.70% mongo::ClientCursorPin::ClientCursorPin(mongo::CursorManager*, long long)  
      + 2.56% try_to_wake_up  

Contention appears to be CursorManager::_mutex.



 Comments   
Comment by David Storch [ 09/Oct/15 ]

The change for this ticket resolves contention on the CursorManager mutex for a query workload that does not require any getMores. Further performance work for the new find and getMore commands path will be done under SERVER-20596.

Comment by Githook User [ 09/Oct/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-20763 delay creation of ClientCursor in find command

Find command point queries now acquire the CursorManager mutex just
twice, reducing contention.
Branch: master
https://github.com/mongodb/mongo/commit/162bc3d5427c6a2c6475ec400a14598168cd0ff8

Comment by Martin Bligh [ 07/Oct/15 ]

Patch fixes about half the perf delta. Profile is now

-  14.93%  mongod  [kernel.kallsyms]    [k] _raw_spin_lock                                                                                         
   - _raw_spin_lock                                                                                                                                
      - 51.04% futex_wake                                                                                                                          
           do_futex                                                                                                                                
           sys_futex                                                                                                                               
           system_call_fastpath                                                                                                                    
         - __lll_unlock_wake                                                                                                                       
            - 35.73% mongo::ClientCursor::ClientCursor(mongo::CursorManager*, mongo::PlanExecutor*, std::string const&, bool, int, mongo::BSONObj, 
               - mongo::FindCmd::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&)     
                  + 100.00% mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) 
            - 32.66% mongo::ClientCursorPin::deleteUnderlying()                                                                                    
                 mongo::FindCmd::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&)     
                 mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)            
                 mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuild
                 mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)             
                 mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                 
                 mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                  
                 mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                
                 start_thread                                                                                                                      
            - 27.43% mongo::ClientCursorPin::ClientCursorPin(mongo::CursorManager*, long long)                                                     
                 mongo::FindCmd::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&)     
                 mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)            
                 mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuild
                 mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)             
                 mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                 
                 mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                  
                 mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                
                 start_thread     

Comment by David Storch [ 07/Oct/15 ]

I hacked the find command path to not require the extra acquisitions of the CursorManager mutex. (I gave the find command's PlanExecutor a manual yield policy so that it gets registered with the CursorManager just once, when we pass ownership of the PlanExecutor to the ClientCursor.) On my machine, this showed about a 1% improvement in throughput for the find-by-_id benchRun workload. This leads me to believe that contention on CursorManager::_mutex is not the main problem.

CC martin.bligh

UPDATE: Martin can still reproduce this on his more powerful machine, and proved that my PlanExecutor manual yield policy hack significantly alleviates the problem. Plan is now to move forward with finding a fix.

Generated at Thu Feb 08 03:55:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.