Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-18281

lock contention in mongo::DBConnectionPool::get

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.0.1
    • Component/s: Internal Client, Sharding
    • Labels:
    • Fully Compatible
    • ALL
    • Sharding A (10/09/15)

      Test

      • 3.0.1
      • profiling with linux perf on mongos
      • 7 shard of 3 member replica-set
      • load 20M doc, then run mixed traffic

      Observation

      During multiple shard testing, saw mongos mongo::DBConnectionPool::get usage is high in both CPU usage (mainly due to spin_lock) and sleep time (due to futex in Off-CPU capture)

      The following numbers are for 100% read, saw similar behavior with insert and mixed traffic

      Type Total Event/Trace Trace w/ mongo::DBConnectionPool::get %
      on CPU 262959 81145 30.9%
      off CPU 2373010 165658 7%

      Note:

      • On CPU measures CPU usage by sampling on CPU stack. in this case, it is mainly caused by spin_lock
      • Off CPU is to show sleep due to f/mutex or IO. See https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times for more details.
      • Trace w/ mongo::DBConnectionPool::get is to count how many time this function show up anywhere in the stack track.
      • Majority of the sleep time of mongos is due to wait for network request/response, which shall be normal. Other sleep time,

      On CPU usage with spin_lock

      -  61.44%  mongos  [kernel.kallsyms]    [k] _raw_spin_lock
         - _raw_spin_lock
            - 59.40% futex_wake
                 do_futex
                 sys_futex
                 system_call_fastpath
               - __lll_unlock_wake
                  - 42.15% mongo::DBConnectionPool::get(std::string const&, double)
                       mongo::ShardConnection::_init()
                       mongo::ShardConnection::ShardConnection(mongo::Shard const&, std::string const&, boost::shared_ptr<mongo::ChunkManager const>)
                       mongo::ParallelSortClusteredCursor::setupVersionAndHandleSlaveOk(boost::shared_ptr<mongo::ParallelConnectionState>, mongo::Shard const&, boost::shared_ptr<mongo::Shard>, mongo::NamespaceString const&, std::string const&,
                     + mongo::ParallelSortClusteredCursor::startInit()
                  + 32.80% mongo::ShardConnection::releaseMyConnections()
                  + 22.66% mongo::ClientConnections::done(std::string const&, mongo::DBClientBase*)
                  + 0.71% mongo::DBClientReplicaSet::_getMonitor() const
                  + 0.58% mongo::Shard::reset(std::string const&)
                  + 0.57% mongo::StaticShardInfo::find(std::string const&)
            - 40.19% futex_wait_setup
                 futex_wait
                 do_futex
                 sys_futex
                 system_call_fastpath
               - __lll_lock_wait
                  - 41.75% mongo::DBConnectionPool::get(std::string const&, double)
                       mongo::ShardConnection::_init()
                       mongo::ShardConnection::ShardConnection(mongo::Shard const&, std::string const&, boost::shared_ptr<mongo::ChunkManager const>)
                       mongo::ParallelSortClusteredCursor::setupVersionAndHandleSlaveOk(boost::shared_ptr<mongo::ParallelConnectionState>, mongo::Shard const&, boost::shared_ptr<mongo::Shard>, mongo::NamespaceString const&, std::string const&,
                       mongo::ParallelSortClusteredCursor::startInit()
                       mongo::ParallelSortClusteredCursor::fullInit()
                       mongo::Strategy::queryOp(mongo::Request&)
                       mongo::Request::process(int)
                       mongo::ShardedMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)
                       mongo::PortMessageServer::handleIncomingMsg(void*)
                       start_thread
                  + 39.48% mongo::ShardConnection::releaseMyConnections()
                  + 15.74% mongo::ClientConnections::done(std::string const&, mongo::DBClientBase*)
                  + 1.14% mongo::ReplicaSetMonitor::get(std::string const&, bool)
                  + 0.92% mongo::StaticShardInfo::find(std::string const&)
                  + 0.71% mongo::Shard::reset(std::string const&)
      -   8.23%  mongos  [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore
         - _raw_spin_unlock_irqrestore
            - 67.11% try_to_wake_up
                 wake_up_state
                 wake_futex
                 futex_wake
                 do_futex
                 sys_futex
                 system_call_fastpath
               - __lll_unlock_wake
                  - 36.71% mongo::DBConnectionPool::get(std::string const&, double)
                       mongo::ShardConnection::_init()
                       mongo::ShardConnection::ShardConnection(mongo::Shard const&, std::string const&, boost::shared_ptr<mongo::ChunkManager const>)
                       mongo::ParallelSortClusteredCursor::setupVersionAndHandleSlaveOk(boost::shared_ptr<mongo::ParallelConnectionState>, mongo::Shard const&, boost::shared_ptr<mongo::Shard>, mongo::NamespaceString const&, std::string const&,
                     + mongo::ParallelSortClusteredCursor::startInit()
                  + 17.55% mongo::ShardConnection::releaseMyConnections()
                  + 9.34% mongo::ClientConnections::done(std::string const&, mongo::DBClientBase*)
                  + 8.95% mongo::Shard::reset(std::string const&)
                  + 8.58% mongo::StaticShardInfo::find(std::string const&)
                  + 7.52% mongo::DBClientReplicaSet::getServerAddress() const
                  + 6.38% mongo::DBClientReplicaSet::_getMonitor() const
                  + 2.60% mongo::checkShardVersion(mongo::DBClientBase*, std::string const&, boost::shared_ptr<mongo::ChunkManager const>, bool, int)
                  + 1.82% mongo::DBConfig::getChunkManagerOrPrimary(std::string const&, boost::shared_ptr<mongo::ChunkManager const>&, boost::shared_ptr<mongo::Shard>&)
            + 28.80% __wake_up_sync_key
            + 3.06% mod_timer
            + 1.02% prepare_to_wait
      

      off-cpu sleep due to lock

      -  53.90%          mongos  [kernel.kallsyms]  [k] __schedule
         - __schedule
         - schedule
            + 85.45% schedule_timeout
            - 12.46% futex_wait_queue_me
                 futex_wait
                 do_futex
                 sys_futex
               - system_call_fastpath
                  - 93.04% __lll_lock_wait
                     + 15.94% mongo::ClientConnections::done(std::string const&, mongo::DBClientBase*)
                     + 15.15% mongo::checkShardVersion(mongo::DBClientBase*, std::string const&, boost::shared_ptr<mongo::ChunkManager const>, bool, int)
                     + 14.79% mongo::ReplicaSetMonitor::get(std::string const&, bool)
                     - 13.30% mongo::DBConnectionPool::get(std::string const&, double)
                        - 99.95% mongo::ShardConnection::_init()
                             mongo::ShardConnection::ShardConnection(mongo::Shard const&, std::string const&, boost::shared_ptr<mongo::ChunkManager const>)
                             mongo::ParallelSortClusteredCursor::setupVersionAndHandleSlaveOk(boost::shared_ptr<mongo::ParallelConnectionState>, mongo::Shard const&, boost::shared_ptr<mongo::Shard>, mongo::NamespaceString const&, std::string c
                             mongo::ParallelSortClusteredCursor::startInit()
                             mongo::ParallelSortClusteredCursor::fullInit()
                             mongo::Strategy::queryOp(mongo::Request&)
                             mongo::Request::process(int)
                             mongo::ShardedMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)
                             mongo::PortMessageServer::handleIncomingMsg(void*)
                             start_thread
                     + 12.95% mongo::StaticShardInfo::find(std::string const&)
                     + 7.44% mongo::ReplicaSetMonitor::getHostOrRefresh(mongo::ReadPreferenceSetting const&)
                     + 6.64% mongo::ShardConnection::releaseMyConnections()
                     + 6.09% mongo::ParallelSortClusteredCursor::startInit()
                     + 3.31% mongo::Request::reset()
                     + 1.95% mongo::ReplicaSetMonitor::getServerAddress() const
                     + 1.80% mongo::Shard::reset(std::string const&)
                     + 0.52% _nss_files_gethostbyname2_r
                  + 3.88% pthread_cond_timedwait@@GLIBC_2.3.2
                  + 3.03% __lll_lock_wait_private
            + 1.49% schedule_hrtimeout_range_clock
            + 0.60% do_nanosleep
      

            Assignee:
            misha.tyulenev@mongodb.com Misha Tyulenev (Inactive)
            Reporter:
            rui.zhang Rui Zhang (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: