[SERVER-16670] WiredTiger LSM uses too much CPU when it should be idle Created: 26/Dec/14  Updated: 06/Dec/22  Resolved: 02/Aug/21

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 2.8.0-rc3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mark Callaghan Assignee: Backlog - Storage Engines Team
Resolution: Won't Do Votes: 1
Labels: LSM, wttt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Storage Engines
Operating System: ALL
Participants:

 Description   

This might be the same as SERVER-16356, but 16356 doesn't have enough details and was closed as fixed without an explanation. The problem is that 12 hours after the mongod instance became idle it is still busy on 3 CPU cores (each roughly 40% busy). The server isn't doing much I/O per iostat:

# iostat -kx 10 | grep fioa
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
fioa              0.00     0.00    0.00   12.30     0.00   231.20    37.59     0.00    0.07   0.00   0.00
fioa              0.00     0.00    0.00    8.10     0.00   128.80    31.80     0.00    0.17   0.00   0.00
fioa              0.00     0.00    0.00   24.00     0.00   474.80    39.57     0.00    0.78   0.00   0.00
fioa              0.00     0.00    0.00    4.40     0.00    86.80    39.45     0.00    0.02   0.00   0.00
fioa              0.00     0.00    0.00    4.00     0.00    57.20    28.60     0.00    0.03   0.00   0.00
fioa              0.00     0.00    0.00    5.10     0.00    60.80    23.84     0.00    0.00   0.00   0.00
fioa              0.00     0.00    0.10   12.50     0.40   107.20    17.08     0.00    0.01   0.00   0.00
fioa              0.00     0.00    0.00    5.20     0.00    62.80    24.15     0.00    0.04   0.00   0.00

The top-N sources of CPU per "perf record ..."

    79.89%   mongod  libc-2.17.so        [.] __strcmp_sse42
    13.54%   mongod  mongod              [.] __wt_conn_dhandle_close_all                                
     0.85%   mongod  [kernel.kallsyms]   [k] update_sd_lb_stats                                         
     0.25%   mongod  [kernel.kallsyms]   [k] idle_cpu                                                   
     0.25%   mongod  [kernel.kallsyms]   [k] find_next_bit                                              
     0.24%   mongod  [kernel.kallsyms]   [k] _raw_spin_lock                                             
     0.23%   mongod  mongod              [.] __wt_session_get_btree                                     
     0.23%   mongod  mongod              [.] __config_getraw.isra.0                                     
     0.21%   mongod  [kernel.kallsyms]   [k] cpumask_next_and                                           
     0.18%   mongod  libpthread-2.17.so  [.] __lll_lock_wait           

A thread stack sample shows the busy threads:

2 threads here -->
__lll_lock_wait,_L_lock_943,pthread_mutex_lock,__wt_spin_lock,__lsm_drop_file,__wt_lsm_free_chunks,__lsm_worker_general_op,__lsm_worker,start_thread,clone
 
1 thread here -->
__strcmp_sse42,__wt_conn_dhandle_close_all,__drop_file,__wt_schema_drop,__lsm_drop_file,__wt_lsm_free_chunks,__lsm_worker_general_op,__lsm_worker,start_thread,clone
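
For illustration, a minimal sketch (not WiredTiger code; names and sizes are invented) of the pattern these stacks suggest: every drop attempt takes the handle-list lock and runs a strcmp() scan over the open data handles, so a drop that keeps failing and being retried shows up as steady CPU with no I/O, while the other LSM workers queue on the same lock. The real function walks a hash bucket of handles (see the source quoted at the bottom of this ticket); the sketch flattens that to a single list.

/*
 * Sketch only: a drop attempt that scans the open handles by name.
 * NHANDLES, drop_attempt() and the URIs are invented for illustration.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NHANDLES 100000                 /* many chunk/bloom files stay open */

static char *handles[NHANDLES];         /* stand-in for the dhandle list */
static pthread_mutex_t handle_lock = PTHREAD_MUTEX_INITIALIZER;

/* One drop attempt: scan the handle list looking for a matching name. */
static int
drop_attempt(const char *uri)
{
        int found = 0;

        /* In the real server the other LSM workers block on this lock. */
        pthread_mutex_lock(&handle_lock);
        for (int i = 0; i < NHANDLES; i++)
                if (strcmp(handles[i], uri) == 0)       /* the hot strcmp */
                        found = 1;
        pthread_mutex_unlock(&handle_lock);

        /* Pretend the handle is still referenced, so the drop fails and
         * the caller will simply try again on its next pass. */
        return (found ? -1 : 0);
}

int
main(void)
{
        char name[64];

        for (int i = 0; i < NHANDLES; i++) {
                snprintf(name, sizeof(name), "file:index-%d.bf", i);
                handles[i] = strdup(name);
        }

        /* A worker retrying the same failing drop burns CPU forever,
         * even though the server is otherwise idle. */
        for (int attempt = 0; attempt < 1000; attempt++)
                (void)drop_attempt("file:index-430.bf");

        printf("1000 drop attempts scanned %d names each\n", NHANDLES);
        return (0);
}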



 Comments   
Comment by Mark Callaghan [ 09/Jul/15 ]

OK, the test ended with a crash:

2015-07-08T09:32:36.542-0700 I -        [conn9] Invariant failure false src/mongo/db/storage/key_string.cpp 1067
2015-07-08T09:32:36.552-0700 I CONTROL  [conn9] 
 0x12c219b 0x125f144 0x1244271 0xf3cc80 0xf3e683 0x100c3f4 0xb1cbe5 0xb1e34d 0xb2cfc4 0xb2fbf5 0xb4c367 0xd73d1d 0xd7457c 0xd23096 0xbf2ce2 0x81deba 0x126dbc5 0x7fab9aad47f1 0x7fab9a80b46d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EC219B"},{"b":"400000","o":"E5F144"},{"b":"400000","o":"E44271"},{"b":"400000","o":"B3CC80"},{"b":"400000","o":"B3E683"},{"b":"400000","o":"C0C3F4"},{"b":"400000","o":"71CBE5"},{"b":"400000","o":"71E34D"},{"b":"400000","o":"72CFC4"},{"b":"400000","o":"72FBF5"},{"b":"400000","o":"74C367"},{"b":"400000","o":"973D1D"},{"b":"400000","o":"97457C"},{"b":"400000","o":"923096"},{"b":"400000","o":"7F2CE2"},{"b":"400000","o":"41DEBA"},{"b":"400000","o":"E6DBC5"},{"b":"7FAB9AACD000","o":"77F1"},{"b":"7FAB9A6FE000","o":"10D46D"}],"processInfo":{ "mongodbVersion" : "3.1.5", "gitVersion" : "nogitversion", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.53-43_fbk14_02627_gb5c286c", "version" : "#1 SMP Mon Oct 20 15:06:57 PDT 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "3AF90821B892150C489E4F76A76512D227478216" }, { "b" : "7FFFE7FE2000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "50103D521098AC1A8DA637FE9F84CCFE48B62B25" }, { "b" : "7FAB9B73F000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/librt.so.1", "elfType" : 3, "buildId" : "5E2D0D29E1625D32A18D8045FAEEA4AFA7D3F933" }, { "b" : "7FAB9B53B000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libdl.so.2", "elfType" : 3, "buildId" : "F3BE7159B6D45202A6B62EDC087FC16626BF473B" }, { "b" : "7FAB9B20A000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libstdc++.so.6", "elfType" : 3, "buildId" : "F2E276E6187C0080E36967D3453C72585D3C6D5B" }, { "b" : "7FAB9AF02000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libm.so.6", "elfType" : 3, "buildId" : "5D6FA14B291E17712D60AE7F79C9BA95930964C3" }, { "b" : "7FAB9ACEB000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libgcc_s.so.1", "elfType" : 3, "buildId" : "5A5D36A79367C01CE60366EA0F989F1BE5406AB5" }, { "b" : "7FAB9AACD000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libpthread.so.0", "elfType" : 3, "buildId" : "6F994975885C1342997A1DE24EB0C2F70BD40454" }, { "b" : "7FAB9A6FE000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libc.so.6", "elfType" : 3, "buildId" : "AF4A0A2EE809CD9DEA631904A233F9FF08452F09" }, { "b" : "7FAB9B948000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/ld.so", "elfType" : 3, "buildId" : "094797684101C341C19892AF554077EB33581A81" } ] }}
 mongod.315.tcm(_ZN5mongo15printStackTraceERSo+0x2B) [0x12c219b]
 mongod.315.tcm(_ZN5mongo10logContextEPKc+0x134) [0x125f144]
 mongod.315.tcm(_ZN5mongo15invariantFailedEPKcS1_j+0x101) [0x1244271]
 mongod.315.tcm(+0xB3CC80) [0xf3cc80]
 mongod.315.tcm(_ZN5mongo9KeyString6toBsonEPKcmNS_8OrderingERKNS0_8TypeBitsE+0x1A3) [0xf3e683]
 mongod.315.tcm(+0xC0C3F4) [0x100c3f4]
 mongod.315.tcm(_ZN5mongo9IndexScan13initIndexScanEv+0x295) [0xb1cbe5]
 mongod.315.tcm(_ZN5mongo9IndexScan4workEPm+0x1DD) [0xb1e34d]
 mongod.315.tcm(_ZN5mongo7OrStage4workEPm+0x64) [0xb2cfc4]
 mongod.315.tcm(_ZN5mongo15ProjectionStage4workEPm+0x45) [0xb2fbf5]
 mongod.315.tcm(_ZN5mongo12SubplanStage4workEPm+0x47) [0xb4c367]
 mongod.315.tcm(_ZN5mongo12PlanExecutor18getNextSnapshottedEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x30D) [0xd73d1d]
 mongod.315.tcm(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x3C) [0xd7457c]
 mongod.315.tcm(_ZN5mongo8runQueryEPNS_16OperationContextERNS_12QueryMessageERKNS_15NamespaceStringERNS_7MessageE+0xD76) [0xd23096]
 mongod.315.tcm(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xAD2) [0xbf2ce2]
 mongod.315.tcm(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xBA) [0x81deba]
 mongod.315.tcm(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x2A5) [0x126dbc5]
 libpthread.so.0(+0x77F1) [0x7fab9aad47f1]
 libc.so.6(clone+0x6D) [0x7fab9a80b46d]
-----  END BACKTRACE  -----
2015-07-08T09:32:36.552-0700 I -        [conn9] 
 
***aborting after invariant() failure

Comment by Alexander Gorrod [ 08/Jul/15 ]

Thanks for trying, mdcallag - no need to keep pushing.

Comment by Mark Callaghan [ 08/Jul/15 ]

Trying to insert 500M docs into the WiredTiger LSM. It has reached 200M docs and the insert rate is now under 300/second. Not sure I can wait for it to reach 500M. It looks like everything is stuck in eviction code:

     10 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_eviction_worker,__wt_cache_eviction_check,__wt_txn_begin,__session_begin_transaction,mongo::WiredTigerRecoveryUnit::_txnOpen,mongo::WiredTigerRecoveryUnit::getSession,mongo::WiredTigerCursor::WiredTigerCursor,mongo::WiredTigerRecordStore::insertRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,mongo::singleInsert,mongo::insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::Command::run,mongo::Command::execCommand,mongo::runCommands,mongo::receivedCommand,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
      3 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__lsm_worker,start_thread,clone
      3 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__evict_worker,start_thread,clone
      1 sigwait,mongo::(anonymous,boost::(anonymous,start_thread,clone
      1 select,__wt_sleep,__lsm_manager_run_server,__lsm_worker_manager,start_thread,clone
      1 select,mongo::Listener::initAndListen,mongo::_initAndListen,mongo::initAndListen,mongoDbMain,main
      1 recv,mongo::Socket::_recv,mongo::Socket::unsafe_recv,mongo::Socket::recv,mongo::MessagingPort::recv,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,mongo::DeadlineMonitor<mongo::V8Scope>::deadlineMonitorThread,boost::(anonymous,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_eviction_worker,__wt_cache_eviction_check,__wt_txn_begin,__session_begin_transaction,mongo::WiredTigerRecoveryUnit::_txnOpen,mongo::WiredTigerRecoveryUnit::getSession,mongo::WiredTigerRecordStore::cappedDeleteAsNeeded_inlock,mongo::(anonymous,mongo::(anonymous,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_eviction_worker,__wt_cache_eviction_check,__wt_txn_begin,__session_begin_transaction,mongo::WiredTigerRecoveryUnit::_txnOpen,mongo::WiredTigerRecoveryUnit::getSession,mongo::WiredTigerCursor::WiredTigerCursor,mongo::(anonymous,mongo::(anonymous,boost::make_unique<mongo::(anonymous,mongo::WiredTigerIndexStandard::newCursor,mongo::IndexAccessMethod::newCursor,mongo::IndexScan::initIndexScan,mongo::IndexScan::work,mongo::OrStage::work,mongo::ProjectionStage::work,mongo::SubplanStage::work,mongo::PlanExecutor::getNextSnapshotted,mongo::PlanExecutor::getNext,mongo::runQuery,mongo::receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_wrlsn_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_file_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__ckpt_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__gthread_cond_timedwait,std::condition_variable::__wait_until_impl<std::chrono::duration<long,,std::condition_variable::wait_until<std::chrono::duration<long,,std::condition_variable::wait_for<long,,mongo::(anonymous,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,boost::condition_variable::do_wait_until,boost::condition_variable::timed_wait,boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration,,mongo::RangeDeleter::doWork,boost::(anonymous,start_thread,clone
      1 nanosleep,mongo::sleepsecs,mongo::TTLMonitor::run,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 nanosleep,mongo::sleepsecs,mongo::repl::replMasterThread,boost::(anonymous,start_thread,clone
      1 nanosleep,mongo::sleepsecs,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 __evict_walk,__evict_lru_walk,__evict_pass,__evict_server,start_thread,clone
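
For illustration, a minimal sketch (not WiredTiger code; names and numbers are invented) of the wait pattern in the stacks above: a thread that wants to begin a transaction must first wait for the eviction server to bring cache usage back under a trigger, so when eviction can't keep up every insert thread parks in pthread_cond_timedwait, which is what PMP shows for the 10 connection threads.

/* Sketch only: insert threads stall until a pretend eviction server
 * brings "cache usage" back under a trigger.  All names are invented. */
#include <pthread.h>
#include <stdio.h>

#define CACHE_TRIGGER 95                /* "95% of the cache is used" */

static int cache_pct = 100;             /* start over the trigger */
static pthread_mutex_t evict_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t evict_cond = PTHREAD_COND_INITIALIZER;

/* What an insert thread does before it can start a transaction. */
static void
cache_eviction_check(void)
{
        pthread_mutex_lock(&evict_lock);
        while (cache_pct > CACHE_TRIGGER)       /* stall until under trigger */
                pthread_cond_wait(&evict_cond, &evict_lock);
        pthread_mutex_unlock(&evict_lock);
}

/* The eviction server: free space, then wake the waiters. */
static void *
evict_server(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&evict_lock);
        while (cache_pct > CACHE_TRIGGER)
                cache_pct--;                    /* pretend to evict pages */
        pthread_cond_broadcast(&evict_cond);
        pthread_mutex_unlock(&evict_lock);
        return (NULL);
}

static void *
insert_thread(void *arg)
{
        (void)arg;
        cache_eviction_check();         /* the analogue of the 10 waiting
                                           connection threads above */
        return (NULL);
}

int
main(void)
{
        pthread_t evict, inserters[10];

        for (int i = 0; i < 10; i++)
                pthread_create(&inserters[i], NULL, insert_thread, NULL);
        pthread_create(&evict, NULL, evict_server, NULL);

        for (int i = 0; i < 10; i++)
                pthread_join(inserters[i], NULL);
        pthread_join(evict, NULL);
        puts("insert threads resumed after eviction caught up");
        return (0);
}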

Comment by Mark Callaghan [ 08/Jul/15 ]

Ran my iibench -> https://github.com/mdcallag/iibench-mongodb with this command line:
java -cp mongo-java-driver-2.13.1.jar:/data/mysql/iibench-mongodb/src jmongoiibench iibench 10 500000000 10 -1 10 mongoiibench-500000000-10-999999-10-1-0.txt.tsv quicklz 16384 60000000000 4 100000 999999 SAFE localhost 27017 1 1000 3 50 Y 1 0 1

And there are just too many stalls. I don't think this is a new problem. When the load ends I will see if this problem has been fixed though.

I am too lazy to see if there is an open bug for the stalls. These are stacks from PMP during a stall:

     10 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_eviction_worker,__wt_cache_eviction_check,__wt_txn_begin,__session_begin_transaction,mongo::WiredTigerRecoveryUnit::_txnOpen,mongo::WiredTigerRecoveryUnit::getSession,mongo::WiredTigerCursor::WiredTigerCursor,mongo::WiredTigerRecordStore::insertRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,mongo::singleInsert,mongo::insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::Command::run,mongo::Command::execCommand,mongo::runCommands,mongo::receivedCommand,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
      3 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__lsm_worker,start_thread,clone
      3 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__evict_worker,start_thread,clone
      1 tcmalloc::SLL_Pop,tcmalloc::ThreadCache::FreeList::Pop,tcmalloc::ThreadCache::Allocate,(anonymous,(anonymous,(anonymous,tc_new,std::string::assign(char,std::string::assign,std::string::operator=,mongo::IndexBoundsBuilder::translate,mongo::QueryPlannerAccess::makeLeafNode,mongo::QueryPlannerAccess::processIndexScans,mongo::QueryPlannerAccess::buildIndexedAnd,mongo::QueryPlannerAccess::buildIndexedDataAccess,mongo::QueryPlannerAccess::processIndexScansSubnode,mongo::QueryPlannerAccess::processIndexScans,mongo::QueryPlannerAccess::buildIndexedOr,mongo::QueryPlannerAccess::buildIndexedDataAccess,mongo::SubplanStage::choosePlanForSubqueries,mongo::SubplanStage::pickBestPlan,mongo::PlanExecutor::pickBestPlan,mongo::PlanExecutor::make,mongo::PlanExecutor::make,mongo::getExecutor,mongo::getExecutorFind,mongo::runQuery,mongo::receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
      1 sigwait,mongo::(anonymous,boost::(anonymous,start_thread,clone
      1 select,__wt_sleep,__lsm_manager_run_server,__lsm_worker_manager,start_thread,clone
      1 select,mongo::Listener::initAndListen,mongo::_initAndListen,mongo::initAndListen,mongoDbMain,main
      1 recv,mongo::Socket::_recv,mongo::Socket::unsafe_recv,mongo::Socket::recv,mongo::MessagingPort::recv,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,mongo::DeadlineMonitor<mongo::V8Scope>::deadlineMonitorThread,boost::(anonymous,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_eviction_worker,__wt_cache_eviction_check,__wt_txn_begin,__session_begin_transaction,mongo::WiredTigerRecoveryUnit::_txnOpen,mongo::WiredTigerRecoveryUnit::getSession,mongo::WiredTigerRecordStore::cappedDeleteAsNeeded_inlock,mongo::(anonymous,mongo::(anonymous,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_wrlsn_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_file_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__ckpt_server,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,__gthread_cond_timedwait,std::condition_variable::__wait_until_impl<std::chrono::duration<long,,std::condition_variable::wait_until<std::chrono::duration<long,,std::condition_variable::wait_for<long,,mongo::(anonymous,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,boost::condition_variable::do_wait_until,boost::condition_variable::timed_wait,boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration,,mongo::RangeDeleter::doWork,boost::(anonymous,start_thread,clone
      1 nanosleep,mongo::sleepsecs,mongo::TTLMonitor::run,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 nanosleep,mongo::sleepsecs,mongo::repl::replMasterThread,boost::(anonymous,start_thread,clone
      1 nanosleep,mongo::sleepsecs,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::jobBody,execute_native_thread_routine,start_thread,clone
      1 __evict_walk,__evict_lru_walk,__evict_pass,__evict_server,start_thread,clone

And this shows the variance in the insert rate. Look at the "int ips=" column.

Thread[Thread-2,5,main]129,798,460 inserts : 5,870 seconds : cum ips=22,111.85 : int ips=12,377.10 : cum avg qry=1.67 : int avg qry=9.69 : cum avg qps=598.78 : int avg qps=107.21 : exceptions=0
Thread[Thread-2,5,main]129,900,260 inserts : 5,880 seconds : cum ips=22,091.52 : int ips=10,169.83 : cum avg qry=1.67 : int avg qry=3.38 : cum avg qps=598.27 : int avg qps=295.40 : exceptions=0
Thread[Thread-2,5,main]129,992,950 inserts : 5,890 seconds : cum ips=22,070.09 : int ips=9,354.12 : cum avg qry=1.67 : int avg qry=4.16 : cum avg qps=597.64 : int avg qps=228.08 : exceptions=0
Thread[Thread-2,5,main]129,992,950 inserts : 5,900 seconds : cum ips=22,032.65 : int ips=0.00 : cum avg qry=1.67 : int avg qry=0.00 : cum avg qps=596.63 : int avg qps=0.00 : exceptions=0
Thread[Thread-2,5,main]130,040,980 inserts : 5,910 seconds : cum ips=22,003.45 : int ips=4,797.24 : cum avg qry=1.68 : int avg qry=18.42 : cum avg qps=595.81 : int avg qps=111.27 : exceptions=0
Thread[Thread-2,5,main]130,266,110 inserts : 5,920 seconds : cum ips=22,004.28 : int ips=22,495.00 : cum avg qry=1.68 : int avg qry=2.20 : cum avg qps=595.57 : int avg qps=453.24 : exceptions=0
Thread[Thread-2,5,main]130,401,600 inserts : 5,930 seconds : cum ips=21,989.99 : int ips=13,535.46 : cum avg qry=1.68 : int avg qry=4.70 : cum avg qps=594.92 : int avg qps=213.29 : exceptions=0
Thread[Thread-2,5,main]130,509,300 inserts : 5,940 seconds : cum ips=21,971.07 : int ips=10,760.32 : cum avg qry=1.68 : int avg qry=6.43 : cum avg qps=594.18 : int avg qps=155.36 : exceptions=0
Thread[Thread-2,5,main]130,605,290 inserts : 5,950 seconds : cum ips=21,950.23 : int ips=9,588.45 : cum avg qry=1.68 : int avg qry=5.86 : cum avg qps=593.39 : int avg qps=126.96 : exceptions=0
Thread[Thread-2,5,main]130,605,290 inserts : 5,960 seconds : cum ips=21,913.37 : int ips=0.00 : cum avg qry=1.68 : int avg qry=0.00 : cum avg qps=592.40 : int avg qps=0.00 : exceptions=0
Thread[Thread-2,5,main]130,679,720 inserts : 5,970 seconds : cum ips=21,889.10 : int ips=7,435.56 : cum avg qry=1.69 : int avg qry=15.14 : cum avg qps=591.65 : int avg qps=148.95 : exceptions=0
Thread[Thread-2,5,main]130,801,030 inserts : 5,980 seconds : cum ips=21,872.75 : int ips=12,120.09 : cum avg qry=1.69 : int avg qry=4.96 : cum avg qps=591.00 : int avg qps=201.82 : exceptions=0
Thread[Thread-2,5,main]130,920,890 inserts : 5,990 seconds : cum ips=21,856.57 : int ips=12,094.85 : cum avg qry=1.69 : int avg qry=4.94 : cum avg qps=590.32 : int avg qps=177.50 : exceptions=0
Thread[Thread-2,5,main]131,039,610 inserts : 6,000 seconds : cum ips=21,839.89 : int ips=11,861.32 : cum avg qry=1.69 : int avg qry=6.30 : cum avg qps=589.63 : int avg qps=177.74 : exceptions=0
Thread[Thread-2,5,main]131,136,370 inserts : 6,010 seconds : cum ips=21,819.62 : int ips=9,666.33 : cum avg qry=1.70 : int avg qry=4.56 : cum avg qps=589.01 : int avg qps=219.28 : exceptions=0
Thread[Thread-2,5,main]131,220,810 inserts : 6,020 seconds : cum ips=21,797.42 : int ips=8,448.22 : cum avg qry=1.70 : int avg qry=5.69 : cum avg qps=588.32 : int avg qps=175.69 : exceptions=0
Thread[Thread-2,5,main]131,294,500 inserts : 6,030 seconds : cum ips=21,773.45 : int ips=7,360.90 : cum avg qry=1.70 : int avg qry=4.42 : cum avg qps=587.72 : int avg qps=221.96 : exceptions=0
Thread[Thread-2,5,main]131,441,100 inserts : 6,040 seconds : cum ips=21,761.64 : int ips=14,646.82 : cum avg qry=1.70 : int avg qry=5.88 : cum avg qps=587.02 : int avg qps=169.95 : exceptions=0
Thread[Thread-2,5,main]131,620,770 inserts : 6,050 seconds : cum ips=21,755.33 : int ips=17,949.05 : cum avg qry=1.70 : int avg qry=3.85 : cum avg qps=586.49 : int avg qps=264.44 : exceptions=0
Thread[Thread-2,5,main]131,794,320 inserts : 6,060 seconds : cum ips=21,748.04 : int ips=17,339.39 : cum avg qry=1.70 : int avg qry=3.91 : cum avg qps=585.94 : int avg qps=255.67 : exceptions=0

Comment by Alexander Gorrod [ 07/Jul/15 ]

Yes - 3.1.5 has the necessary improvements.

Comment by Mark Callaghan [ 07/Jul/15 ]

I have a build for 3.1.5. Will that have the improvements?

Comment by Alexander Gorrod [ 07/Jul/15 ]

mdcallag Sorry for the long delay getting back to this.

The issue was that your LSM tree ended up with some data that had been merged (and was thus obsolete), but for some reason we still held a reference to the un-merged version of the data. When that happens, LSM keeps trying to drop the obsolete chunk(s), which corresponds to the call stacks you are reporting. Since the MongoDB 3.0 release we have made major changes to how we drop chunks from the LSM tree, and also to how we drop files generally in WiredTiger.

If you are interested in the details, a good starting point for the drop changes is this ticket:
https://jira.mongodb.org/browse/WT-1876
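
A rough sketch of the retry behaviour described above (invented types and helpers, not the real lsm_work_unit.c): while something still holds a reference to the un-merged data, the drop of the obsolete chunk fails, the chunk stays queued, and the worker re-attempts the drop on every pass; each attempt also repeats the handle scan quoted at the bottom of this ticket.

/* Sketch only: an obsolete chunk that cannot be dropped is retried forever. */
#include <errno.h>
#include <stdio.h>

struct chunk {
        const char *uri;
        int refcount;           /* >0: somebody still holds the handle */
        int dropped;
};

/* Stand-in for the schema drop: fails while the handle is referenced. */
static int
drop_file(struct chunk *c)
{
        if (c->refcount > 0)
                return (EBUSY);
        c->dropped = 1;
        return (0);
}

/* One pass of the "free obsolete chunks" work unit. */
static int
free_chunks(struct chunk *old, int nchunks)
{
        int busy = 0;

        for (int i = 0; i < nchunks; i++) {
                if (old[i].dropped)
                        continue;
                if (drop_file(&old[i]) == EBUSY)
                        busy++;         /* leave it queued; retry next pass */
        }
        return (busy);
}

int
main(void)
{
        /* One obsolete chunk whose un-merged handle was never released. */
        struct chunk old[] = { { "file:index-4-000430.bf", 1, 0 } };
        int pass;

        /* The worker keeps coming back as long as work is queued, so one
         * stuck reference keeps it busy even when the server is idle.
         * (Capped here so the sketch terminates.) */
        for (pass = 0; pass < 1000 && free_chunks(old, 1) > 0; pass++)
                ;
        printf("gave up after %d passes; chunk still not dropped\n", pass);
        return (0);
}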

I attempted to reproduce the symptoms you described with a build of MongoDB master (3.2) and could not. Are you still running this workload? If so, could you let me know whether you still see the problem?

Comment by Mark Callaghan [ 26/Dec/14 ]

My test server used this mongo.conf:

processManagement:
  fork: true
systemLog:
  destination: file
  path: /data/mysql/mongo.28/log
  logAppend: true
storage:
  dbPath: /data/mysql/mongo.28/data
  journal:
    enabled: true
  mmapv1:
    syncPeriodSecs: 60
    journal:
      commitIntervalMs: 100
storage.wiredTiger.collectionConfig.blockCompressor: snappy
storage.wiredTiger.engineConfig.journalCompressor: none

And this command line for 2.8.0-rc3:
bin.tcm/mongod --config /data/mysql/mongo.28/mongo.conf --storageEngine wiredTiger --wiredTigerIndexConfigString=type=lsm

The test was https://github.com/tmcallaghan/iibench-mongodb

Comment by Mark Callaghan [ 26/Dec/14 ]

Another stack trace, with line numbers, showing the source of the CPU time:

Thread 2 (Thread 0x7fa3c898d700 (LWP 2966206)):
#0  0x00007fa3d57bf6b3 in __strcmp_sse42 () from /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6
#1  0x00000000012c7799 in __wt_conn_dhandle_close_all (session=session@entry=0x2bf7900, name=name@entry=0x6d901b90 "file:index-4--462074404385471553-000430.bf", force=force@entry=0) at src/third_party/wiredtiger/src/conn/conn_dhandle.c:612
#2  0x000000000131ca0f in __drop_file (cfg=0x7fa3c898c8d0, force=<optimized out>, uri=0x6d901b90 "file:index-4--462074404385471553-000430.bf", session=0x2bf7900) at src/third_party/wiredtiger/src/schema/schema_drop.c:31
#3  __wt_schema_drop (session=session@entry=0x2bf7900, uri=<optimized out>, uri@entry=0x6d901b90 "file:index-4--462074404385471553-000430.bf", cfg=cfg@entry=0x7fa3c898c970) at src/third_party/wiredtiger/src/schema/schema_drop.c:174
#4  0x00000000012f96ff in __lsm_drop_file (session=session@entry=0x2bf7900, uri=0x6d901b90 "file:index-4--462074404385471553-000430.bf") at src/third_party/wiredtiger/src/lsm/lsm_work_unit.c:494
#5  0x00000000012fa33c in __wt_lsm_free_chunks (session=session@entry=0x2bf7900, lsm_tree=0x3fc2e00) at src/third_party/wiredtiger/src/lsm/lsm_work_unit.c:573
#6  0x00000000012fa846 in __lsm_worker_general_op (cookie=0x2b04188, completed=<synthetic pointer>, session=0x2bf7900) at src/third_party/wiredtiger/src/lsm/lsm_worker.c:71
#7  __lsm_worker (arg=0x2b04188) at src/third_party/wiredtiger/src/lsm/lsm_worker.c:121
#8  0x00007fa3d669efa8 in start_thread () from /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#9  0x00007fa3d57805ad in clone () from /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6

And the source where this occurs; see the strcmp near the end:

/*
 * __wt_conn_dhandle_close_all --
 *      Close all data handles handles with matching name (including all
 *      checkpoint handles).
 */
int
__wt_conn_dhandle_close_all(
    WT_SESSION_IMPL *session, const char *name, int force)
{
        WT_CONNECTION_IMPL *conn;
        WT_DATA_HANDLE *dhandle;
        WT_DECL_RET;
        uint64_t bucket;
 
        conn = S2C(session);
 
        WT_ASSERT(session, F_ISSET(session, WT_SESSION_HANDLE_LIST_LOCKED));
        WT_ASSERT(session, session->dhandle == NULL);
 
        bucket = __wt_hash_city64(name, strlen(name)) % WT_HASH_ARRAY_SIZE;
        SLIST_FOREACH(dhandle, &conn->dhhash[bucket], l) {
                if (strcmp(dhandle->name, name) != 0)
                        continue;
