[SERVER-16562] Extremely slow shutdown for WiredTiger (non-LSM) Created: 16/Dec/14 Updated: 24/Dec/14 Resolved: 18/Dec/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Storage |
| Affects Version/s: | 2.8.0-rc2 |
| Fix Version/s: | 2.8.0-rc4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mark Callaghan | Assignee: | Mark Benvenuto |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Command line for starting mongod: Configuration file: The test case is the insert benchmark, changed to insert 2B rows (200M rows per thread, 10 insert threads) with a 2000-byte char field. |
| Description |
|
I loaded 2B docs via the insert benchmark to create a 300GB database last night. The load finished more than 8 hours ago. This morning I ran "kill $( pidof mongod )" to shut down mongod, and the shutdown took 12 minutes. This reproduces with the default (non-LSM) engine for WiredTiger; it does not reproduce with the LSM engine.

This is the thread stack for the thread doing all of the work at shutdown. My host has very fast PCIe flash for storage. Looks like a long checkpoint to me:

During the shutdown this error message was printed a few times:

2014-12-16T07:59:02.232-0800 I - [TTLMonitor] LockerId 411893 has been waiting to acquire lock for more than 30 seconds. MongoDB will print the lock manager state and the stack of the thread that

GRANTED:
PENDING:
| Comments |
| Comment by Githook User [ 18/Dec/14 ] |
|
Author: Mark Benvenuto (username: markbenvenuto, email: mark.benvenuto@mongodb.com)

Message:
| Comment by Asya Kamsky [ 17/Dec/14 ] |
|
I've seen slow shutdown as well - I'd assumed it was the last checkpoint, and in fact if I sleep for 60 to 120 seconds before shutting down, the shutdown takes less time. Could it be both - the last checkpoint and freeing memory?
| Comment by Mark Callaghan [ 17/Dec/14 ] |
|
I wonder if tuning tcmalloc can help here. Back in the day when I used it with MySQL, we reduced the amount of memory it would cache per thread to avoid wasting memory for workloads with 1000+ threads; a sketch of that kind of tuning is below.

I repeated the shutdown test with jemalloc. I did not try to tune it. The shutdown took ~17 minutes, and from one sampled thread stack there is some time in jemalloc code:

And the top N functions from Linux perf during a 10 second sample:

34.07% mongod mongod [.] __wt_page_out
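For illustration, here is a minimal sketch of the per-thread-cache tuning mentioned above, assuming the gperftools C shim (malloc_extension_c.h) and its tcmalloc.max_total_thread_cache_bytes property; the 64MB cap is an illustrative value, not one tested in this ticket:

```c
/*
 * Sketch: cap tcmalloc's aggregate thread-cache size from inside the
 * process via the gperftools C shim. The 64MB cap is illustrative only.
 */
#include <stdio.h>
#include <gperftools/malloc_extension_c.h>

int main(void) {
    size_t cap = 64 * 1024 * 1024; /* 64MB across all thread caches */

    /* Returns non-zero on success. */
    if (!MallocExtension_SetNumericProperty(
            "tcmalloc.max_total_thread_cache_bytes", cap))
        fprintf(stderr, "failed to set tcmalloc thread cache cap\n");
    return 0;
}
```

The same cap can also be applied without a code change by setting the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable before starting the process.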
| Comment by Michael Cahill [ 16/Dec/14 ] |
|
Hi mdcallag, what you're seeing is WiredTiger freeing memory that was allocated to cache data. There shouldn't be any I/O involved because periodic checkpoints will ensure that the cache is clean. MongoDB defaults to allowing the WiredTiger cache to grow to half the size of system RAM, so there could be 100GB of memory to free, depending on your test system.

You may recall that you suggested we add a way for WiredTiger to shut down without freeing that memory, so there is a leak_memory configuration to WT_CONNECTION::close that would avoid all of this work. Of course, the downside of using that is that leak checking tools will report leaked memory. It is not currently used or exposed via MongoDB: we'll have a discussion about whether it can be turned on.
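For reference, a minimal standalone sketch of that option at the WiredTiger C API level - not the MongoDB integration; the WT_HOME directory and 1GB cache size are placeholder values:

```c
/*
 * Sketch: close a standalone WiredTiger connection with leak_memory=true,
 * which skips freeing cached pages at shutdown and lets the OS reclaim
 * the address space when the process exits.
 */
#include <stdio.h>
#include <stdlib.h>
#include <wiredtiger.h>

int main(void) {
    WT_CONNECTION *conn;
    int ret;

    /* Open (or create) a database in WT_HOME with a 1GB cache. */
    if ((ret = wiredtiger_open("WT_HOME", NULL,
            "create,cache_size=1GB", &conn)) != 0) {
        fprintf(stderr, "wiredtiger_open: %s\n", wiredtiger_strerror(ret));
        return EXIT_FAILURE;
    }

    /* ... workload ... */

    /*
     * leak_memory tells close() not to walk the cache freeing every page.
     * The trade-off: leak-checking tools will report the cache as leaked.
     */
    if ((ret = conn->close(conn, "leak_memory=true")) != 0) {
        fprintf(stderr, "close: %s\n", wiredtiger_strerror(ret));
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```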
| Comment by Mark Callaghan [ 16/Dec/14 ] |
|
Tried hierarchical profiling with "perf record -g", but (nil) was reported instead of names for the calling functions, so the result is garbage.
| Comment by Mark Callaghan [ 16/Dec/14 ] |
|
Top-N from "perf record" with flat profiling:

39.87% mongod mongod [.] __wt_page_out
| Comment by Mark Callaghan [ 16/Dec/14 ] |
|
Repeated test using fsync on commit and then grabbed thread stacks on shutdown. Here is a sample, one stack per line with the innermost frame first:

SLL_PopRange,PopRange,tcmalloc::ThreadCache::ReleaseToCentralCache,tcmalloc::ThreadCache::ListTooLong,Deallocate,do_free_helper,do_free_with_callback,do_free,tc_free,__wt_free_int,__free_skip_list,__free_skip_array,__free_page_row_leaf,__wt_page_out,__wt_ref_out,__wt_rec_page_clean_update,__evict_file,__wt_cache_op,__wt_checkpoint_close,__wt_conn_btree_sync_and_close,__wt_conn_dhandle_discard_single,__wt_conn_dhandle_discard,__wt_connection_close,__conn_close,mongo::WiredTigerKVEngine::cleanShutdown,shutdownServer,mongo::exitCleanly,mongo::(anonymous,boost::(anonymous,start_thread,clone

__free_update_list,__free_skip_list,__free_skip_array,__free_page_row_leaf,__wt_page_out,__wt_ref_out,__wt_rec_page_clean_update,__evict_file,__wt_cache_op,__wt_checkpoint_close,__wt_conn_btree_sync_and_close,__wt_conn_dhandle_discard_single,__wt_conn_dhandle_discard,__wt_connection_close,__conn_close,mongo::WiredTigerKVEngine::cleanShutdown,shutdownServer,mongo::exitCleanly,mongo::(anonymous,boost::(anonymous,start_thread,clone

tcmalloc::CentralFreeList::ReleaseToSpans,tcmalloc::CentralFreeList::ReleaseListToSpans,tcmalloc::CentralFreeList::InsertRange,tcmalloc::ThreadCache::ReleaseToCentralCache,tcmalloc::ThreadCache::ListTooLong,Deallocate,do_free_helper,do_free_with_callback,do_free,tc_free,__wt_free_int,__free_update_list,__free_skip_list,__free_skip_array,__free_page_row_leaf,__wt_page_out,__wt_ref_out,__wt_rec_page_clean_update,__evict_file,__wt_cache_op,__wt_checkpoint_close,__wt_conn_btree_sync_and_close,__wt_conn_dhandle_discard_single,__wt_conn_dhandle_discard,__wt_connection_close,__conn_close,mongo::WiredTigerKVEngine::cleanShutdown,shutdownServer,mongo::exitCleanly,mongo::(anonymous,boost::(anonymous,start_thread,clone

__free_update_list,__free_skip_list,__free_skip_array,__free_page_row_leaf,__wt_page_out,__wt_ref_out,__wt_rec_page_clean_update,__evict_file,__wt_cache_op,__wt_checkpoint_close,__wt_conn_btree_sync_and_close,__wt_conn_dhandle_discard_single,__wt_conn_dhandle_discard,__wt_connection_close,__conn_close,mongo::WiredTigerKVEngine::cleanShutdown,shutdownServer,mongo::exitCleanly,mongo::(anonymous,boost::(anonymous,start_thread,clone
| Comment by Mark Callaghan [ 16/Dec/14 ] |
|
When this command line is used to get LSM from WT, the shutdown is fast (10 seconds):