Details
-
Task
-
Status: Closed
-
Resolution: Fixed
-
None
-
None
Description
I've been profiling a workload that runs regular checkpoints while pounding away with reads and inserts. The workload runs for a long time, has 20 threads doing inserts, 20 threads doing reads and checkpoints running every 20 seconds.
After the system has been running for a while, I get to the point where it takes between 500 and 1000 seconds for each checkpoint to complete.
When I look at stack traces I see quite a lot of waiting on the block->live_lock. I've included a pmp output that includes all threads that are in the block subsystem below. I've realized that I don't understand much about that code, would you mind walking me through the code on the phone next week?
At first I wondered whether the skip lists were growing large enough that operations were slowing down, but it doesn't look like it. I added the following change:
--- a/src/block/block_ckpt.c
|
+++ b/src/block/block_ckpt.c
|
@@ -542,6 +542,10 @@ __ckpt_process(
|
|
live_update:
|
ci = &block->live;
|
+ fprintf(stderr, "block_checkpoint lists:\n\t avail entries: %" PRIu64
|
+ "\n\tdiscard entries: %" PRIu64
|
+ "\n\talloc entries: %" PRIu64 "\n",
|
+ ci->avail.entries, ci->discard.entries, ci->alloc.entries);
|
|
/* Truncate the file if that's possible. */
|
WT_ERR(__wt_block_extlist_truncate(session, block, &ci->avail));
|
The outputs from it are:
block_checkpoint lists:
|
avail entries: 1
|
discard entries: 0
|
alloc entries: 1864
|
block_checkpoint lists:
|
avail entries: 1
|
discard entries: 0
|
alloc entries: 1
|
block_checkpoint lists:
|
avail entries: 0
|
discard entries: 0
|
alloc entries: 215950
|
block_checkpoint lists:
|
avail entries: 1
|
discard entries: 0
|
alloc entries: 2
|
block_checkpoint lists:
|
avail entries: 0
|
discard entries: 0
|
alloc entries: 446437
|
block_checkpoint lists:
|
avail entries: 1
|
discard entries: 0
|
alloc entries: 1
|
So.. The skip lists are growing, but they shouldn't contain enough elements to cause our depth of 10 to be a problem.
6 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_write_off,__wt_block_write,__wt_bt_write,__rec_split_write,__rec_split_finish,__rec_split_finish,__rec_row_leaf,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_search,__curfile_search,worker,start_thread,clone
|
|
4 pwrite64,__wt_write,__wt_block_write_off,__wt_block_write,__wt_bt_write,__rec_split_write,__rec_split_finish,__rec_split_finish,__rec_row_leaf,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,worker,start_thread,clone
|
|
4 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_free,__rec_write_wrapup,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_search,__curfile_search,worker,start_thread,clone
|
|
3 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_free,__rec_write_wrapup,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,worker,start_thread,clone
|
|
2 pread64,__wt_read,__wt_block_read_off,__wt_block_read_off,__wt_bm_read,__wt_bt_read,__wt_cache_read,__wt_page_in_func,__wt_page_swap_func,__wt_row_search,__wt_btcur_insert,__curfile_insert,worker,start_thread,clone
|
|
2 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_write_off,__wt_block_write,__wt_bt_write,__rec_split_write,__rec_split_finish,__rec_split_finish,__rec_row_leaf,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,worker,start_thread,clone
|
|
1 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_write_off,__wt_block_write,__wt_bt_write,__rec_split_write,__rec_split_fixup,__rec_split,__rec_row_leaf,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_search,__curfile_search,worker,start_thread,clone
|
|
1 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_free,__rec_write_wrapup,__wt_rec_write,__wt_sync_file,__wt_bt_cache_op,__wt_conn_btree_apply,__checkpoint_apply,__wt_txn_checkpoint,__session_checkpoint,checkpoint_worker,start_thread,clone
|
|
1 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_free,__rec_write_wrapup,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__wt_page_in_func,__wt_page_swap_func,__wt_row_search,__wt_btcur_insert,__curfile_insert,worker,start_thread,clone
|
|
1 __lll_lock_wait,_L_lock_927,__pthread_mutex_lock,__wt_spin_lock,__wt_block_free,__rec_write_wrapup,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__evict_lru,__evict_worker,__wt_cache_evict_server,start_thread,clone
|
|
1 __block_off_srch,__block_ext_insert,__wt_block_off_remove_overlap,__wt_block_off_free,__wt_block_free,__rec_write_wrapup,__wt_rec_write,__rec_review,__wt_rec_evict,__wt_evict_page,__wt_evict_lru_page,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_search,__curfile_search,worker,start_thread,clone
|
For completeness, the wtperf workload I'm running is:
# wtperf options file: small btree multi-database configuration
|
# Original cache was 500MB. Shared cache is 500MB * database_count.
|
conn_config="statistics=[fast,clear],statistics_log=(wait=20),cache_size=10GB,log=(enabled=false),checkpoint_sync=false"
|
database_count=1
|
table_count=1
|
table_config="os_cache_dirty_max=200MB,leaf_page_max=4k,internal_page_max=16k,leaf_item_max=1433,internal_item_max=3100,type=file"
|
# Likewise, divide original icount by database_count.
|
icount=50000
|
populate_threads=1
|
random_range=100000000
|
checkpoint_interval=20
|
checkpoint_threads=1
|
report_interval=5
|
run_time=2400
|
threads=((count=20,reads=1),(count=20,inserts=1))
|
value_sz=250
|
verbose=1
|
warmup=20
|
Based on the develop branch (I was seeing similar behavior on new-split).
Attachments
Issue Links
- is related to
-
WT-947 Delay freeing the checkpoint's alloc/discard lists until the resolution step
- Closed
-
WT-954 Checkpoint optimizations
- Closed
- related to
-
WT-1 placeholder WT-1
- Closed
-
WT-2 What does metadata look like?
- Closed
-
WT-3 What file formats are required?
- Closed
-
WT-4 Flexible cursor traversals
- Closed
-
WT-5 How does pget work: is it necessary?
- Closed
-
WT-6 Complex schema example
- Closed
-
WT-7 Do we need the handle->err/errx methods?
- Closed
-
WT-8 Do we need table load, bulk-load and/or dump methods?
- Closed
-
WT-9 Does adding schema need to be transactional?
- Closed
-
WT-10 Basic "getting started" tutorial
- Closed
-
WT-11 placeholder #11
- Closed
-
WT-12 Write more examples
- Closed
-
WT-13 Define supported platforms
- Closed
-
WT-14 Windows build
- Closed
-
WT-15 Automated build/test infrastructure
- Closed
-
WT-16 Test suite
- Closed
-
WT-929 Make extent list management more efficient during checkpoints
- Closed