Description
@agorrod, @michaelcahill: there's a stall in the new-split branch. I was hoping Michael's WT-931 would fix it, but I can still reproduce the problem. Here's the config I'm using, and the more threads, the sooner it fires:
file_type=row
data_source=file
checkpoints=1
cache=5
compression=none
leaf_page_max=12
internal_page_max=12
ops=1000000
rows=1000
key_max=32
value_max=32
and we end up with checkpoint in an infinite loop walking the tree:
#0 __wt_tree_walk (session=0x8024ff180, pagep=0x7ffffddee9d8, flags=320)
    at ../src/btree/bt_walk.c:317
#1 0x0000000000446afd in __wt_sync_file (session=0x8024ff180, syncop=8)
    at ../src/btree/bt_evict.c:655
#2 0x0000000000457477 in __wt_bt_cache_op (session=0x8024ff180,
    ckptbase=0x8062fe400, op=8) at ../src/btree/bt_sync.c:59
#3 0x00000000004403bc in __checkpoint_worker (session=0x8024ff180,
    cfg=0x7ffffddeede0, is_checkpoint=1) at ../src/txn/txn_ckpt.c:750
It looks to me like checkpoint is looping between two pages: the "couple" page and the next page (which is a WT_REF_SPLIT page). Checkpoint reads the split page, gets a WT_RESTART return, returns to the "couple" page, does a next, and winds up on the split page again.
I can reproduce the problem even without deepening the tree, so this is a fundamental issue in splitting (maybe an eviction race with checkpoint, maybe a race inside split itself).
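To make the two-page loop described above concrete, here is a toy model of it. This is not WiredTiger code: every toy_* name, the flat array of refs, and the max_retries bound are hypothetical and exist only for illustration. TOY_REF_SPLIT and TOY_RESTART stand in for WT_REF_SPLIT and WT_RESTART, and the sketch assumes nothing ever moves the ref out of the split state while the walk is retrying, which is my guess at what is happening, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stddef.h>

/* Toy ref states; TOY_REF_SPLIT stands in for WT_REF_SPLIT. */
enum toy_state { TOY_REF_MEM, TOY_REF_SPLIT };

/* Toy return codes; TOY_RESTART stands in for WT_RESTART. */
enum toy_ret { TOY_OK, TOY_RESTART, TOY_STUCK };

struct toy_ref {
    enum toy_state state;
};

/* Try to read a child: a ref still marked split cannot be read. */
static enum toy_ret
toy_page_in(const struct toy_ref *ref)
{
    return (ref->state == TOY_REF_SPLIT ? TOY_RESTART : TOY_OK);
}

/*
 * Step from the "couple" slot to the next ref. On restart, fall back to the
 * couple slot and step again. If the next ref stays in the split state, every
 * retry lands on the same ref. The real walk has no retry bound; max_retries
 * exists only so this sketch terminates.
 */
static enum toy_ret
toy_walk_next(const struct toy_ref *refs, size_t nrefs, size_t couple,
    size_t max_retries, size_t *retriesp, size_t *posp)
{
    size_t next;

    assert(couple + 1 < nrefs);

    *retriesp = 0;
    for (;;) {
        next = couple + 1;          /* "does a next" from the couple page */
        if (toy_page_in(&refs[next]) == TOY_OK) {
            *posp = next;
            return (TOY_OK);
        }
        /* Restart: return to the couple page and try again. */
        if (++*retriesp >= max_retries)
            return (TOY_STUCK);
    }
}
```

Calling toy_walk_next on an array whose second ref is TOY_REF_SPLIT, with couple set to 0, exhausts the retry budget without advancing. Once that ref is switched to TOY_REF_MEM the same call succeeds on the first attempt. So in this model the walk only makes progress if something else resolves the split ref.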