  WiredTiger / WT-852

wtperf with a small "test1-like" config hangs

    • Type: Task
    • Resolution: Done
    • Fix Version/s: WT2.1
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      I'm working on getting wtperf to run in a riak-like configuration so that I can look at the issues we've been seeing there without a lot of layers. I created a tiny version of test1. Really, that means I added conn_config and table_config values that represent what we do in riak, and modeled the key and value sizes on basho_bench test1 (40-byte keys and 1000-byte values in this case).

      Here's the wtperf config file. I modeled it with 1% as many entries (5M instead of 500M), 10% of the populate threads (10 versus 100), and 25% of the cache (5GB instead of 21GB). This config hangs on the AWS SSD box before completing, I'm guessing when the cache fills up.

      This tiny configuration was a sanity check in anticipation of running a full test1 with 500M entries.

      conn_config="cache_size=5G,checkpoint_sync=false,mmap=false,session_max=1024"
      table_config="internal_page_max=128K,lsm=(bloom_config=(leaf_page_max=8MB),bloom_bit_count=28,bloom_hash_count=19,bloom_oldest=true,chunk_size=100MB,merge_threads=2),type=lsm"
      icount=5000000
      populate_threads=10
      key_sz=40
      value_sz=1000
      report_interval=5
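
      (To reproduce, something like the following should work, assuming the config above is saved as small-test1.wtperf and run with wtperf's -O option-file and -h home flags:)

      # Sketch, not the exact command line used: small-test1.wtperf and WT_TEST
      # are placeholder names for the option file and the database home directory.
      mkdir -p WT_TEST
      ./wtperf -h WT_TEST -O small-test1.wtperf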
      

      Here's the pmp output from the hang:

           10 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_full_check,__clsm_enter,__clsm_insert,populate_thread,start_thread,clone
            2 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_lsm_merge_worker,start_thread,clone
            2 
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_evict_server,start_thread,clone
            1 nanosleep,usleep,execute_populate,main
            1 __memcmp_sse4_1,__wt_ovfl_reuse_search,__rec_cell_build_ovfl,__rec_cell_build_val,__rec_row_leaf_insert,__rec_row_leaf,__wt_rec_write,__wt_sync_file,__wt_bt_cache_op,__wt_lsm_checkpoint_worker,start_thread,clone
            1 __lll_lock_wait,_L_lock_927,pthread_mutex_lock,__wt_spin_lock,__wt_conn_btree_sync_and_close,__wt_session_release_btree,__curbulk_close,__wt_bloom_finalize,__lsm_bloom_create,__lsm_bloom_work,__wt_lsm_merge_worker,start_thread,clone
      

      The merge thread is waiting on the checkpoint lock, which presumably the checkpoint thread is holding. (With a smaller cache, it hangs much more quickly in the same way.)
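
      (For reference, the stack summary above is the usual "poor man's profiler" aggregation; a minimal sketch of that pattern, assuming gdb can attach to the running wtperf process:)

      # Poor man's profiler sketch: sample every thread's stack once with gdb,
      # collapse each backtrace to a comma-separated list of function names,
      # then count identical stacks, most common first.
      pid=$(pgrep -n wtperf)        # assumes a single wtperf process
      gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p "$pid" 2>/dev/null |
      awk 'BEGIN { s = "" }
           /^Thread/ { if (s != "") print s; s = "" }
           /^#/      { s = (s == "" ? $4 : s "," $4) }
           END       { if (s != "") print s }' |
      sort | uniq -c | sort -rn -k1,1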

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso
            Votes:
            0
            Watchers:
            1
