WiredTiger / WT-1777

Degenerate case in wtperf run paging in

    Details

    • Type: Task
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:

      Description

      While running log-append.wtperf on Jenkins, wtperf effectively hung in __wt_page_in_func. Although it was running the code from WT-1774, I do not believe the log code is involved in this issue.
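
      For reference, the populate threads are just driving the standard WiredTiger cursor insert path; a minimal, self-contained sketch of that path follows (error handling omitted, and the home directory, table name, key/value formats and cache size are illustrative assumptions, not taken from log-append.wtperf). Each cursor->insert call is what descends through __wt_btcur_insert and __wt_row_search into __wt_page_in_func in the stacks shown further down.

      #include <inttypes.h>
      #include <stdio.h>
      #include <string.h>
      #include "wiredtiger.h"

      int
      main(void)
      {
          WT_CONNECTION *conn;
          WT_SESSION *session;
          WT_CURSOR *cursor;
          char key[32], value[128];
          uint64_t i;

          /* Illustrative configuration; log-append.wtperf's real settings differ. */
          wiredtiger_open("WT_TEST", NULL, "create,cache_size=1GB", &conn);
          conn->open_session(conn, NULL, NULL, &session);
          session->create(session, "table:test", "key_format=S,value_format=S");
          session->open_cursor(session, "table:test", NULL, NULL, &cursor);

          memset(value, 'a', sizeof(value) - 1);
          value[sizeof(value) - 1] = '\0';
          for (i = 0; i < 50000000; ++i) {        /* the run populates 50M rows */
              snprintf(key, sizeof(key), "%020" PRIu64, i);
              cursor->set_key(cursor, key);
              cursor->set_value(cursor, value);
              /*
               * This call is the top of the stalled stack:
               * __curfile_insert -> __wt_btcur_insert -> __wt_row_search
               * -> __wt_page_in_func.
               */
              cursor->insert(cursor);
          }
          cursor->close(cursor);
          conn->close(conn, NULL);
          return (0);
      }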

      The test was running on the Jenkins AWS box, an m2.4xlarge, equivalent to my AWS HDD instance. I ended up killing it after gathering some information.

      A normal test run completes in about 169 seconds, and its test.stat file looks like this:

      1687998 populate inserts (9659745 of 50000000) in 5 secs (35 total secs)
      1998511 populate inserts (11658256 of 50000000) in 5 secs (40 total secs)
      1932159 populate inserts (13590415 of 50000000) in 5 secs (45 total secs)
      1986784 populate inserts (15577199 of 50000000) in 5 secs (50 total secs)
      

      This test run hit some kind of problem very quickly and never recovered. Its test.stat file looks like this:

      157894 populate inserts (157894 of 50000000) in 5 secs (5 total secs)
      503 populate inserts (158397 of 50000000) in 5 secs (10 total secs)
      504 populate inserts (158901 of 50000000) in 5 secs (15 total secs)
      506 populate inserts (159407 of 50000000) in 5 secs (20 total secs)
      ...
      500 populate inserts (222988 of 50000000) in 5 secs (655 total secs)
      512 populate inserts (223500 of 50000000) in 5 secs (660 total secs)
      504 populate inserts (224004 of 50000000) in 5 secs (665 total secs)
      

      After more than 10 minutes it had only done 224k of the 50M insert operations. We are getting about 100 operations per second (roughly 500 inserts per 5-second interval); at that rate the remaining ~49.8 million inserts would take well over five days.

      I ran pmp (the gdb-based "poor man's profiler") and it shows the same picture every time. Our 8 populate threads are sleeping and nothing else is going on:

            8 select,__wt_sleep,__wt_page_in_func,__wt_row_search,__wt_btcur_insert,__curfile_insert,populate_thread,start_thread,clone
            2 
            1 select,__wt_sleep,__evict_server,start_thread,clone
            1 pthread_cond_wait@@GLIBC_2.3.2,__wt_cond_wait,__ckpt_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_wrlsn_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_close_server,start_thread,clone
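
      Those select,__wt_sleep frames are the back-off path inside __wt_page_in_func: when a required page cannot be brought into (or found in) the cache, for example because the cache is full and eviction is not making progress, the thread yields briefly, then sleeps for increasing intervals and retries. The snippet below is a self-contained illustration of that general pattern, not the actual WiredTiger source; page_available() is a hypothetical stand-in for the real page-state and cache checks, and the real __wt_sleep is implemented with select(), which is why the frames appear as select,__wt_sleep in pmp.

      #include <sched.h>
      #include <stdbool.h>
      #include <stdint.h>
      #include <unistd.h>

      /* Hypothetical stand-in for the real page-state / cache-full checks. */
      extern bool page_available(void);

      static void
      wait_for_page(void)
      {
          uint64_t spins = 0, sleep_usecs = 0;

          while (!page_available()) {
              /* Spin briefly before giving up the CPU. */
              if (++spins < 1000) {
                  sched_yield();
                  continue;
              }
              /*
               * Back off: sleep for longer and longer, capped at 10ms. A thread
               * stuck here makes no visible progress; with all 8 populate
               * threads in this state, throughput collapses to the ~100
               * inserts/second seen above.
               */
              sleep_usecs = sleep_usecs + 100 > 10000 ? 10000 : sleep_usecs + 100;
              usleep(sleep_usecs);
          }
      }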
      

      The AWS console charts show virtually no I/O during this test run, while they show significant I/O at the times of the other runs of this test.

      People

      • Assignee: Unassigned
      • Reporter: Sue LoVerso (sue.loverso)