WiredTiger / WT-1777

Degenerate case in wtperf run paging in

    Details

    • Type: Task
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:

      Description

      While running log-append.wtperf on Jenkins, wtperf effectively hung in __wt_page_in_func. Although it was running the code from WT-1774, I do not believe the log code is involved in this issue.

      The test was running on the Jenkins AWS box, an m2.4xlarge, equivalent to my AWS HDD system. I ended up killing it after gathering some information.
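
      For reference, the relevant knobs in that configuration are roughly the following. The option names (conn_config, icount, populate_threads, report_interval) are standard wtperf options, but this is a reconstruction from the numbers quoted in this ticket (1GB cache, logging enabled, 8 populate threads, 50M rows, 5-second reporting), not the verbatim log-append.wtperf file:

      # Approximate reconstruction of log-append.wtperf, not the verbatim file.
      conn_config="cache_size=1G,log=(enabled)"
      icount=50000000
      populate_threads=8
      report_interval=5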

      A normal test run completes in about 169 seconds and the test.stat file looks like this:

      1687998 populate inserts (9659745 of 50000000) in 5 secs (35 total secs)
      1998511 populate inserts (11658256 of 50000000) in 5 secs (40 total secs)
      1932159 populate inserts (13590415 of 50000000) in 5 secs (45 total secs)
      1986784 populate inserts (15577199 of 50000000) in 5 secs (50 total secs)
      

      This test run hit some kind of problem very quickly and never recovered. Its test.stat file looks like:

      157894 populate inserts (157894 of 50000000) in 5 secs (5 total secs)
      503 populate inserts (158397 of 50000000) in 5 secs (10 total secs)
      504 populate inserts (158901 of 50000000) in 5 secs (15 total secs)
      506 populate inserts (159407 of 50000000) in 5 secs (20 total secs)
      ...
      500 populate inserts (222988 of 50000000) in 5 secs (655 total secs)
      512 populate inserts (223500 of 50000000) in 5 secs (660 total secs)
      504 populate inserts (224004 of 50000000) in 5 secs (665 total secs)
      

      After more than 10 minutes it had completed only 224k of 50M operations, roughly 100 operations per second (about 500 inserts per 5-second interval). At that rate the full populate phase would take nearly six days.

      I ran pmp (poor man's profiler) and it shows the same picture every time: our 8 populate threads are sleeping and nothing else is going on:

            8 select,__wt_sleep,__wt_page_in_func,__wt_row_search,__wt_btcur_insert,__curfile_insert,populate_thread,start_thread,clone
            2 
            1 select,__wt_sleep,__evict_server,start_thread,clone
            1 pthread_cond_wait@@GLIBC_2.3.2,__wt_cond_wait,__ckpt_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_wrlsn_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_close_server,start_thread,clone
      

      The AWS console charts show virtually no I/O during this test's run; they do show significant I/O during the other runs of this test.


          Activity

          agorrod Alex Gorrod added a comment -

          Sue Loverso Have you seen this more than once? It would be nice to reproduce it so we can see what the state of the cache is.

          sueloverso Sue Loverso added a comment -

          Tomorrow I'll modify wtperf so that if it finds low throughput in that range it aborts; then we can run it in a loop and look at the cache.

          sueloverso Sue Loverso added a comment -

          On my AWS HDD system, with current develop, I can reproduce this in 1-3 iterations (I added an abort if wtperf gets < 1000 ops in 3 successive reporting intervals). With statistics logging turned on, it did not fail in 6 iterations. Given how quickly I can reproduce this, I'm surprised Jenkins hasn't hit it (other than the time I was watching it).
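
          A minimal sketch of that kind of guard, assuming the monitor already has a per-interval operation count to hand. The constants and the standalone demo below are illustrative only, not the actual wtperf change:

          /*
           * Illustrative sketch only (not the actual wtperf change): abort the
           * run when throughput stays below a threshold for several successive
           * reporting intervals, leaving a core file so the cache state can be
           * inspected.
           */
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>

          #define LOW_THRESHOLD     1000  /* ops per reporting interval */
          #define MAX_LOW_INTERVALS 3     /* successive low intervals allowed */

          static void
          check_throughput(uint64_t interval_ops)
          {
              static int low_intervals = 0;

              if (interval_ops >= LOW_THRESHOLD) {
                  low_intervals = 0;
                  return;
              }
              if (++low_intervals >= MAX_LOW_INTERVALS) {
                  fprintf(stderr,
                      "throughput below %d ops for %d intervals, aborting\n",
                      LOW_THRESHOLD, MAX_LOW_INTERVALS);
                  abort();
              }
          }

          int
          main(void)
          {
              /* Per-interval counts shaped like the degenerate test.stat above. */
              uint64_t samples[] = { 157894, 503, 504, 506 };
              size_t i;

              for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
                  check_throughput(samples[i]);
              return (0);
          }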

          sueloverso Sue Loverso added a comment -

          I believe the recent split changes introduced this. Running changeset dc196438590, I was not able to reproduce it in over 40 iterations. Running changeset dfec806a9b39f4ed, it again failed quickly (4 iterations).

          sueloverso Sue Loverso added a comment -

          We have a giant page. Cache size is 1GB. The threads keep getting EBUSY from the __wt_page_release_evict call. Some info:

          The btree root:

          root = {page = 0x8ede00, home = 0x0, 
              ref_hint = 0, state = WT_REF_MEM, addr = 0x0, key = {recno = 0, ikey = 0x0}, 
              page_del = 0x0},
          

          The ref:

          (gdb) p *ref
          $4 = {page = 0x7fbf38135fa0, home = 0x8ede00, ref_hint = 12183, 
            state = WT_REF_MEM, addr = 0x0, key = {recno = 140459255984320, 
              ikey = 0x7fbf380ef0c0}, page_del = 0x0}
          

          And the page, which is 870MB:

          (gdb) p *page
          $5 = {u = {intl = {recno = 140459255042048, parent_ref = 0x0, 
                __index = 0x7fbf38135fe8}, row = {ins = 0x7fbf38009000, upd = 0x0, 
                d = 0x7fbf38135fe8, entries = 0}, col_fix = {recno = 140459255042048, 
                bitf = 0x0, entries = 940793832}, col_var = {recno = 140459255042048, 
                d = 0x0, repeats = 0x7fbf38135fe8, nrepeats = 0, entries = 0}}, 
            type = 7 '\a', flags_atomic = 0 '\000', read_gen = 1, 
            memory_footprint = 870609380, dsk = 0x0, modify = 0x7fbf380c2940}
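
          The pmp stacks above show all eight populate threads sleeping inside __wt_page_in_func, and per the comment they keep getting EBUSY from __wt_page_release_evict. A much simplified, self-contained sketch of that shape is below; every name in it is a hypothetical stand-in, not the WiredTiger implementation:

          /*
           * Hypothetical stand-in code (not WiredTiger): a thread bringing a
           * page into use notices the page is over a size limit, asks for a
           * forced eviction, and when that attempt reports EBUSY it sleeps and
           * retries.  In the degenerate case in this ticket the eviction never
           * succeeds, so the populate threads spin here while the page keeps
           * growing; this demo lets eviction succeed after a few tries so the
           * program terminates.
           */
          #include <errno.h>
          #include <stdio.h>
          #include <unistd.h>

          struct page {
              size_t memory_footprint;
              int busy_attempts;  /* eviction attempts that will fail with EBUSY */
          };

          #define FORCE_EVICT_SIZE (5 * 1024 * 1024)  /* force eviction past ~5MB */

          static int
          page_force_evict(struct page *page)
          {
              /* Pretend the page stays busy for the first few attempts. */
              return (page->busy_attempts-- > 0 ? EBUSY : 0);
          }

          static int
          page_in(struct page *page)
          {
              int ret, retries;

              for (retries = 0;
                  page->memory_footprint > FORCE_EVICT_SIZE; ++retries) {
                  if ((ret = page_force_evict(page)) == 0)
                      break;              /* eviction succeeded */
                  if (ret != EBUSY)
                      return (ret);
                  usleep(10000);          /* back off, then try again */
              }
              printf("page_in: %d EBUSY retries before continuing\n", retries);
              return (0);
          }

          int
          main(void)
          {
              /* A page like the one in the gdb dump: ~870MB, busy for a while. */
              struct page giant = { 870609380, 3 };

              return (page_in(&giant));
          }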
          

          agorrod Alex Gorrod added a comment -

          Sue Loverso I caught this one earlier in a gdb session and I think the large page is a symptom. This particular wtperf configuration sets up 8 threads populating the table; sometimes we get into a situation where we are constantly attempting to force-evict the page and never succeeding.

          The huge page is a leaf page; we have been aggressively attempting to force-evict it since it was 5MB.

          michael.cahill Michael Cahill added a comment -

          This should be fixed by the merge of WT-1787. After some discussion about what the eviction server is actually doing, Alex Gorrod opened WT-1788 to discuss redesigning / simplifying eviction.

          ramon.fernandez Ramon Fernandez added a comment -

          Additional ticket information from GitHub

          This ticket was referenced in the following commits:

          9c60b462a25a36915207dde240579c5aa673f2c7
