Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: WT2.7.0
Affects Version/s: None
Component/s: None
Labels: None
Sprint: None
Story Points: None

Improvements to log slot freeing to improve thread scalability

Investigated the negative scaling of the write-ahead log seen in SERVER-18908 and SERVER-19189. Found two issues; an experimental patch that appears to address both is attached.

Threads are often waiting because there are no FREE slots. Slots are freed by __log_wrlsn_server. However, because this is done asynchronously, there may be unnecessary delay in freeing slots for several reasons: if there is thread contention, __log_wrlsn_server may not get scheduled; it uses yields and sleeps, so it may not notice when slots become freeable; and because the thread waiting for a FREE slot in __wt_log_slot_close is also using yields and sleeps, it may not notice right away when a slot is freed. The patch addresses this issue by pulling the slot-freeing logic out of the loop in __log_wrlsn_server into a function, __log_wrlsn, which is then called from __wt_log_slot_close when it has scanned all the slots and not found a FREE one. This call is made with the log_slot_lock held for thread safety, but that is acceptable because at that point any thread that would have acquired that lock would have become stuck anyway due to the lack of FREE slots.
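The idea of running the freeing pass inline, instead of waiting for the background thread, can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the attached patch: the types `slot_t` and `log_t` and the functions `free_written_slots` and `find_free_slot` are hypothetical stand-ins for the real structures, `__log_wrlsn` and the scan in `__wt_log_slot_close`. It assumes the caller already holds the slot lock and that a WRITTEN slot may only be freed once the write LSN has reached its start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 4 /* hypothetical pool size, chosen for the example */

/* Hypothetical, simplified states; not WiredTiger's actual definitions. */
typedef enum { SLOT_FREE, SLOT_FILLING, SLOT_WRITTEN } slot_state_t;

typedef struct {
    slot_state_t state;
    uint64_t start_lsn, end_lsn;
} slot_t;

typedef struct {
    slot_t slots[NSLOTS];
    uint64_t write_lsn; /* assumed: the log is contiguous on disk below this LSN */
} log_t;

/*
 * Stand-in for the extracted freeing pass: free every WRITTEN slot that
 * begins exactly at write_lsn, advancing write_lsn as it goes. Repeats
 * until a full pass makes no progress. Returns the number of slots freed.
 */
size_t free_written_slots(log_t *log)
{
    size_t freed = 0;
    int progress;

    assert(log != NULL);
    do {
        progress = 0;
        for (size_t i = 0; i < NSLOTS; i++) {
            slot_t *s = &log->slots[i];
            if (s->state == SLOT_WRITTEN && s->start_lsn == log->write_lsn) {
                log->write_lsn = s->end_lsn;
                s->state = SLOT_FREE;
                freed++;
                progress = 1;
            }
        }
    } while (progress);
    return freed;
}

/* Return the index of the first FREE slot, or -1 if there is none. */
static int scan_for_free(const log_t *log)
{
    for (int i = 0; i < NSLOTS; i++)
        if (log->slots[i].state == SLOT_FREE)
            return i;
    return -1;
}

/*
 * Stand-in for the waiter: scan for a FREE slot and, if the scan fails,
 * run the freeing pass directly before scanning once more.
 * The caller is assumed to hold the slot lock.
 */
int find_free_slot(log_t *log)
{
    int i = scan_for_free(log);

    if (i < 0 && free_written_slots(log) > 0)
        i = scan_for_free(log);
    return i;
}
```

In this sketch `find_free_slot` still returns -1 when the oldest slot has not reached WRITTEN, since nothing behind it can be freed; a real caller would have to keep waiting in that case.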

By adding some messages to the code I noticed that often, when threads were stuck in __wt_log_slot_close waiting for a FREE slot, there were many WRITTEN slots but no FREE slots because the oldest slot was not yet WRITTEN. Either it was waiting for I/O to complete or, more often, it was waiting for all threads that had joined the slot to copy their data into the buffer and transition the slot to DONE, presumably because one of those threads was held up by contention. In other words, the slots looked like this:
```
SLOT: start_lsn=1000, end_lsn=2000, state<DONE (i.e. threads copying data into the slot buffer)
SLOT: start_lsn=2000, end_lsn=3000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: start_lsn=3000, end_lsn=4000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: start_lsn=4000, end_lsn=5000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: start_lsn=5000, end_lsn=6000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
```
As I understand the algorithm, the only purpose of the WRITTEN slots is to keep track of holes in the log file (for example, 1000-2000 in the example above) so we can correctly advance the LSN. Is that right? However, they are not doing so very efficiently: the same information could be recorded by coalescing the WRITTEN slots into a single one (more specifically, one for each hole in the log file) and making the other slots FREE, like so:
```
SLOT: start_lsn=1000, end_lsn=2000, state<DONE (i.e. threads copying data into the slot buffer)
SLOT: start_lsn=2000, end_lsn=6000, state=WRITTEN (i.e. slot has been written to disk and is now waiting to be freed)
SLOT: state=FREE
SLOT: state=FREE
SLOT: state=FREE
```
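A minimal sketch of this coalescing step, under stated assumptions, is shown below. It is not the code from the attached patch: `slot_t`, its fields, and `coalesce_written` are hypothetical names for illustration. It assumes two WRITTEN slots can be merged whenever one begins at the LSN where the other ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified states; not WiredTiger's actual definitions. */
typedef enum { SLOT_FREE, SLOT_FILLING, SLOT_WRITTEN } slot_state_t;

typedef struct {
    slot_state_t state;
    uint64_t start_lsn, end_lsn;
} slot_t;

/*
 * Merge WRITTEN slots with adjacent LSN ranges: when slot j starts where
 * slot i ends, extend slot i to cover both and mark slot j FREE. Passes
 * repeat until nothing merges, so slot order in the array does not matter.
 * Returns the number of slots freed.
 */
size_t coalesce_written(slot_t *slots, size_t n)
{
    size_t freed = 0;
    int merged;

    assert(slots != NULL || n == 0);
    do {
        merged = 0;
        for (size_t i = 0; i < n; i++) {
            if (slots[i].state != SLOT_WRITTEN)
                continue;
            for (size_t j = 0; j < n; j++) {
                if (j == i || slots[j].state != SLOT_WRITTEN)
                    continue;
                if (slots[j].start_lsn == slots[i].end_lsn) {
                    slots[i].end_lsn = slots[j].end_lsn;
                    slots[j].state = SLOT_FREE;
                    freed++;
                    merged = 1;
                }
            }
        }
    } while (merged);
    return freed;
}
```

Applied to the five slots in the example, this leaves the first slot untouched, extends the second to cover 2000-6000, and frees the remaining three. The quadratic scan is a deliberate simplification that is only reasonable for a small, fixed slot pool.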

The attached patch is a proof-of-concept implementation of the above. Some performance numbers follow, for n mongod client threads doing inserts of tiny documents in 10k batches into a standalone mongod server on a machine with 12 cores (24 CPUs):

| threads | 3.0.4 | 3.0.4 + WTlog.patch |
|---|---|---|
| 8 | 278401 | 280608 |
| 16 | 379076 | 405451 |
| 24 | 232358 | 407481 |
| 32 | 158440 | 334523 |
| 48 | 125652 | 246961 |
| 64 | 118095 | 220157 |

- Performance with a large number of threads has roughly doubled.
- There is still some negative scaling at large thread counts, so there may be additional bottlenecks to address.

So this seems good from a performance perspective, at least on this test. I have not done any functional testing on it. michael.cahill, sue.loverso, can you take a look and see if this makes sense to you?

Attachments:
demo.c (14 kB, Jul 20 2015 11:47:25 AM UTC)
maxused.patch (2 kB, Jul 07 2015 07:16:43 PM UTC)
repro.sh (0.8 kB, Jul 07 2015 12:26:10 PM UTC)
WTlog.patch (8 kB, Jul 02 2015 06:09:43 PM UTC)
WTlog2.patch (10 kB, Jul 03 2015 05:35:48 PM UTC)

is depended on by

SERVER-18908 Secondaries unable to keep up with primary under WiredTiger (Closed)
SERVER-19282 WiredTiger changes in MongoDB 3.1.6 (Closed)
SERVER-19283 WiredTiger changes for MongoDB 3.0.5 (Closed)
SERVER-19189 Improve performance under high number of threads with WT (Closed)
SERVER-19532 WiredTiger changes for MongoDB 3.1.7 (Closed)
SERVER-19744 WiredTiger changes for MongoDB 3.0.6 (Closed)

(1 is depended on by)

WT-2031 Buffer log records in memory to improve performance (Closed)

Assignee: Susan LoVerso (Inactive)
Reporter: Bruce Lucas (Inactive)
Votes: 0
Watchers: 16

Created: Jul 02 2015 06:09:43 PM UTC
Updated: Oct 12 2017 11:19:24 PM UTC
Resolved: Sep 10 2015 01:25:47 AM UTC
