[SERVER-32154] Unbounded lookaside table growth during insert workload
Created: 03/Dec/17 | Updated: 30/Oct/23 | Resolved: 08/Aug/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.6.0-rc8 |
| Fix Version/s: | 3.6.6, 4.0.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
1-node repl set, SSD, 50 GB oplog, 4 GB cache, 5 threads inserting 130 kB docs.
It appears that the lookaside table grows without bound during a heavy write workload, both in number of entries and in the size of WiredTigerLAS.wt (the "csv size" metric). Even after the workload is stopped the table doesn't shrink. FTDC data attached.
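For reference, a minimal sketch of the kind of workload described above, assuming pymongo, a single-node replica set named rs0 on localhost:27017, and illustrative database/collection names:

```python
# Illustrative reproduction of the reported workload: 5 threads inserting ~130 kB
# documents as fast as possible. The connection string, replica set name, and
# database/collection names are assumptions for this sketch.
import threading

from pymongo import MongoClient

THREADS = 5
DOC_BYTES = 130_000  # ~130 kB per document


def insert_loop():
    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    coll = client["test"]["las_repro"]
    while True:
        # Small batches keep the server saturated without huge client-side buffers.
        coll.insert_many([{"payload": "x" * DOC_BYTES} for _ in range(10)])


if __name__ == "__main__":
    workers = [threading.Thread(target=insert_loop, daemon=True) for _ in range(THREADS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```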
| Comments |
| Comment by Bruce Lucas (Inactive) [ 08/Aug/18 ] |
Yes, this workload is much better behaved now on 3.6.6. Some of the work was done in earlier tickets, but that is the max fix version on the linked tickets, so I think we should mark this as fixed in 3.6.6.
| Comment by Kelsey Schubert [ 07/Aug/18 ] |
bruce.lucas, has this issue been resolved, or is there additional work that needs to be done?
| Comment by Michael Cahill (Inactive) [ 25/Feb/18 ] |
I've opened
| Comment by Ian Whalen (Inactive) [ 23/Feb/18 ] |
OK, I've added the link to
| Comment by Bruce Lucas (Inactive) [ 23/Feb/18 ] |
I think this ticket needs both
| Comment by Ian Whalen (Inactive) [ 23/Feb/18 ] |
Marking as fixversion=Backlog because there are no plans to work on this ticket directly and it will instead be resolved with the work on
| Comment by Bruce Lucas (Inactive) [ 10/Feb/18 ] |
michael.cahill, following up on our conversation yesterday, I'm still seeing the same lookaside-related behavior at shutdown in 3.6.2: shutdown takes a very long time and uses a very large amount of memory, many times the size of the cache.
| Comment by Bruce Lucas (Inactive) [ 21/Dec/17 ] |
Thanks Alex, good to hear confirmation that the tickets you mention will fix this. I'm recording the symptoms I'm seeing here to help in diagnosis pending those fixes. I'll also note that the shutdown process that you describe is extremely memory-intensive. Example:
| Comment by Alexander Gorrod [ 21/Dec/17 ] |
bruce.lucas The reason it takes a long time is that while the oldest ID is pinned, we need to write data into the lookaside file instead of the data table. When we then do the shutdown and the ID is no longer pinned, that content is read from the lookaside file and written into the data file. There isn't much we can do to avoid that read. It would be possible to skip that step and instead recover via the oplog from further back in time, but I expect that would make startup time much longer, so I don't think it'd be a net win. On the other hand, the changes in
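To make the write-to-lookaside / read-back-at-shutdown flow concrete, here is a toy model of the behaviour described in this comment (plain Python, not WiredTiger code; all names and data structures are invented for illustration):

```python
# Toy model (not WiredTiger code) of the flow described above: while the oldest
# ID is pinned, evicted updates are written to a lookaside store instead of the
# data table; at shutdown, once nothing is pinned, that content is read back and
# written into the data table, which is why shutdown can take a long time.
data_table = {}          # key -> value durably in the data file
lookaside = {}           # key -> updates evicted while their history was still pinned
oldest_id_pinned = True  # e.g. held by a lagging reader or the majority commit point


def evict(key, value):
    """Evict an update from cache."""
    if oldest_id_pinned:
        # History may still be needed, so the update goes to the lookaside file.
        lookaside.setdefault(key, []).append(value)
    else:
        data_table[key] = value


def shutdown():
    """Clean shutdown: the pin is released and lookaside content is drained."""
    global oldest_id_pinned
    oldest_id_pinned = False
    for key, updates in lookaside.items():   # read back from the lookaside file...
        data_table[key] = updates[-1]        # ...and write the latest value to the data file
    lookaside.clear()


if __name__ == "__main__":
    for i in range(5):
        evict(f"doc{i}", {"payload": "x" * 10})
    shutdown()
    print(len(data_table), "entries drained from lookaside at shutdown")
```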
| Comment by Bruce Lucas (Inactive) [ 20/Dec/17 ] |
Here's an example:

A subsequent shutdown of the primary (not shown) took ~45 seconds. In a larger example with a larger lag, shutdown took many minutes. Stack traces like the following indicate LAS activity during shutdown:
| Comment by Bruce Lucas (Inactive) [ 20/Dec/17 ] |
One consequence of this that I've observed: if there has been substantial lookaside activity - for example because replication lag during a write-heavy workload has created cache pressure due to the requirement to keep uncommitted updates in cache on the primary - then a normal shutdown of the node can take a very long time, apparently due to lookaside table cleanup (based on collected stack traces).
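A rough way to put a number on this is to issue a clean shutdown and time how long the mongod process takes to exit. A minimal sketch, assuming pymongo, a local mongod on the default port, and a Unix-like OS; the use of force and the 1-second poll are choices made here for illustration:

```python
# Measure how long a clean shutdown takes after heavy lookaside activity.
# Assumes a local mongod on the default port and a Unix-like OS where
# os.kill(pid, 0) can be used to check whether the process is still running.
import os
import time

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

client = MongoClient("mongodb://localhost:27017/")
pid = client.admin.command("serverStatus")["pid"]

start = time.time()
try:
    # mongod closes the connection while shutting down, so an error here is expected.
    client.admin.command("shutdown", force=True)
except ConnectionFailure:
    pass

# Wait for the process to actually exit; this is the slow part when the
# lookaside table has to be drained.
while True:
    try:
        os.kill(pid, 0)
    except OSError:
        break
    time.sleep(1)

print(f"clean shutdown took {time.time() - start:.1f} s")
```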
| Comment by Alexander Gorrod [ 15/Dec/17 ] |
I've reproduced this behavior and have a few observations, but no clear conclusion yet:

There is no obvious correlation between checkpoints or other periodic activity and lookaside writes. My suspicion is that this workload overwhelms IO. I've been running with the journal and data files on the same drive. I think a journal flush is probably taking a long time due to the disk saturation, which means the oldest timestamp isn't being updated for several seconds, which in turn leads to lookaside usage as we try to evict pages with history that's required to satisfy read concern majority.

I've attempted to capture a set of statistics that reflect my above observations:

There are some things I'd like to understand:
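As a rough way to check the suspected link between disk saturation and lookaside writes, the lookaside-related cache statistics in serverStatus can be sampled next to disk write throughput. A sketch assuming pymongo and psutil; the exact statistic names vary between server versions, so it matches on the substring "lookaside" rather than hard-coding names:

```python
# Sample WiredTiger's lookaside-related cache statistics alongside disk write
# throughput, to see whether bursts of lookaside writes line up with periods of
# disk saturation. Statistic names differ across versions, so match on substring.
import time

import psutil
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
prev = psutil.disk_io_counters()

while True:
    time.sleep(5)
    cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
    las = {k: v for k, v in cache.items() if "lookaside" in k}
    cur = psutil.disk_io_counters()
    mb_written = (cur.write_bytes - prev.write_bytes) / 1e6
    prev = cur
    print(f"{mb_written:8.1f} MB written in last 5 s | {las}")
```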
| Comment by Alexander Gorrod [ 04/Dec/17 ] |
bruce.lucas I believe you are noticing an absence of

The way lookaside (cache overflow) works is that if we need to reclaim space from cache, we write that content into the lookaside file. The content in the lookaside file is then removed the next time that data is accessed, or when a table is dropped. It's not surprising to me that a write-only workload will show lookaside file growth, since there are no queries causing WiredTiger to remove redundant content from the lookaside file. Implementing an alternative mechanism for reclaiming that space is the goal of

I believe this particular workload could use the lookaside file less - but that's likely going to be covered by work in
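To watch the growth described in this ticket, and to confirm that the file does not shrink once the workload stops, it is enough to poll the size of WiredTigerLAS.wt in the dbpath. A minimal sketch; the dbpath below is an assumption and should be adjusted to the deployment:

```python
# Poll the on-disk size of the lookaside file to see whether WiredTigerLAS.wt
# keeps growing during the workload and whether it shrinks afterwards.
# The dbpath is an assumption; adjust it to your deployment.
import os
import time

DBPATH = "/data/db"  # assumed dbpath
LAS_FILE = os.path.join(DBPATH, "WiredTigerLAS.wt")

while True:
    size_gb = os.path.getsize(LAS_FILE) / 1e9 if os.path.exists(LAS_FILE) else 0.0
    print(f"{time.strftime('%H:%M:%S')}  WiredTigerLAS.wt: {size_gb:.2f} GB")
    time.sleep(10)
```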