[SERVER-29311] WT performance drops to zero during cache eviction
| Created: | 21/May/17 | Updated: | 30/Nov/17 | Resolved: | 29/Jun/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.11, 3.2.12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dom Dwyer | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | FreeBSD 11, sharded, replicated. |
| Operating System: | ALL |
| Steps To Reproduce: | Configure a large cache, perform many insert+update operations, measuring throughput. |
| Description |
|
Hi all, We are currently benchmarking our workload against the following test environment:

Our synthetic workload consists of 450 workers, each performing an insert, waiting 1 to 100ms, and then updating the same document (adding roughly 1k bytes), querying by the _id field. We routinely see throughput drop to 0 for both inserts and updates.

We've been collecting various stats while running the workload generator, and have correlated the drops to 0 throughput with a declining TrackedDirtyBytesInTheCache delta (measured every second using db.serverStatus()), which is what led us to believe it is related to cache eviction. During periods of 0 throughput, the mongod boxes become CPU bound (40 core boxes) - this is what led us to conclude mongo was stalling as opposed to anything on the application side. See the attached graph, which is a plot of throughput vs. mongod CPU usage.

After resizing the WiredTiger cache from 19GB to 1GB (with wiredTiger.engineConfig.cacheSizeGB=1) the throughput drops are much shorter in duration but still noticeable. We also tested setting eviction_dirty_trigger=10 and eviction_dirty_target=7 (with a 19GB cache) as suggested in https://jira.mongodb.org/browse/SERVER-28488, but we still see periods of 0 throughput.

I have uploaded diagnostic data from a mongod instance previously under ticket #28488, and below are a couple of links to various stats we've recorded during several different testing runs, in the hope they help. I can run tests and provide any info needed over the next couple of weeks. Sorry for the dupe ticket, but it has been over a month without a response.

To clarify, our workload generator is the only application running on a 40 CPU core box, connected via a 1Gbps LAN to mongos - it performs no IO other than talking to mongo; its sole purpose is to load test mongo. Monitoring the network interface, we never come remotely close to maxing the link, and we're nowhere near maxing the CPU of the host at any point - we are not I/O bound as claimed in the other ticket.

CPU + disk IO stats: https://docs.google.com/a/itsallbroken.com/spreadsheets/d/1o6N23E_Nc1dBnFC_XXs6qWAXcdKAGjlgRuGHuVEXzn0/edit?usp=sharing

db.serverStatus() + TrackedDirtyBytesInTheCache plot (switch sheets): https://docs.google.com/a/itsallbroken.com/spreadsheets/d/1tsK8vOun4IHyfhpA7mXQ9GrO9zneoWvL2dENj2EImug/edit?usp=sharing
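As a rough illustration of the workload described above - not the actual generator used in these tests - a reproduction along these lines could look like the following sketch. Host names, database and collection names, and payload sizes are placeholders.

```python
# Rough reproduction sketch, NOT the actual workload generator from this ticket.
# 450 workers each insert a document, wait 1-100ms, then update the same document
# by _id with roughly 1KB of extra data, while a monitor thread prints the
# per-second delta of WiredTiger's "tracked dirty bytes in the cache".
# Host names, database/collection names and payload sizes are placeholders.
import random
import threading
import time

from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example:27017")          # application traffic via mongos
mongod = MongoClient("mongodb://shard0-primary.example:27017")  # stats from a shard's mongod
coll = mongos["bench"]["docs"]

ops = 0
ops_lock = threading.Lock()
stop = threading.Event()

def worker():
    global ops
    while not stop.is_set():
        doc_id = coll.insert_one({"payload": "x" * 512}).inserted_id
        time.sleep(random.uniform(0.001, 0.100))                 # wait 1-100ms
        coll.update_one({"_id": doc_id}, {"$set": {"extra": "y" * 1024}})  # ~1KB growth
        with ops_lock:
            ops += 2                                             # one insert + one update

def monitor():
    global ops
    prev_dirty = None
    while not stop.is_set():
        cache = mongod.admin.command("serverStatus")["wiredTiger"]["cache"]
        dirty = cache["tracked dirty bytes in the cache"]
        with ops_lock:
            done, ops = ops, 0
        delta = None if prev_dirty is None else dirty - prev_dirty
        print("ops/s=%d dirty_bytes=%d delta=%s" % (done, dirty, delta))
        prev_dirty = dirty
        time.sleep(1)

threads = [threading.Thread(target=worker, daemon=True) for _ in range(450)]
threads.append(threading.Thread(target=monitor, daemon=True))
for t in threads:
    t.start()
time.sleep(600)   # run for 10 minutes, then stop
stop.set()
```

Note that the serverStatus() poll targets a shard's mongod rather than mongos, since mongos does not report the wiredTiger section; this matches how the stats above were collected from the replica set primary.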
|
| Comments |
| Comment by Kelsey Schubert [ 29/Jun/17 ] | |
|
I've examined your diagnostic.data and do not see a clear indication of a bug in WiredTiger. Please note that we are continuing to work to improve WiredTiger for an assortment of workloads, and it's likely that you will benefit from some of these changes. However, as described previously in this ticket, the root issue appears to be a configuration problem around an I/O bottleneck. Therefore, I would recommend posting on the mongodb-user group for MongoDB-related support. See also our Technical Support page for additional support resources. Kind regards, | |
| Comment by Wei Shan Ang [ 22/Jun/17 ] | |
|
Hi Thomas, We found this issue (https://jira.mongodb.org/browse/WT-2705), which seems similar to what we are experiencing. We observe the same behaviour of high latency and CPU utilisation during cache eviction in our tests as well. Do you think it is the same issue? | |
| Comment by Dom Dwyer [ 20/Jun/17 ] | |
|
Hi Thomas, Thanks for the pointer regarding the slow ZFS writes - we had overlooked tuning our testing environment for the MongoDB workload (specifically the filesystem block size). Unfortunately, even after tuning ZFS we're still seeing 0 throughput around ~18% of the time using the default eviction settings. When using no WT compression and the settings you suggest, we see 0 ops ~6% of the time, but overall throughput is nearly halved and the 99th percentile update latency jumps to close to half a second - see #1. We get the same results when using snappy for compression too. The large write latency distribution and low throughput prevent us from running with these settings in production.

We've run benchmarks against all 6 combinations of compression settings (WT with zlib, snappy, and none; ZFS with lz4 on and off) across various filesystem block sizes, and we've seen the issue occur in all of them. These tests were run with the default eviction trigger and eviction targets to try and isolate just the impact of the compression settings. Our raw data for these tests can be found at #2, though note some tests have double the sample count of others. It seems compression isn't a factor here - is there anything else we can try? Have you managed to reproduce this?

I have the diagnostic.data from a run with your suggested eviction settings, but it's too big to attach to this ticket (~200MB) - is there somewhere else I should upload it to? If you need any of the raw benchmark logs or any other data please let me know.

EDIT: https://itsallbroken.com/diagnostic.data_16K.tar

#1: https://docs.google.com/spreadsheets/d/1JqOF3AvXKciEtHiTMtD9tgWzTzNqUIRyC0vMgpx5LKs/edit#gid=43245115 | |
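As a rough illustration of how a WT compression matrix like the one above can be set up per collection - this is an assumption about the mechanism, since the ticket does not say how compression was configured for these runs, and the global storage.wiredTiger.collectionConfig.blockCompressor mongod setting would work equally well - a sketch might look like this (host and collection names are placeholders):

```python
# Illustration only: one way to vary the WiredTiger block compressor per collection
# when building a compression benchmark matrix. The ticket does not say how
# compression was actually configured for these runs; the global blockCompressor
# mongod setting is the other common approach. Names below are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example:27017")
db = client["bench"]

for compressor in ("zlib", "snappy", "none"):
    # storageEngine options are passed through to the create command and applied
    # by each shard when the collection is created
    db.create_collection(
        f"docs_{compressor}",
        storageEngine={"wiredTiger": {"configString": f"block_compressor={compressor}"}},
    )
```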
| Comment by Kelsey Schubert [ 09/Jun/17 ] | |
|
It appears that writes are slow to ZFS. In the diagnostic.data, we see a large number of active filesystem write calls; this makes the single-threaded part of checkpoints slow, which in turn throttles application writes. One thing that contributes to this is using "blockCompressor: zlib" – my first suggestion is to switch to snappy. I'd also suggest testing your workload with the following settings:
This will reduce the amount of dirty data in cache, meaning checkpoints complete faster, which should reduce the impact of slow I/O on your application. Would you please implement these changes and let us know if it resolves the issue? Kind regards, | |
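As a rough illustration, WiredTiger eviction thresholds can also be changed at runtime through the wiredTigerEngineRuntimeConfig server parameter, without restarting mongod. The values in the sketch below are the ones tried earlier in this ticket (eviction_dirty_target=7, eviction_dirty_trigger=10) and are purely illustrative - they are not necessarily the settings suggested in this comment; the host name is a placeholder.

```python
# Illustration only: adjusting WiredTiger eviction thresholds at runtime via the
# wiredTigerEngineRuntimeConfig server parameter (no mongod restart required).
# Values are the ones tried earlier in this ticket, shown purely as an example;
# run this against each data-bearing mongod, not against mongos.
from pymongo import MongoClient

mongod = MongoClient("mongodb://shard0-primary.example:27017")
mongod.admin.command(
    "setParameter", 1,
    wiredTigerEngineRuntimeConfig="eviction_dirty_target=7,eviction_dirty_trigger=10",
)
```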
| Comment by Wei Shan Ang [ 08/Jun/17 ] | |
|
Hello! Is there anything else you need from us in order to reproduce the issue at your end? WeiShan | |
| Comment by Dom Dwyer [ 24/May/17 ] | |
|
Hi all, Attached is a graph from the latest run against 3.2.13 (the diagnostic.data collection run). Inserts/s are measured from the workload generator running on its own box; the other stats are measured from the master of the replicated nodes. With 3.2.13 we see fewer periods of being CPU bound, though we still see throughput issues, and they still seem to coincide with the changes in the cache size deltas. I don't have the original data (a DB engineer sent over the attached graph) but I'm sure I can get hold of a copy if you think it'll be of use. We can run tests if needed, but we will be using our test environment for other things in the coming weeks, so it might take a few days. Dom | |
| Comment by Dom Dwyer [ 23/May/17 ] | |
|
Attached is the diagnostic data as promised. We can confirm we see the same issue with 3.2.13; however, the periods of 0 throughput are less drastic. I've also attached a throughput log from the latest run. | |
| Comment by Kelsey Schubert [ 22/May/17 ] | |
|
Hi domodwyer, Please attach the diagnostic.data to this ticket. Thank you, | |
| Comment by Dom Dwyer [ 22/May/17 ] | |
|
Hi Michael, Thanks for the quick reply. We're going to push 3.2.13 to our staging environment and re-run the tests. We should have the diagnostic data for you tomorrow. Should I attach it here, or upload it to https://10gen-httpsupload.s3.amazonaws.com/upload_forms/423c6cab-ec2f-47a2-9df3-3bc45f922d2c.html ? Dom | |
| Comment by Michael Cahill (Inactive) [ 21/May/17 ] | |
|
domodwyer, thanks for the report, sorry you are seeing this issue. Can you please attach a tarball containing the diagnostic.data directory from a run of the workload? Also, if possible, can you please repeat with MongoDB 3.2.13, which avoids some fsync calls that were present in earlier versions and adds additional statistics that may help identify the issue? |