[SERVER-18788] wiredtiger 10x slower than mmap replica instance Created: 02/Jun/15 Updated: 15/Jun/15 Resolved: 15/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | navin kumar | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | |
| Attachments: | |
| Issue Links: | |
| Operating System: | Linux |
| Participants: | |
| Description |
|
When we switch primaries to be the wiredTiger instances, queries run roughly 10x slower than on the mmap instances.

Our homepage, https://www.shopcade.com/, takes much longer to load for a logged-in user.

I've got only marginal improvements changing to the following configuration:
| Comments |
| Comment by Sam Kleinman (Inactive) [ 15/Jun/15 ] |
|
Glad to hear that you resolved this issue! Cheers,
| Comment by navin kumar [ 15/Jun/15 ] |
|
Problem solved! Notes:
Best,
| Comment by navin kumar [ 09/Jun/15 ] |
|
Sam, Asya, I'm working on your inputs, and have varying results at this point. Will get back to you with performance data. Fingers crossed!
| Comment by Sam Kleinman (Inactive) [ 04/Jun/15 ] |
|
Reviewing your logs, it looks like the MMAP instance is subject to similar kinds of slow queries as the WiredTiger one: inefficient use of an index, collection scans, etc. This is expected. First, while this isn't the issue you're experiencing, I'm confident that you'll see improved and more consistent performance if you modify your indexes or queries. This is outside the scope of this ticket, but the following documentation may help:

Second, the data size you provided looks like it is from the output of db.stats(); I just want to confirm that you've included the sum of all databases on this replica set. The main difference that I can see between these deployments is that the MMAP instance effectively has the whole machine's RAM to use for an in-memory cache, if the entire data set fits in RAM, as it looks like it does from these numbers. At the same time, the WiredTiger instance is limited to 180 GB, and so there's more churn on the cache, which could explain the performance effects you're seeing. You could restart the wiredTiger instance with a larger cache (i.e. 230-240 GB) and see if that resolves the issue (see the sketch at the end of this comment). If it turns out that this is a cache size issue, then in the future, with better index utilization, you may be able to reduce the cache size allotted to MongoDB.

Third, it looks like your instance has a large number of open cursors, which can impact performance. From the serverStatus output that you included, it looks like your application opened about 10,000 cursors with the noTimeout option at the top of the hour, and about 10,000 other cursors are timing out every hour. If you can modify your application to not open cursors with the noTimeout option, you may see improved performance. Shortening timeouts or closing cursors when the application is done with them may also help.

As you test this, if you end up with a wiredTiger primary and MMAP secondaries, I might encourage you to increase the write concern (a shell sketch follows below): wiredTiger can typically handle a higher volume of writes than MMAP, and you should make sure that the secondaries can replicate operations fast enough.

In general, we would expect a database server running wiredTiger to use more CPU resources (particularly with compression) than one running MMAP, so the differences in system load reporting may not indicate an issue.

To summarize, the performance effect you're seeing may be attributable to the following factors:

- queries that use indexes inefficiently, or not at all;
- a WiredTiger cache (180 GB) that is smaller than the file-system cache available to the MMAP instance, causing cache churn;
- a large number of noTimeout cursors accumulating on the instance.
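Sam suggests a restart with a larger cache; as a non-authoritative sketch (assuming MongoDB 3.0 with the WiredTiger engine), the same change can also be applied at runtime. The 230G value is purely illustrative, taken from the 230-240 GB suggestion above:

```javascript
// Minimal sketch: resize the WiredTiger cache at runtime rather than
// restarting (MongoDB 3.0+). The equivalent startup setting is
// storage.wiredTiger.engineConfig.cacheSizeGB (--wiredTigerCacheSizeGB).
db.adminCommand({
    setParameter: 1,
    wiredTigerEngineRuntimeConfig: "cache_size=230G"
});
```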
I hope this helps. At this point I'm not inclined to think that there is a specific bug with MongoDB+WT. Please let us know if you still see a performance degradation with wiredTiger after modifying your application and deployment. Regards,
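On the write-concern suggestion above, a minimal shell sketch; the collection name, document, and the w/wtimeout values are illustrative, not from the reporter's application:

```javascript
// Require acknowledgement from the primary plus one secondary (w: 2)
// so a fast WiredTiger primary cannot silently outrun MMAP secondaries.
db.products.insert(
    { sku: "example-sku", createdAt: new Date() },
    { writeConcern: { w: 2, wtimeout: 5000 } }
);
```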
| Comment by Asya Kamsky [ 04/Jun/15 ] |
|
I happened to notice that you seem to have a large number of cursors that don't get closed - there seem to be quite a few that time out after being idle for the timeout period, and it looks like you have a large number of "noTimeout" cursors accumulating after a while. Can you explain how you are using cursors in the application code? Asya
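For reference, the counters Asya is describing can be read from the shell; a sketch, using serverStatus fields as they exist in MongoDB 3.0:

```javascript
// metrics.cursor.open.noTimeout counts currently-open cursors created
// with the noTimeout option; metrics.cursor.timedOut counts cursors the
// server has reaped for idleness since startup.
var cur = db.serverStatus().metrics.cursor;
printjson({
    openTotal: cur.open.total,
    openNoTimeout: cur.open.noTimeout,
    timedOut: cur.timedOut
});
```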
| Comment by navin kumar [ 03/Jun/15 ] |
|
Sam,
No, on MMAP there are hardly any slow queries; more details below.
I've added it in the attached zip file.
No, turning off compression was something I was testing; not sure it had any impact.
1 - Acknowledged.
Each of our mongo servers has 244 GiB of RAM. For shard s01:

This morning, to generate more data, I switched the primary again to be wiredTiger. It took about half an hour for performance to drop for the same queries. I've attached details from this time below. Do have a look inside the mongo-query-times-compared directory. I am sorry to upload such large images, but I feel they best illustrate the problem.

Hope this helps, and thank you very much.

-----------------------
The following are query times for a page load:

More like the above; can ignore.
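As a cross-check on the data-size numbers above (and on the "sum of all databases" question Sam raises in his 04/Jun comment), a sketch that totals dataSize across every database on a member:

```javascript
// Sum db.stats().dataSize over all databases; run on each shard's
// primary and compare the total against the configured WiredTiger cache.
var totalBytes = 0;
db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
    totalBytes += db.getSiblingDB(d.name).stats().dataSize;
});
print("total dataSize: " + (totalBytes / Math.pow(1024, 3)).toFixed(1) + " GiB");
```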
| Comment by navin kumar [ 02/Jun/15 ] |
|
After turning off prefix compression, re-syncing data from scratch, and making wiredTiger primary, there was only about an hour of fast performance. That was at the time of my tweet. However, query times quickly deteriorated after that, so I doubt prefix compression is the issue. I jumped the gun with that tweet, apologies.
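For context: index prefix compression is on by default under WiredTiger and is controlled globally by storage.wiredTiger.indexConfig.prefixCompression in the mongod configuration. It can also be set per index at build time; a sketch, with hypothetical collection and key names:

```javascript
// Build an index with WiredTiger prefix compression disabled for just
// this index (MongoDB 3.0+); collection and key names are illustrative.
db.products.createIndex(
    { userId: 1, createdAt: -1 },
    { storageEngine: { wiredTiger: { configString: "prefix_compression=false" } } }
);
```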
| Comment by Keith Bostic (Inactive) [ 02/Jun/15 ] |
|
Navin, your original tweet made me think prefix compression was an issue; was that based on measurements with/without index prefix compression configured? Or did that not have much of an effect?
| Comment by Sam Kleinman (Inactive) [ 02/Jun/15 ] |
|
From looking over the logs I see a number of factors that might contribute to degraded performance. I'm not sure if these can, alone, explain the performance impact that you see, but it would be good to remove potential factors as we troubleshoot.

First, I see some queries that are not able to use an index effectively (e.g. high absolute nscannedObjects values, high ratios of nscannedObjects to nscanned, and scanAndOrder operations, which do in-memory sorts). For example:
Second, there are a large number of count operations that must inspect many documents and are hitting yields:
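As a hedged illustration of the patterns described in this comment (hypothetical collection and predicate, not the reporter's actual queries), the same symptoms can be confirmed with explain in the 3.0 shell:

```javascript
// A top-level COLLSCAN stage means no index was used; a SORT stage
// corresponds to scanAndOrder (an in-memory sort); and a high ratio of
// totalDocsExamined to nReturned matches high nscannedObjects values.
var exp = db.products.find({ brand: "acme" })
              .sort({ createdAt: -1 })
              .explain("executionStats");
printjson({
    winningStage: exp.queryPlanner.winningPlan.stage,
    totalDocsExamined: exp.executionStats.totalDocsExamined,
    nReturned: exp.executionStats.nReturned
});
```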
Thanks for your patience, and I hope we can get to the bottom of this quickly. Cheers,
| Comment by navin kumar [ 02/Jun/15 ] |
|
Before switching to that configuration, I had all values at their defaults. I've attached the mongo log for when wiredTiger was primary.
| Comment by Ramon Fernandez Marina [ 02/Jun/15 ] |
|
Navin, since this issue pertains to the MongoDB server, we've moved the ticket to the SERVER project. Do you have logs from the WiredTiger primary when serving those slow queries? What configuration were you using before switching to the one above? Thanks,