[SERVER-26753] Minor speed regression (13%) and 'choppy' performance in 3.4 vs 3.2 Created: 24/Oct/16 Updated: 23/Nov/16 Resolved: 07/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.4.0-rc1 |
| Fix Version/s: | 3.2.11, 3.4.0-rc3 |
| Type: | Task | Priority: | Minor - P4 |
| Reporter: | Nick Judson | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Backwards Compatibility: | Fully Compatible |
| Backport Completed: | |
| Sprint: | Storage 2016-11-21 |
| Participants: |
| Description |
|
OS is Win2012R2. When comparing my workload on 3.4.0-rc1 vs 3.2.10 there is a minor speed regression (13%) which appears to be mainly due to some intermittent activity (eviction?). I see the CPU spike to 100% every ~10-20 seconds with a corresponding speed drop. In 3.2, the CPU stays relatively flat. This is minor, but I thought you might like to know. |
| Comments |
| Comment by Githook User [ 07/Nov/16 ] |
|
Author: Ramon Fernandez <ramon@mongodb.com>
Message: Import wiredtiger: 2e2d5fe23badac038944bdc6639eb62d77136f1a from branch mongodb-3.2 ref: b11ed312ce..2e2d5fe23b
|
| Comment by Michael Cahill (Inactive) [ 07/Nov/16 ] |
|
nick@innsenroute.com, thanks for the update: this is a little puzzling. We can see clearly in the stats that the read-lock spinning has been resolved: in 3.2.9 your data shows occasional spikes in this statistic of 150M/s, and in 3.2.10 there were many more spikes going up to 450M/s. With the patch applied, this stat never goes above 166K/s (i.e., roughly 1000x lower). I'm not sure what to make of the lower performance: we don't track system statistics such as CPU and disk in MongoDB 3.2, so I can't correlate performance with CPU use from this data. I am going to close this ticket because I believe the underlying issue has been resolved by changes that will be in the next release of MongoDB. If you want more follow-up, please reopen or file a new ticket. |
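For context, per-second figures like the ones above are typically derived from cumulative WiredTiger counters sampled at regular intervals, by differencing consecutive samples. A minimal sketch of that calculation, using invented field names rather than the real diagnostic-data schema:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sample of a cumulative counter taken from periodic
 * diagnostic data; the real statistic names and storage differ. */
struct stat_sample {
    double   timestamp_sec;   /* when the sample was taken */
    uint64_t read_lock_calls; /* cumulative read-lock acquisitions */
};

/* Convert two cumulative samples into a per-second rate. */
static double
per_second_rate(const struct stat_sample *prev, const struct stat_sample *curr)
{
    double elapsed = curr->timestamp_sec - prev->timestamp_sec;
    if (elapsed <= 0)
        return 0.0;
    return (double)(curr->read_lock_calls - prev->read_lock_calls) / elapsed;
}

int
main(void)
{
    struct stat_sample prev = { 0.0, 0 };
    struct stat_sample curr = { 1.0, 150000000 }; /* 150M calls in 1s */

    printf("read-lock calls/sec: %.0f\n", per_second_rate(&prev, &curr));
    return 0;
}
```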
| Comment by Githook User [ 07/Nov/16 ] |
|
Author: Michael Cahill <michael.cahill@mongodb.com>
Message: Import wiredtiger: 74430da40c96247c0a0d92f011ccfb2b175f1f32 from branch mongodb-3.4 ref: ef9a7983ea..74430da40c
|
| Comment by Nick Judson [ 04/Nov/16 ] |
|
...and after rebooting the MongoDB host, it appears to have reverted to the behavior I'm used to... |
| Comment by Nick Judson [ 04/Nov/16 ] |
|
It must be something to do with my environment, since I see similar behavior in 3.2.9. Interestingly, 3.2.9 still shows better performance / lower CPU usage. I'm attaching the 3.2.9 run. |
| Comment by Nick Judson [ 04/Nov/16 ] |
|
Run from scratch using custom build. |
| Comment by Nick Judson [ 04/Nov/16 ] |
|
Michael Cahill - I came in this morning to run another test and re-ran this one (wiping everything clean this time). I don't know what has changed, but I'm seeing different behavior now, with long blocks of 100% CPU usage. I'm going to attach some more diagnostic files for you to look at. So... this may not be completely resolved yet. |
| Comment by Githook User [ 04/Nov/16 ] |
|
Author: Michael Cahill <michael.cahill@mongodb.com>
Message: We could be starving a thread that we are waiting on of CPU.
|
| Comment by Ramon Fernandez Marina [ 04/Nov/16 ] |
|
Hi nick@innsenroute.com, thanks for testing the patch build; glad to hear it addressed this issue. If you have other test cases where you see unexpected behavior, please open a new SERVER ticket for each so we can take a look. All of these issues are eligible for Bug Hunt prizes, so remember to set the "Affects Version/s" field to the specific release candidate you're using! Thanks! |
| Comment by Nick Judson [ 03/Nov/16 ] |
|
michael.cahill - yes, performance is flat as a pancake now. Thanks for looking into it! I have another test which simply uses the bulk insert API instead of single inserts (same test as above). This test appears to be really taxing on Mongo/WT, and the performance is often worse than single-insert mode. Would you recommend posting this to JIRA, or perhaps to the Google Groups instead? Thanks! |
| Comment by Michael Cahill (Inactive) [ 03/Nov/16 ] |
|
nick@innsenroute.com, there is a build here: https://evergreen.mongodb.com/version/5819599d3ff1221cb4000075 The Windows binaries are: Please let me know if this resolves the spiky performance issues. |
| Comment by Nick Judson [ 02/Nov/16 ] |
|
@Michael Cahill - sure. Send me a link and I'll try it out. Thanks! |
| Comment by Michael Cahill (Inactive) [ 02/Nov/16 ] |
|
nick@innsenroute.com, we've figured out at least part of what is going on here: WiredTiger was spinning in a tight loop under some circumstances, leading to CPU starvation. A change has been merged into the WiredTiger tree, and will now go through more testing before being merged into MongoDB. If you'd like to try out the change in the meantime to confirm that it improves things for you, please let me know and I'll prepare a build. |
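As a rough illustration of the class of fix being described (a sketch only, not the actual WiredTiger change; the lock type, spin budget, and function names here are invented), the idea is to replace an unbounded busy-wait on a shared lock with a bounded spin that yields the CPU, so the waiting thread cannot starve the thread it is waiting on:

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative reader-count lock; not WiredTiger's implementation. */
typedef struct {
    atomic_int readers; /* >= 0: reader count, -1: held exclusively */
} rw_spinlock_t;

#define SPIN_BUDGET 1000 /* arbitrary bound before yielding */

static bool
try_read_lock(rw_spinlock_t *lock)
{
    int cur = atomic_load(&lock->readers);
    /* Only join as a reader if no writer holds the lock. */
    return cur >= 0 &&
        atomic_compare_exchange_weak(&lock->readers, &cur, cur + 1);
}

/*
 * Problematic pattern: spin forever, burning CPU that the lock holder
 * (or an eviction thread) may need in order to make progress.
 *
 *     while (!try_read_lock(lock))
 *         ;
 *
 * Mitigation sketch: spin briefly, then yield so other threads run.
 */
static void
read_lock(rw_spinlock_t *lock)
{
    int spins = 0;

    while (!try_read_lock(lock)) {
        if (++spins >= SPIN_BUDGET) {
            sched_yield(); /* give up the CPU instead of starving others */
            spins = 0;
        }
    }
}
```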
| Comment by Githook User [ 02/Nov/16 ] |
|
Author: Michael Cahill <michael.cahill@mongodb.com>
Message: We could be starving a thread that we are waiting on of CPU.
|
| Comment by Nick Judson [ 27/Oct/16 ] |
|
@Michael Cahill, I can set you up with my workload - I've sent you an email. |
| Comment by Michael Cahill (Inactive) [ 27/Oct/16 ] |
|
nick@innsenroute.com, thanks for the updated data. What we see in the diagnostic data is that there's nothing obviously going wrong when these spikes occur: sometimes they correlate with checkpoints, but sometimes they happen independently of checkpoints; operations aren't waiting on I/O, the cache isn't full, etc. There is a strong correlation with spikes in read-lock operations; in other words, it looks like we're spinning trying to get shared access to something. I have some ideas about what could be causing this, because we made a change to the way we access the data structure representing the transaction table. If it's easy for me to run your workload, that would be great; otherwise we'll try to reproduce this ourselves. |
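To illustrate the access pattern being described (illustrative C only, not WiredTiger's actual transaction table or locking code): every operation that needs a consistent snapshot briefly takes shared access to the global transaction state, so if acquiring that shared access spins, read-lock statistics and CPU use spike together.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_SESSIONS 128

/* Illustrative global transaction table: one slot per session. */
struct txn_table {
    pthread_rwlock_t lock;              /* shared for readers, exclusive otherwise */
    uint64_t         ids[MAX_SESSIONS]; /* transaction IDs currently active */
    int              count;
};

/*
 * Computing a snapshot only needs shared (read) access, but every
 * operation does it, so this lock is taken extremely frequently.
 * If the acquisition path spins instead of blocking, contention here
 * shows up as large spikes in read-lock statistics and CPU use.
 */
static uint64_t
oldest_active_txn(struct txn_table *t)
{
    uint64_t oldest = UINT64_MAX;

    pthread_rwlock_rdlock(&t->lock);
    for (int i = 0; i < t->count; i++)
        if (t->ids[i] != 0 && t->ids[i] < oldest)
            oldest = t->ids[i];
    pthread_rwlock_unlock(&t->lock);

    return oldest;
}
```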
| Comment by Nick Judson [ 26/Oct/16 ] |
|
@Alexander Gorrod:
I can provide a copy of my workload, although I think someone over there already has it. |
| Comment by Nick Judson [ 26/Oct/16 ] |
|
Attaching the same run for 3.2.10. I'm seeing similar behavior in 3.2.10 as in 3.4.0-rc1, although perhaps the spikes are not happening quite as often. |
| Comment by Alexander Gorrod [ 26/Oct/16 ] |
|
Hi nick@innsenroute.com, thanks for the report, and sorry about the performance regression in your workload. We have done some analysis on the data you uploaded, and the cause of the behavior difference isn't obvious. We did notice that the 3.2 statistics are from MongoDB 3.2.9, whereas you said it was running 3.2.10. There are significant changes to WiredTiger between 3.2.9 and 3.2.10 - could you re-run the test with 3.2.10 and let us know the results? It would also be useful if you could describe the configuration and workload, including:
We have made a lot of changes to how WiredTiger manages the cache recently; some of those changes are aimed at reducing the amount of memory allocator fragmentation we see (i.e., allocator overhead), so it is good to know that the changes have had the desired effect. To explain why the usage falls below 3GB overall: WiredTiger attempts to maintain used cache at 80% of the configured maximum (2.4GB for your configuration), and I can see from the statistics that cache usage is being maintained close to that. The other 300MB you observe is used for allocations outside of WiredTiger and fragmentation in the allocator cache. |
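To make the cache-sizing arithmetic concrete, here is a minimal sketch assuming the 80% eviction target and the ~2.7GB observed usage mentioned in this thread (variable names and hard-coded values are illustrative):

```c
#include <stdio.h>

int
main(void)
{
    const double cache_max_gb    = 3.0;  /* e.g. --wiredTigerCacheSizeGB 3 */
    const double eviction_target = 0.80; /* keep the cache ~80% full */

    double target_gb           = cache_max_gb * eviction_target; /* 2.4 GB */
    double observed_process_gb = 2.7;                            /* reported usage */
    double outside_cache_gb    = observed_process_gb - target_gb;

    printf("eviction target: %.1f GB\n", target_gb);
    printf("memory outside the WT cache (allocator overhead, other "
           "allocations): about %.1f GB\n", outside_cache_gb);
    return 0;
}
```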
| Comment by Daniel Pasette (Inactive) [ 24/Oct/16 ] |
|
Thanks for the report Nick. We'll take a look at the diagnostic data. |
| Comment by Nick Judson [ 24/Oct/16 ] |
|
I also noticed that 3.4 was using less memory. With the WT cache capped at 3GB, it was sitting at ~2.7GB vs ~3.2GB for 3.2. Are you using the more aggressive decommit in 3.4? |