[SERVER-12878] Aggregations that do not return cursors need to clean up internal cursors Created: 25/Feb/14 Updated: 11/Jul/16 Resolved: 05/Mar/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 2.6.0-rc0 |
| Fix Version/s: | 2.6.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tim Callaghan | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 13.10 |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | Linux | ||||||||||||||||
| Steps To Reproduce: |
The benchmark runs in two stages: Both stages output their respective performance.
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Aggregation commands that do not return a cursor need to release the ClientCursor object that they create to do their work. As of 2.6.0-rc0, they do not. This leads to increasing workload for every update and delete operation, as they must scan those zombie cursors and inform them of certain relevant events. Original description / steps to reproduce: In running benchmarks to compare the performance of 2.4.9 vs. 2.6.0.rc0 I see a significant drop in my Sysbench workload. The performance of both versions is comparable for loading data (insert only), but in the mixed workload phase (reads, aggregation, update, delete, insert) the performance of 2.6.0.rc0 is 50x lower on an in-memory test. The performance drop is similar for > RAM testing as well. I have confirmed this on my desktop (single Corei7) and a server (dual Xeon 5560), both running Ubuntu 13.10, with read-ahead set to 32. I'm running with the Java driver version 2.11.2 |
| Comments |
| Comment by Githook User [ 19/Mar/14 ] | ||||||||||||||||||||
|
Author: {u'username': u'hkhalsa', u'name': u'Hari Khalsa', u'email': u'hkhalsa@10gen.com'}Message: | ||||||||||||||||||||
| Comment by Githook User [ 19/Mar/14 ] | ||||||||||||||||||||
|
Author: {u'username': u'hkhalsa', u'name': u'Hari Khalsa', u'email': u'hkhalsa@10gen.com'}Message: | ||||||||||||||||||||
| Comment by Tim Callaghan [ 05/Mar/14 ] | ||||||||||||||||||||
|
Andy, I agree, and I think that's what Hari originally did for this ticket in this commit. https://github.com/mongodb/mongo/commit/a3d910925350d2f0204b41ea145e24f74e5c39ce | ||||||||||||||||||||
| Comment by Githook User [ 05/Mar/14 ] | ||||||||||||||||||||
|
Author: {u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}Message: | ||||||||||||||||||||
| Comment by Andy Schwerin [ 05/Mar/14 ] | ||||||||||||||||||||
|
Tim, someone must have done something to improve the query planning problem you were seeing since RC0, if so. Maybe hari.khalsa@10gen.com's For reference this is what I see in 2.4.9 for the last hundred seconds on an Intel Core i7-2600 CPU @ 3.40GHz, with 16GB of RAM and data stored on a passable Intel SSD:
And this is what I see for 2.6.0-RC1-pre-:
I assume from all of the results (yours and mine) that the last 10 seconds might see some amount of cool-down behavior. I've typically been ignoring it. | ||||||||||||||||||||
| Comment by Tim Callaghan [ 05/Mar/14 ] | ||||||||||||||||||||
|
Andy, no problem at all, although my original ticket name is likely appropriate as I think there was also a big issue with query planning within the workload, and your new discovery is yet another issue that my benchmark uncovered. | ||||||||||||||||||||
| Comment by Andy Schwerin [ 05/Mar/14 ] | ||||||||||||||||||||
|
tmcallaghan, thanks for the excellent repro. I've updated the ticket summary and description to describe the underlying bug that you unmasked. | ||||||||||||||||||||
| Comment by Andy Schwerin [ 05/Mar/14 ] | ||||||||||||||||||||
|
I did a differential profile, comparing the first 30 seconds of the sysbench (second) stage, and the penultimate 30 seconds in 2.6.0-rc1. The good news is that RC1 is substantially faster than RC0, but it still shows the performance decay pattern that tmcallaghan found, and still has a 7x performance degradation over the life of the benchmark. The attached SVG files show call graphs weighted by the number of samples associated with each node. Note that CollectionCursorCache::invalidateDocument goes from around 14% of execution time to almost 60% of execution time over the life of the benchmark, and a substantial amount of time in any event is spent in shared pointer atomic arithmetic and red-black tree traversal. I will investigate further. There's a reasonable likelihood of a resource leak in the client cursor cache, or a behavior change from 2.4 about cursor expiration. | ||||||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 27/Feb/14 ] | ||||||||||||||||||||
|
Yep, that's why the ticket is still open. We're still looking at it. Thanks for testing. We will update here with what we find. | ||||||||||||||||||||
| Comment by James Blackburn [ 27/Feb/14 ] | ||||||||||||||||||||
|
I ran Tim's test, using master which includes Hari's change a3d910... and while it's better it's still looks 3-4x worse than 2.4.6: with: 2.4.6, I see:
3709484b65d0c153d4d52713e8466b7b3a791233
| ||||||||||||||||||||
| Comment by Tim Callaghan [ 26/Feb/14 ] | ||||||||||||||||||||
|
I'll rerun my test when RC1 is available. | ||||||||||||||||||||
| Comment by hari.khalsa@10gen.com [ 26/Feb/14 ] | ||||||||||||||||||||
|
Hello! Thank you for this ticket. I identified and fixed a major performance issue identified by this suite. In brief, the server decided to use a collection scan when it should have used an index scan. It made this decision because neither the indexed plan nor the collection scan produced any results while we were comparing the two plans to see which was better. In absence of information about how well the plans did with respect to one another, we chose the collection scan. This is a departure from 2.4's behavior where we wouldn't consider a collection scan if we had an indexed plan available. I submitted a patch to master restore this behavior: https://github.com/mongodb/mongo/commit/a3d910925350d2f0204b41ea145e24f74e5c39ce Feel encouraged to test it out yourself by pulling the latest code or just wait for RC1 which will have this patch. Thanks again! | ||||||||||||||||||||
| Comment by Tim Callaghan [ 26/Feb/14 ] | ||||||||||||||||||||
|
Good, I assume those COLLSCAN lines in my server's log file is an optimizer decision gone bad. As I said, I'm kicking off my full suite of benchmarks and will report anything else I find as an additional ticket. | ||||||||||||||||||||
| Comment by Eric Milkie [ 26/Feb/14 ] | ||||||||||||||||||||
|
Hi Tim, | ||||||||||||||||||||
| Comment by Tim Callaghan [ 26/Feb/14 ] | ||||||||||||||||||||
|
Please let me know if you can reproduce my results. In the meantime I'll run my full benchmark suite comparing 2.4.9 vs. 2.6.0.rc0 to see if there are other regressions. | ||||||||||||||||||||
| Comment by Tim Callaghan [ 26/Feb/14 ] | ||||||||||||||||||||
|
Sure, I just reran the test on my desktop.
MongoDB 2.4.9 MongoDB 2.6.0.rc0 | ||||||||||||||||||||
| Comment by Eric Milkie [ 25/Feb/14 ] | ||||||||||||||||||||
|
Hi Tim, | ||||||||||||||||||||
| Comment by Tim Callaghan [ 25/Feb/14 ] | ||||||||||||||||||||
|
Please let me know if there is any additional data I can provide. |