[SERVER-14325] Severe Performance Degradation in Mongo 2.6.2 Created: 20/Jun/14 Updated: 24/Jan/15 Resolved: 23/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Querying |
| Affects Version/s: | 2.6.1, 2.6.2 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matt Kolbert | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
I have a sharded cluster with 12 nodes. This is a test instance, so there are no replica sets. I am querying a collection that is sharded across the 12 servers using a shard key of { "_id" : "hashed" }. I also have an index on _id. My collection has 124 million documents. I am submitting the following script via mongo shell:
I ran this using version 2.4.8 and it iterated at an average of around 200 million documents per hour. This was on a system that had just been started, with no warming up. I then upgraded to version 2.6.2 and ran the same script. This time the average was less than 3 million per hour. Some additional information: first, on 2.6.2, when I run myCursor.explain() to get the plan, each of the 12 shards returns:
For version 2.4.8 this is true. When I connect directly to one of the shards running 2.6.2 and run:
The result is the same except "indexOnly" is true. Additionally, when I monitor the operations that are running on the mongos where I submitted the script, I see the following:
I only ever see one of these operations running at a time. They run for each shard sequentially: shard0000, shard0001, shard0002... shard0011. I would expect the mongos to submit all of these queries in parallel. |
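For scale, the reported rates can be turned into wall-clock times for one full pass over the collection. The sketch below is plain JavaScript for Node.js and uses only the figures from the description; the helper name `hoursToIterate` is illustrative, and the 2.6.2 rate is taken as 3 million per hour, which is an upper bound because the report says "less than 3 million".

```javascript
// Figures from the description; the 2.6.2 rate is an upper bound (assumption).
var TOTAL_DOCS = 124e6;   // documents in the collection
var RATE_2_4_8 = 200e6;   // documents per hour observed on 2.4.8
var RATE_2_6_2 = 3e6;     // documents per hour on 2.6.2 ("less than 3 million")

// Hypothetical helper: hours needed to iterate totalDocs at docsPerHour.
function hoursToIterate(totalDocs, docsPerHour) {
  return totalDocs / docsPerHour;
}

var hours248 = hoursToIterate(TOTAL_DOCS, RATE_2_4_8);  // about 0.62 hours
var hours262 = hoursToIterate(TOTAL_DOCS, RATE_2_6_2);  // about 41.3 hours
var slowdown = RATE_2_4_8 / RATE_2_6_2;                 // about 67x

console.log(hours248.toFixed(2), hours262.toFixed(1), slowdown.toFixed(0));
```

Under these assumptions, a pass that took well under an hour on 2.4.8 would take more than a day and a half on 2.6.2.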
| Comments |
| Comment by Thomas Rueckstiess [ 08/Jul/14 ] |
|
Hi Matt, One representative log from each type of node would be sufficient for now, that is
The config server log is probably not very useful. If you have logs covering both a 2.4.x and a 2.6.x test, that would be extremely helpful. If you are going to repeat the test, setting the log level to 2 (very high verbosity) would be even better. This would allow us to see page fault exceptions in the log. Alternatively, if you have iostat and mongostat output from the tests, that would be helpful as well. I also recommend disabling the balancer, as it can have side effects and a significant impact during the test. Thanks, |
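A minimal sketch of the two preparation steps suggested above, written for the mongo shell. The wrapper names `raiseLogLevel` and `disableBalancer` are hypothetical; the calls they make (`db.adminCommand` with `setParameter`/`logLevel`, `sh.setBalancerState`, `sh.getBalancerState`) are the standard shell helpers, passed in as arguments so the functions can also be exercised outside the shell.

```javascript
// Hypothetical wrapper: raise log verbosity to 2 on the node the shell is
// connected to. Run it once per mongod or mongos whose log should be verbose.
function raiseLogLevel(db) {
  return db.adminCommand({ setParameter: 1, logLevel: 2 });
}

// Hypothetical wrapper: turn the balancer off (run while connected to a
// mongos) and return the balancer state reported afterwards.
function disableBalancer(sh) {
  sh.setBalancerState(false);
  return sh.getBalancerState();
}
```

In the mongo shell these would be called as `raiseLogLevel(db)` and `disableBalancer(sh)`, using the shell's built-in `db` and `sh` objects. The setting applies only to the process the shell is connected to, so the log level has to be raised separately on each node of interest.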
| Comment by Matt Kolbert [ 03/Jul/14 ] |
|
Hi Thomas, 1. The test was run on the same set of 12 servers. The only difference was that I upgraded in between runs. I believe that the balancer was on during the test. |
| Comment by Thomas Rueckstiess [ 02/Jul/14 ] |
|
Hi Matt, It turns out that the indexOnly issue is a red herring. 2.4 always does a fetch before returning a document on a sharded system in order to check if the chunk owns the document ( So far, we were unable to reproduce the performance difference you're seeing. A few more questions to further diagnose this:
Thanks, |
| Comment by Ramon Fernandez Marina [ 20/Jun/14 ] |
|
mkolbert@copyright.com, thanks for your report, we can reproduce the behavior you describe with the indexes and we're investigating. |