[SERVER-14525] Perf regression in 2.6.2 caused by not caching plans that tie during plan ranking
Created: 10/Jul/14  Updated: 25/Jun/15  Resolved: 14/Apr/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.3
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Alvin Richards (Inactive)
Assignee: David Storch
Resolution: Duplicate
Votes: 12
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-15225 CachedPlanStage should execute for tr... Closed
Related
related to SERVER-15152 When evaluating plans, some index can... Closed
is related to SERVER-15237 Performance issue: Query Plan Cache i... Closed
Operating System: ALL
Steps To Reproduce:

var tests = [];
load('./mongo-perf/util/utils.js');
 
tests.push( { name: "Queries.TwoInts",
  pre: function( collection ) {
         collection.drop();
         for ( var i = 0; i < 1000; i++ ) {
           collection.insert( { x: i, y: 2*i } );
         }
         collection.ensureIndex({x: 1});
         collection.ensureIndex({y: 1});
       },
  ops : [
    { op: "find",
      query: { x: { "#SEQ_INT": { seq_id: 0, start: 0, step: 1, mod: 1000 } },
               y: { "#SEQ_INT": { seq_id: 1, start: 0, step: 2, mod: 2000 } } }
    }
        ]
} );
 
runTests([1, 2, 4, 8, 16, 32], 1,'bisect', 'localhost', '27017');

 Description   

For scenarios in which plans repeatedly tie, SERVER-13675 introduces a performance regression. Ties can actually be quite common for some indexing schemes. For example, you might have indices {a: 1, b: 1} and {a: 1, c: 1} to support two different query patterns. When you query with a predicate over 'a' and without predicates over either 'b' or 'c', however, the two indices will always tie. Not caching one of the two tied plans means that we do extra work by going through the MultiPlanRunner path for every query of this shape.
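
To illustrate why uncached ties are costly, here is a minimal toy model in JavaScript. It is not MongoDB's implementation: `rankPlans`, `countRankings`, the scores, and the `cacheOnTie` flag are hypothetical names invented for this sketch. It only counts how often a query of one shape has to go through plan ranking when its two candidate plans always tie.

```javascript
// Toy model only: none of these names exist in the MongoDB code base.
// Each candidate plan gets a score; equal top scores count as a tie.
function rankPlans(scores) {
  const best = Math.max(...scores);
  const winners = scores.filter((s) => s === best);
  return { winnerIndex: scores.indexOf(best), tie: winners.length > 1 };
}

// Runs `queryCount` queries of one shape and counts how many of them
// had to go through plan ranking instead of reusing a cached plan.
function countRankings(queryCount, scores, cacheOnTie) {
  let cachedPlan = null;
  let rankings = 0;
  for (let i = 0; i < queryCount; i++) {
    if (cachedPlan !== null) continue; // cache hit: no ranking needed
    const result = rankPlans(scores);
    rankings++;
    // Assumed policy switch: either refuse to cache tied plans, or cache anyway.
    if (!result.tie || cacheOnTie) cachedPlan = result.winnerIndex;
  }
  return rankings;
}

// Two plans that always tie, e.g. indices {a: 1, b: 1} and {a: 1, c: 1}
// for a predicate over 'a' alone. The score value itself is arbitrary.
const tiedScores = [1.5, 1.5];
console.log(countRankings(1000, tiedScores, false)); // 1000: ranked on every query
console.log(countRankings(1000, tiedScores, true));  // 1: ranked once, then cached
```

In this model, refusing to cache on a tie turns a one-time ranking cost into a per-query cost, which is the shape of the extra work described above.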

Original Description

Problem:
Using mongo-perf, the Queries.TwoInts test regresses in throughput from 2.6.1 to 2.6.2.
Good: 2.6.1

@@@START@@@
Queries.TwoInts
        1       15183.6
        2       31095.600000000002
        4       49195.200000000004
        8       84476.8
        16      105745.2
        32      110057.8
@@@END@@@
{
        "[object Object]" : {
                "1" : {
                        "ops_per_sec" : 15183.6
                },
                "2" : {
                        "ops_per_sec" : 31095.600000000002
                },
                "4" : {
                        "ops_per_sec" : 49195.200000000004
                },
                "8" : {
                        "ops_per_sec" : 84476.8
                },
                "16" : {
                        "ops_per_sec" : 105745.2
                },
                "32" : {
                        "ops_per_sec" : 110057.8
                }
        }
}

Bad Githash: 213700b3af4d53ce7e808dce2c638d98fc4f91db

@@@START@@@
Queries.TwoInts
        1       9583.400000000001
        2       19006.199999999997
        4       33435
        8       53750.399999999994
        16      65717.4
        32      67228.8
@@@END@@@
{
        "[object Object]" : {
                "1" : {
                        "ops_per_sec" : 9583.400000000001
                },
                "2" : {
                        "ops_per_sec" : 19006.199999999997
                },
                "4" : {
                        "ops_per_sec" : 33435
                },
                "8" : {
                        "ops_per_sec" : 53750.399999999994
                },
                "16" : {
                        "ops_per_sec" : 65717.4
                },
                "32" : {
                        "ops_per_sec" : 67228.8
                }
        }
}
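
The size of the regression can be read off the two runs above. The following sketch computes the percentage drop in ops/sec per thread count; the numbers are copied from the output above with floating-point noise rounded off, and `throughputDrop` is a helper name made up for this illustration.

```javascript
// Throughput (ops/sec) by thread count, copied from the results above
// (values rounded to one decimal place; that rounding is an assumption of this sketch).
const good = { 1: 15183.6, 2: 31095.6, 4: 49195.2, 8: 84476.8, 16: 105745.2, 32: 110057.8 };
const bad = { 1: 9583.4, 2: 19006.2, 4: 33435, 8: 53750.4, 16: 65717.4, 32: 67228.8 };

// Percentage drop from the good build to the bad build, per thread count.
function throughputDrop(goodRun, badRun) {
  const drop = {};
  for (const threads of Object.keys(goodRun)) {
    drop[threads] = (100 * (goodRun[threads] - badRun[threads])) / goodRun[threads];
  }
  return drop;
}

const drop = throughputDrop(good, bad);
for (const threads of Object.keys(drop)) {
  console.log(`${threads} thread(s): ${drop[threads].toFixed(1)}% lower`);
}
```

For these numbers the drop falls between roughly 32% and 39% at every thread count.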



 Comments   
Comment by David Storch [ 14/Apr/15 ]

Hi sranck@listrak.com,

Apologies for the delay in our response. I believe your understanding of the issue is correct. Please let me know if you have any further questions.

Best,
Dave

Comment by David Storch [ 14/Apr/15 ]

As of 81a1f70b87b3f3754 (see SERVER-15225, fixed for development release 3.1.2), we cache a query plan even in the case of plan ranking ties. This eliminates the logic which was the root cause of this performance regression. Closing as a duplicate of SERVER-15225.

Comment by Steve Ranck [ 19/Mar/15 ]

I'd like to confirm that I understand this issue correctly. We make extensive use of indexes in the form of:

 
{ a: 1, b: 1 }
{ a: 1, c: 1 }
{ a: 1, d: 1 }
{ a: 1, e: 1 }
{ a: 1, f: 1 }

According to this issue, is it correct that the following query will never cache a plan given the indexes above?

 
db.Collection.find({ a: 1, g: 1 })

Comment by Asya Kamsky [ 17/Oct/14 ]

flavio@alicubi.net can you share how you are testing and measuring the impact? Do you have a synthetically generated test you can just attach to the ticket or are you using a copy of your actual data/operations to measure load? Also, what metrics are you collecting in the tests - can you post what exact numbers are like?

Comment by flavio alberti [ 17/Oct/14 ]

I retested on MongoDB 2.6.5, but SERVER-15152 didn't reduce the impact of this issue.

Comment by Asya Kamsky [ 19/Sep/14 ]

Fixing SERVER-15152 might reduce the impact/significance of this, because (a) plans should not tie as often and (b) the cost of re-racing will be lower.

alvin can you retest this on 2.6.5 candidate when that fix goes in there?

Comment by flavio alberti [ 13/Sep/14 ]

I hope this regression can be fixed ASAP in 2.6; otherwise we cannot upgrade our production cluster. We cannot apply any workaround (a hint operator at the end of the queries) without changing a lot of code and queries.

Comment by rgpublic [ 12/Aug/14 ]

(Is this the same as SERVER-14423 BTW?)

Perhaps you might want to reconsider the importance of this. As you are probably well aware, there have been numerous similar problems introduced with the new query optimizer supporting index intersection. From a mere user POV, it is currently a rather huge regression from the 2.4.x versions IMHO. We need to skip the 2.6.x versions altogether due to all these problems and hope the dust settles with the 2.8.x series. As soon as we try to update, the whole database grinds to a halt because many queries run way longer than under 2.4.x. The main problem here is the fact that it's a regression. We have an existing code base that works great and fast under 2.4.x. We want to update to make use of the new features in 2.6.x. We can't until all queries run in at least similar time as before.

MongoDB's great feature of supporting ad-hoc queries means that we have spread lots of different queries all over our code base, and it's just not practical to look for queries that worked great before and add hints everywhere just to tell MongoDB that it should use an a_b index if we're querying on, well, a and b. Just my $0,02. Appreciate your hard work.

Comment by Alvin Richards (Inactive) [ 10/Jul/14 ]

Query plans look the same on 2.6.1 and 2.6.3

> db.Queries.TwoInts.find( { x: 375, y: 750 } )
{ "_id" : ObjectId("53bef88d894bc32cc1356586"), "x" : 375, "y" : 750 }
> db.Queries.TwoInts.find( { x: 375, y: 750 } ).explain()
{
	"cursor" : "BtreeCursor x_1",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 2,
	"nscannedAllPlans" : 4,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"x" : [
			[
				375,
				375
			]
		]
	},
	"server" : "bismark:27017",
	"filterSet" : false
}
