[SERVER-32452] Replanning may not occur when a plan with an extremely high 'works' value is cached Created: 22/Dec/17  Updated: 30/Oct/23  Resolved: 14/Jun/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.6.0
Fix Version/s: 4.1.1

Type: Improvement Priority: Major - P3
Reporter: Andre de Frere Assignee: Ian Boros
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-34785 The Cache Plan cause poor performance Closed
is duplicated by SERVER-33511 Same Query Shape with different regex... Closed
Related
related to SERVER-32539 Suboptimal cached query plan may not ... Backlog
related to SERVER-32943 Query during background index build c... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2018-05-21, Query 2018-06-04, Query 2018-06-18

 Description   
Issue Status as of Aug 28, 2018

ISSUE DESCRIPTION AND IMPACT
Replanning may not be triggered when a slow cached plan with an extremely high works value is inferior to an alternative plan.

DIAGNOSIS AND AFFECTED VERSIONS
This can be identified by checking the mongod logs for slow queries that use an IXSCAN with an excessively high keysExamined count, and comparing that count against the output of the explain(true) command. The explain(true) output for the exact same query (with the same predicate values) will show a much smaller keysExamined value, or may use a different index entirely.
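
For example, a rough sketch of this comparison in the mongo shell (the collection name "demo" and the predicate values below are illustrative only, not taken from an actual report):

// Re-run the query reported as slow in the log, with verbose explain.
// In the mongo shell, explain(true) maps to "allPlansExecution" verbosity.
var exp = db.demo.find({ location: "NYC", price: { $gte: 0, $lte: 120000 } }).explain(true);

// keysExamined as measured by a fresh multi-planning run:
printjson(exp.executionStats.totalKeysExamined);
printjson(exp.queryPlanner.winningPlan);

// Compare these against the keysExamined value and planSummary reported in
// the slow-query log line; a much larger logged value, or a different winning
// index, points to a stale cached plan.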

This issue affects MongoDB versions 3.2, 3.4, 3.6 and 4.0.

REMEDIATIONS AND WORKAROUNDS
Users affected by suboptimal query plan selection may:

RESOLUTION DETAILS
The fix for this issue introduces the notion of "inactive" and "active" cache entries. Cache entries are first created in an "inactive" state and are not used by the planner; an inactive entry only records the number of works the query is expected to take.

When the same query shape is planned again, the new winning plan is evaluated against the existing inactive entry:

  • If the plan's works value is less than or equal to the inactive cache entry's works value, the new plan is put in an "active" entry, and the works value is updated.
  • If the new plan's works value is greater than the inactive entry's value, the new plan is not cached. Instead, the works value of the inactive entry is multiplied by two (by default). The multiplier can be adjusted via the internalQueryCacheWorksGrowthCoefficient server parameter.
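
A minimal sketch of how this looks from the mongo shell on 4.2+, assuming a collection named "demo"; the $planCacheStats aggregation stage exposes the isActive and works fields:

// Inspect plan cache entries; new entries start out with isActive: false.
db.demo.aggregate([
    { $planCacheStats: {} },
    { $project: { createdFromQuery: 1, isActive: 1, works: 1 } }
]).forEach(printjson);

// Optionally tune how quickly an inactive entry's works value grows when a
// new candidate plan does more work than the recorded value (default: 2).
db.adminCommand({ setParameter: 1, internalQueryCacheWorksGrowthCoefficient: 2.0 });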

FIX VERSIONS
This issue is fixed in MongoDB 4.1.1, and will be available in the MongoDB 4.2 production release.

Original description

This will help give more control over query plans that get selected suboptimally.

Could be a tunable parameter for the number of works.



 Comments   
Comment by David Storch [ 12/Oct/18 ]

Hi jose.morales,

Unfortunately, due to the complexity of this change, it cannot be backported to 4.0, 3.6, or earlier versions. Users who wish to take advantage of this fix should upgrade to stable version 4.2 when it becomes available. Thanks for your interest in MongoDB, and let me know if you have any further questions!

Best,
Dave

Comment by Jose MORALES ARAGON [ 12/Oct/18 ]

Is it on the roadmap to backport this fix to MongoDB 3.6.x?

Comment by Githook User [ 14/Jun/18 ]

Author:

{'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}

Message: SERVER-32452 add ability for plan cache entries to be inactive
Branch: master
https://github.com/mongodb/mongo/commit/2a66542157d5eac9ad43c749f3531bd57097cd26

Comment by David Storch [ 03/May/18 ]

ian.boros, let's use this ticket to implement the "tombstone" policy described by the design for this project. This new policy should be flag-gated behind internalQueryCacheDisableTombstones and off by default in 4.0. This ticket tracks correctness tests for the tombstone policy. Performance tests will be implemented as a separate work item.

CC charlie.swanson

Comment by Chris Harris [ 29/Jan/18 ]

david.storch, indeed this ticket is related to plans getting "stuck" in the cache. After additional consideration this seems closely related to SERVER-19835 ("SubplanStage should not cache plan for an individual $or branch if plans tie or there are no query results"). That ticket description notes:

2. the winning plan does not produce any results during the MultiPlanStage trial period.

The intention behind this behavior is that in these edge cases the system has low confidence that it has selected the best plan. In order to avoid re-using the wrong plan from the cache for a subsequent execution of the query shape, the system refrains from creating a potentially erroneous cache entry.

In version 3.1.2 under SERVER-15225, behavior was introduced to aggressively evict bad cache entries. This aggressive eviction obviated the need for not caching in case of ties or zero results, and therefore the system changed to always create a cache entry.

Based on further thought and testing, I can't convince myself that the claim that "aggressive eviction obviated the need for not caching in case of ... zero results" is actually true.

Consider this in conjunction with SERVER-31078 ("Query planning is very slow during multiplanning when result set is empty"). The summary there is:

The MultiPlanStage picks the best plan by "working all the plans, stopping when a plan hits EOF or returns some fixed number of results". But if the query returns no results, this stopping condition can take a very long time and we've essentially done a full index scan before returning.

My understanding of re-planning (referred to as "aggressive cache eviction" above, I believe) is that the threshold is set at 10x the number of works required when the plan was cached, per the internalQueryCacheEvictionRatio parameter. In the case where we do a full, or at least significant, scan of the index, the works threshold will be so high that it can never be reached.
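
For reference, a hedged way to check that ratio from the mongo shell (the default is 10; the arithmetic below is only illustrative):

// Read the eviction ratio used to compute the replanning threshold.
db.adminCommand({ getParameter: 1, internalQueryCacheEvictionRatio: 1 });

// Replanning is triggered only once a cached plan exceeds roughly:
//     threshold = works recorded at caching time * internalQueryCacheEvictionRatio
// so a plan cached after ~3,000 works is not evicted until ~30,000 works.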

This can happen, for example, if there is a predicate that is not one of the index keys and doesn't match any results. Consider the following schema/indexing:

> c.findOne()
{
	"_id" : ObjectId("5a6b4918cec428faaa27585a"),
	"location" : "NYC",
	"sqft" : 4731,
	"price" : 165858,
	"baths" : 6
}
> c.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "SERVER-32452.demo"
	},
	{
		"v" : 2,
		"key" : {
			"location" : 1,
			"sqft" : 1
		},
		"name" : "location_1_sqft_1",
		"ns" : "SERVER-32452.demo"
	},
	{
		"v" : 2,
		"key" : {
			"location" : 1,
			"price" : 1
		},
		"name" : "location_1_price_1",
		"ns" : "SERVER-32452.demo"
	}
]

And let's say that the query shape in question is:

{location:'NYC', sqft: {$gte:0, $lte:10000}, price: {$gte:0, $lte:120000}, baths: {$gte:0, $lte:10}}
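
A rough sketch of a data set that reproduces this shape (the document count, value ranges, and random distribution below are my assumptions rather than the exact data used here):

// Populate ~100k listings and build the two compound indexes shown above.
var coll = db.getSiblingDB("SERVER-32452").demo;
var batch = [];
for (var i = 0; i < 100000; i++) {
    batch.push({
        location: "NYC",
        sqft: Math.floor(Math.random() * 10000),
        price: Math.floor(Math.random() * 1000000),
        baths: Math.floor(Math.random() * 8)   // so baths >= 8 matches nothing
    });
    if (batch.length === 1000) { coll.insertMany(batch); batch = []; }
}
coll.createIndex({ location: 1, sqft: 1 });
coll.createIndex({ location: 1, price: 1 });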

Going back and forth between a user who is selective about square footage and a user who is selective about price allows the database to re-plan appropriately:

2018-01-29T10:48:04.549-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 0.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 120000.0 }, baths: { $gte: 0.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, price: 1 } cursorid:64815161403 keysExamined:101 docsExamined:101 fromMultiPlanner:1 numYields:2 nreturned:101 reslen:8969 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_command 5ms
2018-01-29T10:48:16.249-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 9000.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 1000000.0 }, baths: { $gte: 0.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, sqft: 1 } cursorid:65262064703 keysExamined:101 docsExamined:101 fromMultiPlanner:1 replanned:1 numYields:12 nreturned:101 reslen:8969 locks:{ Global: { acquireCount: { r: 26 } }, Database: { acquireCount: { r: 13 } }, Collection: { acquireCount: { r: 13 } } } protocol:op_command 39ms
2018-01-29T10:48:27.209-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 0.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 120000.0 }, baths: { $gte: 0.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, price: 1 } cursorid:65543908337 keysExamined:101 docsExamined:101 fromMultiPlanner:1 replanned:1 numYields:10 nreturned:101 reslen:8969 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 11 } }, Collection: { acquireCount: { r: 11 } } } protocol:op_command 7ms

Now let's say someone comes along and would like a large place with a lot of bathrooms. That filter on the non-indexed field is going to result in examining (and rejecting) plenty of documents. This will trigger re-planning and then end up caching the new work metrics.

2018-01-29T11:10:25.320-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 8000.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 1000000.0 }, baths: { $gte: 8.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, sqft: 1 } keysExamined:3022 docsExamined:3022 fromMultiPlanner:1 replanned:1 cursorExhausted:1 numYields:55 nreturned:0 reslen:90 locks:{ Global: { acquireCount: { r: 112 } }, Database: { acquireCount: { r: 56 } }, Collection: { acquireCount: { r: 56 } } } protocol:op_command 20ms

Now our re-planning threshold is set to ~30k works (10x the ~3,000 works just cached), so our previous query predicates (smaller range on price) are now doing lots of extra work:

2018-01-29T11:11:53.234-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 0.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 120000.0 }, baths: { $gte: 0.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, sqft: 1 } cursorid:67256109411 keysExamined:4460 docsExamined:4460 numYields:35 nreturned:101 reslen:8969 locks:{ Global: { acquireCount: { r: 72 } }, Database: { acquireCount: { r: 36 } }, Collection: { acquireCount: { r: 36 } } } protocol:op_command 12ms

Depending on the data, it is still possible to trigger the re-planning mechanism at this point, but the price predicate has to be pretty narrow, e.g. something along the lines of:

2018-01-29T11:20:39.902-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 0.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 101000.0 }, baths: { $gte: 0.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, price: 1 } cursorid:68018479453 keysExamined:101 docsExamined:101 fromMultiPlanner:1 replanned:1 numYields:239 nreturned:101 reslen:8969 locks:{ Global: { acquireCount: { r: 480 } }, Database: { acquireCount: { r: 240 } }, Collection: { acquireCount: { r: 240 } } } protocol:op_command 44ms

However, if a user were to come in and ONLY care about the non-indexed field (baths in this case), then we can do a large scan that would set the threshold much higher than could ever be achieved. Indeed, if this were a full index scan (such as could happen with SERVER-31078) then the threshold would be 10x the number of documents in the collection.

2018-01-29T11:21:46.417-0600 I COMMAND  [conn4] command SERVER-32452.demo appName: "MongoDB Shell" command: find { find: "demo", filter: { location: "NYC", sqft: { $gte: 0.0, $lte: 10000.0 }, price: { $gte: 0.0, $lte: 1000000.0 }, baths: { $gte: 8.0, $lte: 10.0 } } } planSummary: IXSCAN { location: 1, price: 1 } keysExamined:100001 docsExamined:100001 fromMultiPlanner:1 replanned:1 cursorExhausted:1 numYields:1033 nreturned:0 reslen:90 locks:{ Global: { acquireCount: { r: 2068 } }, Database: { acquireCount: { r: 1034 } }, Collection: { acquireCount: { r: 1034 } } } protocol:op_command 262ms
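
Until a fix is available, a blunt escape hatch for an entry stuck like this is to flush the collection's plan cache so the next execution multi-plans from scratch (a sketch against the example "demo" collection):

// Drop all cached plans for the collection.
db.demo.getPlanCache().clear();

// Equivalent underlying command:
db.runCommand({ planCacheClear: "demo" });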

I've also linked an interesting variation of a potential cause related to background index builds in SERVER-32943.

Hopefully this helps, but please let me know if I can provide further clarification.

Comment by David Storch [ 28/Dec/17 ]

andre.defrere, is this suggested change motivated by some observed bad behavior? I'm disinclined to mess with our plan selection logic unless we have empirical evidence that it improves plan selection, without causing plan selection to regress for certain workloads.

Do I infer correctly that this is a suggested fix for bad plans getting stuck in the plan cache?
