[SERVER-32452] Replanning may not occur when a plan with an extremely high 'works' value is cached Created: 22/Dec/17 Updated: 30/Oct/23 Resolved: 14/Jun/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.6.0 |
| Fix Version/s: | 4.1.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andre de Frere | Assignee: | Ian Boros |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||
| Sprint: | Query 2018-05-21, Query 2018-06-04, Query 2018-06-18 | ||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||
| Description |
|
Issue Status as of Aug 28, 2018 ISSUE DESCRIPTION AND IMPACT DIAGNOSIS AND AFFECTED VERSIONS This issue affects MongoDB versions 3.2, 3.4, 3.6 and 4.0. REMEDIATIONS AND WORKAROUNDS
RESOLUTION DETAILS When a plan is run, it is evaluated against the existing inactive entry:
FIX VERSIONS Original descriptionThis will help give more control over query plans that get selected suboptimally. Could be a tunable parameter for the number of works. |
| Comments |
| Comment by David Storch [ 12/Oct/18 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Hi jose.morales, Unfortunately, due to the complexity of this change, it cannot be backported to 4.0, 3.6, or earlier versions. Users who wish to take advantage of this fix should upgrade to stable version 4.2 when it becomes available. Thanks for your interest in MongoDB, and let me know if you have any further questions! Best, | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jose MORALES ARAGON [ 12/Oct/18 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Is in the roadmap to backport this fix to mongo 3.6.x ? | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 14/Jun/18 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}Message: | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 03/May/18 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
ian.boros, let's use this ticket to implement the "tombstone" policy described by the design for this project. This new policy should be flag-gated behind internalQueryCacheDisableTombstones and off by default in 4.0. This ticket tracks correctness tests for the tombstone policy. Performance tests will be implemented as a separate work item. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chris Harris [ 29/Jan/18 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
david.storch, indeed this ticket is related to plans getting "stuck" in the cache. After additional consideration this seems closely related to
Based on further thought and testing, I can't convince myself that the statement that the "aggressive eviction obviated the need for not caching in case of ... zero results" is true. Consider this in conjunction with SERVER-31078 ("Query planning is very slow during multiplanning when result set is empty"). The summary there is:
My understanding of re-planning (referred to as "aggressive cache eviction" above I believe) is that the threshold will be set at 10x the number of works required when the plan was cached per the internalQueryCacheEvictionRatio parameter. In the case where we do a full, or at least significant, scan of the index, the works threshold will be sufficiently high that it can never be reached. This can happen, for example, if there is a predicate that is not one of the index keys and doesn't match any results. Consider the following schema/indexing:
And let's say that the query shape in question is:
Going back and forth between a user who is selective about square footage an a user that is selective about price allows the database to re-plan appropriately:
Now let's say someone comes along and would like a large place with a lot of bathrooms. That filter on the non-indexed field is going to result in examining (and rejecting) plenty of documents. This will trigger re-planning and then end up caching the new work metrics.
Now our re-planning threshold is set to ~30k documents, so our previous query predicates (smaller range on price) are now doing lots of extra work:
Depending on the data, it is still possible to trigger the re-planning mechanism at this point, but the price predicate has to be pretty narrow. Eg something along the lines of:
However, if a user were to come in and ONLY care about the non-indexed field (baths in this case), then we can do a large scan that would set the threshold much higher than could ever be achieved. Indeed, if this were a full index scan (such as could happen with SERVER-31078) then the threshold would be 10x the number of documents in the collection.
I've also linked an interesting variation of a potential cause related to background index builds in Hopefully this helps, but please let me know if I can provide further clarification. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 28/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
andre.defrere, is this suggested change motivated by some observed bad behavior? I'm disinclined to mess with our plan selection logic unless we have empirical evidence that it improves plan selection, without causing plan selection to regress for certain workloads. Do I infer correctly that this is a suggested fix for bad plans getting stuck in the plan cache? |