[SERVER-21064] It's possible to fail on sort memory overflow even if there is an alternative plan that requires no sort. Created: 22/Oct/15  Updated: 23/Oct/15  Resolved: 23/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dmitry Ryabtsev Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File template    
Issue Links:
Duplicate
duplicates SERVER-15225 CachedPlanStage should execute for tr... Closed
Related
is related to SERVER-21065 Optimizer should consider predicate w... Closed
Sprint: QuInt B (11/02/15)
Participants:

 Description   

With a collection that has two indexes like:
1) a_1_b_1_c_1
2) c_1_b_1_a_1

If you run a query like: {a: { "$in": [1]}, b:1, c: { "$in": [1,2,3]}}).sort({c:1}), a query shape will be cached with the associated plans that do no require sort and it is likely that the plan using the a_1_b_1_c_1 will get the top score.

If you now run query like: {a: { "$in": [1,2,3]}, b:1, c: { "$in": [1,2,3]}}).sort({c:1}), the cached plan with top score will be used. As the query requires SORT stage, the plan will be re-adjusted, however if at the sort stage we hit the 32Mb limit, the query will fail despite the fact that there is an alternative plan cached that uses the c_1_b_1_a_1 index, which if used, will require no sort and re-adjustment.



 Comments   
Comment by David Storch [ 23/Oct/15 ]

After reproducing the issue, I can confirm that this is a duplicate of SERVER-15225. Please read on for a more detailed description of the issue.

Prior to SERVER-15225, the CachedPlanStage had a notion of a "backup plan". Specifically, whenever a plan with a blocking in-memory SORT stage was added to the plan cache, we would look for a non-blocking plan to act as "backup" (see the code here). If the plan is taken from the cache on a subsequent execution of the query shape, and the cached plan fails by hitting the 32 MB limit for in-memory sorts, the CachedPlanStage would fall back on the backup plan.

This ticket describes a scenario in which the cached plan backup plan strategy did not work. Suppose you have index {a: 1, b: 1, c: 1}. Consider plans which answer the following two predicates by scanning this index:

{a: { "$in": [1]}, b:1, c: { "$in": [1,2,3]}}
{a: { "$in": [1,2,3]}, b:1, c: { "$in": [1,2,3]}}

The former scan will be sorted by {c: 1} since it will return documents where c is 1, followed by docs where c is 2, followed by docs where c is 3. On the other hand, the latter scan will not be sorted by {c: 1}. However, as reported in SERVER-21065, these two predicates are considered to have the same shape. (The number of $in elements is not considered significant when computing query shape.) This leads to the following situation:

  • The client issues a query with predicate {a: { "$in": [1]}, b:1, c: { "$in": [1,2,3]}} and sort {c: 1}.
  • The non-blocking plan which obtains the sort order by scanning index {a: 1, b: 1, c: 1} is selected as the winner and cached. Since the plan has no SORT stage, no backup plan is incorporated into the plan cache entry.
  • The client issues a query with predicate {a: { "$in": [1,2,3]}, b:1, c: { "$in": [1,2,3]}} and sort {c: 1}.
  • This query is considered the same shape as the prior one, so the query will used the cached plan. The plan building process correctly identifies that the index scan is not providing the sort on {c: 1}, and adds a SORT stage.
  • The in-memory sort hits the 32 MB limit. However, the cache entry has no backup plan, so rather than attempting to fall back on a non-blocking plan, an error is returned to the application.

This is fixed by SERVER-15225, which eliminated the notion of a cached "backup plan" and replaced it with the notion of "replanning". When this scenario occurs on a version where replanning is enabled by default (3.0.7 and higher), the cached plan hitting the 32 MB limit will cause us to restart plan selection from the beginning. We generate and rank several candidate plans, and correctly select the non-blocking plan as the winner.

Comment by David Storch [ 22/Oct/15 ]

dmitry.ryabtsev, do have a repro script you could share? I tried to write a script based on your description but wasn't able to reproduce. Your repro would help me confirm that this issue was fixed by SERVER-15225.

Comment by Dmitry Ryabtsev [ 22/Oct/15 ]

On 3.0.7 I'm able to reproduce this only after "db.runCommand({setParameter: 1, internalQueryCacheReplanningEnabled: false})". My understanding is that we re-adjust the cached plan having the top score by adding sort stage. Then we score it and compare the score against the second plan we have which has the optimal index. I'm concerned that this is a workaround but not the fix, as the winning plan can be the plan with sort.

Generated at Thu Feb 08 03:56:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.