[SERVER-18777] CachedPlanStage replanning mechanism does not apply to rooted $or queries Created: 01/Jun/15  Updated: 07/Apr/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 4
Labels: bonsai, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-19835 SubplanStage should not cache plan fo... Closed
is related to SERVER-13732 Predicates in top-level implicit AND ... Closed
is related to SERVER-15225 CachedPlanStage should execute for tr... Closed
Assigned Teams:
Query Optimization
Sprint: Query 2017-10-23, Query 2017-11-13, Query 2017-12-04, Query 2017-12-18, Query 2018-01-01, Query 2018-01-15
Participants:
Case:

 Description   

Rooted $or queries run through the SubplanStage path. The SubplanStage may use a cached plan for a branch of the $or without making use of the CachedPlanStage trial period logic. This means that rooted $or queries could use badly performing plans without plan cache entry eviction and replanning kicking in.



 Comments   
Comment by Justin Seyster [ 18/Jan/18 ]

I created a branch with the most recent POC.

Comment by Ian Whalen (Inactive) [ 12/Jan/18 ]

Dave to talk to Justin about pausing work on this for now and picking up the POC work done so far later in this release.

Comment by David Storch [ 29/Sep/17 ]

Assigning to justin.seyster tentatively to explore possible fixes and the engineering complexity associated with any potential fix.

Comment by J Rassi [ 07/Aug/15 ]

To correct the zero-result/tying plans regression for the subplanner introduced in SERVER-15225, we will implement SERVER-19835 as an interim fix. Bumping this ticket out of "3.1 Required" accordingly.

The outstanding problem remains that subplanned queries are not able to make use of replanning. Possible solutions:

  1. implement a per-branch trial period in the subplanner, and always throw out the results from the trial period (note: causes the initial work for each branch to unnecessarily be performed twice).
  2. change the OR stage to "round-robin" on its children, revert the subplanner back to whole-query caching instead of per-branch caching, and replan from scratch if the trial period on the whole "franken-plan" does not perform well.
Comment by David Storch [ 15/Jul/15 ]

Hi evan@dnanexus.com,

Sorry to hear that mongod is using a slow plan. Since the query

db.data.find({id: $in: ["object-1234", "object-2345"]}).sort({modified: 1, id: 1})

does not have an $or, the problem you are experiencing is unrelated to this ticket. (The MongoDB query engine currently treats $or and $in differently; see SERVER-12024.)

After reviewing the information you provided, I would suggest looking at SERVER-15225. Under this ticket, we added functionality which can prevent bad plan selections like this one. This functionality is first available in version 3.0.4 and is enabled by the internalQueryCacheReplanningEnabled flag. Please refer to SERVER-15225 for more details.

Also, keep in mind that .hint() and index filters are always available as workarounds if the server is choosing a slow plan.

Best,
Dave

Comment by Evan Worley [ 15/Jul/15 ]

Hi Dan, I'm happy to help explain what we are seeing. I've left out many details that I don't think are relevant, but let me know if you'd like to know more.

Collection: "data"
Number of documents: 15,000,000

Every data object has an "id" and a "modified" field.

Indexes

  • {id: 1}
  • {modified: 1, id: 1}

Now take a query like

db.data.find({id: $in: ["object-1234", "object-2345"]}).sort({modified: 1, id: 1})

Most of the time is takes 100ms, using the id index. For some reason, every once in a while, it uses the modified_1_id_1 index and takes 30 minutes. We have no idea why this happens.

I realize my query is not a rooted $or, but it seems related to poor query plan caching..

Comment by Daniel Pasette (Inactive) [ 15/Jul/15 ]

Hi Evan, Can you lay out your example use case which is hitting this particular issue in more detail?

Comment by Evan Worley [ 14/Jul/15 ]

Is there any workaround for this issue? Ever since upgrading to Mongo 2.6.9 we randomly see rooted $or queries which use an index which performs terribly. If the right index is used, the query takes 100ms. If the wrong index is used, it takes 30 minutes...

Generated at Thu Feb 08 03:48:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.