[SERVER-13740] Improve plan ranking for index intersection Created: 25/Apr/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 11
Labels: bonsai
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-15125 Index intersection amount of work sho... Closed
is duplicated by SERVER-17821 Query planner prefers suboptimal inde... Closed
is duplicated by SERVER-28587 MongoDB aggregation query slowness Closed
Related
is related to SERVER-20619 Statistics-based query optimization Backlog
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

We know from performance testing that index intersection plans are faster than single-solution plans when

  1. The size of the intersection is small.
  2. The amount of data that would need to be fetched by the single-index solutions is large, so that
  3. this data would exceed the available memory, or is generally not in the working set.

However, our testing also shows that index intersection plans may not be chosen by the ranker in these cases. We should consider tweaking plan ranking such that the ranker is more likely to choose intersection plans.



 Comments   
Comment by Patrick Dalton [ 19/Nov/20 ]

I'd like to add that my company has recently run into this issue.  I'd like to request a change in the documentation.  https://docs.mongodb.com/manual/core/index-intersection/ 

"MongoDB can use the intersection of multiple indexes to fulfill queries. In general, each index intersection involves two indexes; however, MongoDB can employ multiple/nested index intersections to resolve a query."

is very misleading.  The query planner is not able to make use of multiple indexes in the majority of our use cases, which is pretty disappointing given the documentation.  The ability to hint at index intersection would be a good solution for us.

Comment by Vyacheslav Stroy [ 06/Apr/20 ]

In case of large collections with billions of documents indexes intersection may potentially provide a predictable performance, which is far more important than possible minor performance gains from the IXSCAN + filtered FETCH plan.

We've built a dozen of projects on top of MongoDB during the past decade, and this query planner issue is a huge pain for us. It's like a nightmare that keeps haunting us for years. A project is ready, deployed on production, everything works flawlessly and everybody is immensely happy with the performance. But two months later the database grows to a several millions of documents and some queries start slowing down. We build compound indexes on 4-6 fields that participate in query, set hints in code and forget about it for a few more months. Nevertheless, sooner or later the compound index grows to hundreds of gigabytes and it doesn't fit in RAM anymore. Simple intersection of 3 or 4 small indexes would do the trick, but there is no way to specify the hint or adjust the query. And we have to migrate everything to another db engine even despite the fact that 98% of all queries will be much slower than before, but usually it's those 2% left that really matter, because they may easily result in the completely degraded performance for the entire application. The situation repeats again and again. We had to migrate 3 of our large projects so far, and it seems like we'll have to do the same with two more applications in the nearest future. There was even a situation when we moved a legacy system from Oracle to MongoDB just to migrate it back again three years later when the compound indexes performance plunged due to their enormous size.

That's especially painful to me, as Mongo was my favorite db engine for years. However, having predictable performance for all queries allows us to plan future scalability, so naturally, it's the paramount consideration. MongoDB works flawlessly with relatively small datasets or simple queries, so it's definitely our primary choice for prototyping and small projects.

But I really hope that you prioritize this problem and include the fix in the upcoming major release, because right now we simply can't achieve reasonable query processing time in two of our ongoing projects. Judging from the numerous questions StackOverflow, many devs have the same issue. Please, consider prioritizing this or at least allow hinting the intersection manually (SERVER-22741) - this will do the trick for the most severe cases.

As a side note, index intersection plan doesn't even appear in the rejected plans in the following cases:

  • one of the filter conditions is not an equality operator {a: 1, b: \{$gt: 1}}
  • one of the indexes is a multikey index {intField: 1, arrayField: 1}
  • with more than 3 conditions {a: 1, b: 1, c: 1, d: 1}

(checked on 4.2.1)

Comment by William Richards [ 17/Jun/15 ]

This issue is especially bad when using the WiredTiger storage engine, as the scan over the single-index solution can require decompression of very large amounts of data (not just a quick scan over small parts of memory mapped files). The script attached to SERVER-17821 replicates this issue. This issue is preventing us from migrating to WiredTiger due to a ~1000x performance loss.

Generated at Thu Feb 08 03:32:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.