-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
This is a bit convoluted, but please bear with me:
1. if there is a partial index filter expression in the form of an $in list or equivalent $or
2. and a complex query that is not compatible with the index
then, the execution of the query is delayed by O(N*M), where N is the number of conditions in the partial index filter expression and N is the number of conditions in the query.
With a 10K $in list in the partial index filter expression and 10K conditions in the query, the query takes 40-60 seconds to run on an empty collection. Perf reports two distinct stack traces, but both involve isSubsetOf:
+ 99.44% 0.00% conn23 mongod [.] mongo::QueryPlannerIXSelect::stripInvalidAssignments(mongo::MatchExpression*, std::vector<mongo::IndexEntry, std::allocator<mongo::Index + 99.44% 2.83% conn23 mongod [.] mongo::expression::isSubsetOf(mongo::MatchExpression const*, mongo::MatchExpression const*) + 50.00% 4.38% conn23 mongod [.] mongo::EqualityMatchExpression::EqualityMatchExpression(boost::optional<mongo::StringData>, mongo::BSONElement const&, mongo::clonable_p + 45.02% 3.25% conn23 mongod [.] mongo::ComparisonMatchExpression::ComparisonMatchExpression<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost::optiona + 41.23% 7.09% conn23 mongod [.] mongo::ComparisonMatchExpressionBase::ComparisonMatchExpressionBase<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost:
or
+ 99.62% 0.00% conn173 mongod [.] mongo::plan_cache_detail::encodeIndexability(mongo::MatchExpression const*, mongo::PlanCacheIndexabilityState const&, mongo::StringBuil▒ + 48.32% 4.84% conn173 mongod [.] mongo::EqualityMatchExpression::EqualityMatchExpression(boost::optional<mongo::StringData>, mongo::BSONElement const&, mongo::clonable_▒ + 43.02% 3.52% conn173 mongod [.] mongo::ComparisonMatchExpression::ComparisonMatchExpression<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost::option▒ + 38.87% 6.70% conn173 mongod [.] mongo::ComparisonMatchExpressionBase::ComparisonMatchExpressionBase<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost▒ + 30.71% 7.19% conn173 mongod [.] mongo::PathMatchExpression::PathMatchExpression(mongo::MatchExpression::MatchType, boost::optional<mongo::StringData>, mongo::ElementPa▒ + 23.76% 8.00% conn173 mongod [.] mongo::(anonymous namespace)::_isSubsetOf(mongo::ComparisonMatchExpression const*, mongo::ComparisonMatchExpression const*) ▒ + 16.64% 1.13% conn173 mongod [.] mongo::FieldRef::FieldRef(mongo::StringData) ▒ + 15.50% 11.21% conn173 mongod [.] mongo::FieldRef::parse(mongo::StringData) ▒ + 12.90% 6.12% conn173 mongod [.] mongo::(anonymous namespace)::_isSubsetOf(mongo::MatchExpression const*, mongo::ComparisonMatchExpression const*)
I think a few shortcuts can be considered to avoid the quadratic time complexity:
- if the index is over a field that does not participate in the $match, then it should not be considered at all. Right now even if the index is on a different column, the query takes ~20 seconds.
- if the index has a top-level $or, such as the implicit one that results from a top-level $in, it should not be considered when answering queries with a top-level $and.
- If the index is over $in , it can not possibly be used to answer queries with inequality conditions.
To reproduce:
import os from time import time from pymongo import MongoClientclient = MongoClient(os.environ["MONGODB_URI"]) for entry_count in [10000]: db = client.test db.drop_collection("test") coll = db.test coll.insert_one({"f0": [0]}) cond_or = {"$or": [{"f0": {"$eq": i}} for i in range(entry_count)]} cond_and = {"$and": [{"f0": {"$gt": i}} for i in range(entry_count)]} coll.create_index("f0", partialFilterExpression = cond_or ) start = time(); pipeline = [{"$match": cond_and}] print(pipeline); res = list(coll.aggregate(pipeline)) duration = time() - start; print(f"entries: {entry_count}; duration: {duration}")