Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-96253

A complex partial index filter expression causes quadratic execution time on ineligible complex queries

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      This is a bit convoluted, but please bear with me:

      1. if there is a partial index filter expression in the form of an $in list or equivalent $or 

      2. and a complex query that is not compatible with the index

      then, the execution of the query is delayed by O(N*M), where N is the number of conditions in the partial index filter expression and N is the number of conditions in the query.

      With a 10K $in list in the partial index filter expression and 10K conditions in the query, the query takes 40-60 seconds to run on an empty collection. Perf reports two distinct stack traces, but both involve isSubsetOf:

       

      +   99.44%     0.00%  conn23           mongod             [.] mongo::QueryPlannerIXSelect::stripInvalidAssignments(mongo::MatchExpression*, std::vector<mongo::IndexEntry, std::allocator<mongo::Index
      +   99.44%     2.83%  conn23           mongod             [.] mongo::expression::isSubsetOf(mongo::MatchExpression const*, mongo::MatchExpression const*)
      +   50.00%     4.38%  conn23           mongod             [.] mongo::EqualityMatchExpression::EqualityMatchExpression(boost::optional<mongo::StringData>, mongo::BSONElement const&, mongo::clonable_p
      +   45.02%     3.25%  conn23           mongod             [.] mongo::ComparisonMatchExpression::ComparisonMatchExpression<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost::optiona
      +   41.23%     7.09%  conn23           mongod             [.] mongo::ComparisonMatchExpressionBase::ComparisonMatchExpressionBase<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost:
       

      or

      +   99.62%     0.00%  conn173          mongod             [.] mongo::plan_cache_detail::encodeIndexability(mongo::MatchExpression const*, mongo::PlanCacheIndexabilityState const&, mongo::StringBuil▒
      +   48.32%     4.84%  conn173          mongod             [.] mongo::EqualityMatchExpression::EqualityMatchExpression(boost::optional<mongo::StringData>, mongo::BSONElement const&, mongo::clonable_▒
      +   43.02%     3.52%  conn173          mongod             [.] mongo::ComparisonMatchExpression::ComparisonMatchExpression<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost::option▒
      +   38.87%     6.70%  conn173          mongod             [.] mongo::ComparisonMatchExpressionBase::ComparisonMatchExpressionBase<mongo::BSONElement const&>(mongo::MatchExpression::MatchType, boost▒
      +   30.71%     7.19%  conn173          mongod             [.] mongo::PathMatchExpression::PathMatchExpression(mongo::MatchExpression::MatchType, boost::optional<mongo::StringData>, mongo::ElementPa▒
      +   23.76%     8.00%  conn173          mongod             [.] mongo::(anonymous namespace)::_isSubsetOf(mongo::ComparisonMatchExpression const*, mongo::ComparisonMatchExpression const*)            ▒
      +   16.64%     1.13%  conn173          mongod             [.] mongo::FieldRef::FieldRef(mongo::StringData)                                                                                           ▒
      +   15.50%    11.21%  conn173          mongod             [.] mongo::FieldRef::parse(mongo::StringData)                                                                                              ▒
      +   12.90%     6.12%  conn173          mongod             [.] mongo::(anonymous namespace)::_isSubsetOf(mongo::MatchExpression const*, mongo::ComparisonMatchExpression const*)                 
       

       

      I think a few shortcuts can be considered to avoid the quadratic time complexity:

      • if the index is over a field that does not participate in the $match, then it should not be considered at all. Right now even if the index is on a different column, the query takes ~20 seconds.
      • if the index has a top-level $or, such as the implicit one that results from a top-level $in, it should not be considered when answering queries with a top-level $and.
      • If the index is over $in , it can not possibly be used to answer queries with inequality conditions.

       

      To reproduce:

      import os
      from time import time
      from pymongo import MongoClientclient = MongoClient(os.environ["MONGODB_URI"])
      for entry_count in [10000]:
        db = client.test
        db.drop_collection("test")
        coll = db.test
        coll.insert_one({"f0": [0]})
        cond_or = {"$or": [{"f0": {"$eq": i}} for i in range(entry_count)]}
        cond_and = {"$and": [{"f0": {"$gt": i}} for i in range(entry_count)]}
        coll.create_index("f0",
                  partialFilterExpression = cond_or
        )
        start = time();
        pipeline = [{"$match": cond_and}]
        print(pipeline);
        res = list(coll.aggregate(pipeline))
        duration = time() - start;
        print(f"entries: {entry_count}; duration: {duration}")
       

            Assignee:
            Unassigned Unassigned
            Reporter:
            philip.stoev@mongodb.com Philip Stoev
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated: