Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-98494

Estimate MatchExpression $exists (and other predicates) via histograms

    • Type: Icon: Task Task
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization

      Currently $exists (and possibly other predicates) is estimated via heuristic CE when represented as a MatchExpression. Any leaf predicate different from comparison predicates, And, Or, Not is estimated via 

      CardinalityEstimator::estimate(const LeafMatchExpression* node)

      which is in turn implemented as heuristic CE. 
       
      However, given an index on the $exists field, an interval is generated, which in turn ican be estimated via histogram if one is available. Example:
       

      coll.find({missing_50_percent: {$exists: false}}).explain();

      The plan is:
       

          winningPlan: {
            isCached: false,
            stage: 'FETCH',
            costEstimate: 5.2621663000000005,
            cardinalityEstimate: 3000,
            estimatesMetadata: { ceSource: 'Histogram' },
            filter: { missing_50_percent: { '$not': { '$exists': true } } },
            inputStage: {
              stage: 'IXSCAN',
              costEstimate: 1.225155,
              cardinalityEstimate: 3000,
              numKeysEstimate: 3000,
              estimatesMetadata: { ceSource: 'Histogram' },
              keyPattern: { missing_50_percent: 1 },
              indexName: 'missing_50_percent_1',
              isMultiKey: false,
              multiKeyPaths: { missing_50_percent: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: { missing_50_percent: [ '[null, null]' ] }
            }
          },  

       
      While if we drop the index, we use heuristics:

      coll.dropIndex({missing_50_percent:1});
      coll.find({missing_50_percent: {$exists: false}}).explain();

      The plan is:
       

          winningPlan: {
            isCached: false,
            stage: 'COLLSCAN',
            costEstimate: 3.5472981,
            cardinalityEstimate: 60,
            numDocsEstimate: 6000,
            estimatesMetadata: { ceSource: 'Heuristics' },
            filter: { missing_50_percent: { '$not': { '$exists': true } } },
            direction: 'forward'
          },  

       
      Ideally this ticket should consider all predicates that can be expressed as an interval, and therefore estimated via a histogram.

            Assignee:
            Unassigned Unassigned
            Reporter:
            timour.katchaounov@mongodb.com Timour Katchaounov
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: