Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-99093

histogramCE: Estimate for inequality is zero if NDV > numberBuckets

    • Type: Icon: Bug Bug
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • ALL
    • Hide
      db.foo.drop();
      
      const string_docs = [];
      for (let i = 0; i < 100; i++) {
          string_docs.push({a: 'a'.repeat(i)});
      }
      
      for (let i = 0; i < 100; i++) {
          string_docs.push({a: 'm'.repeat(i)});
      }
      
      db.foo.insertMany(string_docs);
      db.foo.createIndex({a: 1});
      db.foo.runCommand({analyze: "foo", key: "a", numberBuckets:50});
      db.adminCommand({setParameter: 1, planRankerMode: "histogramCE"});
      db.foo.find({a: {$gte: "mmmmmmmm"}}).explain().queryPlanner.winningPlan.cardinalityEstimate;
      db.foo.find({a: {$gte: "mmmmmmmm"}}).count();
      db.system.statistics.foo.find();
      
      Show
      db.foo.drop(); const string_docs = []; for (let i = 0; i < 100; i++) { string_docs.push({a: 'a' .repeat(i)}); } for (let i = 0; i < 100; i++) { string_docs.push({a: 'm' .repeat(i)}); } db.foo.insertMany(string_docs); db.foo.createIndex({a: 1}); db.foo.runCommand({analyze: "foo" , key: "a" , numberBuckets:50}); db.adminCommand({setParameter: 1, planRankerMode: "histogramCE" }); db.foo.find({a: {$gte: "mmmmmmmm" }}).explain().queryPlanner.winningPlan.cardinalityEstimate; db.foo.find({a: {$gte: "mmmmmmmm" }}).count(); db.system.statistics.foo.find();
    • QO 2025-02-03, QO 2025-02-17

      If the number of distinct values in the collection becomes larger than the numberBuckets, histogram estimates for $gt become wildly inaccurate – they are either zero or some very low value.

      Enterprise test> db.foo.find({a: {$gte: "mmmmmmmm"}}).explain().queryPlanner.winningPlan.cardinalityEstimate;
      0
      Enterprise test> db.foo.find({a: {$gte: "mmmmmmmm"}}).count();
      92
      

      The estimate is incorrect/zero in both directions – $gt and $lt the pivot point:

      Enterprise test> db.foo.find({a: {$gte: "mmmmmmmm"}}).explain().queryPlanner.winningPlan.cardinalityEstimate;
      0
      Enterprise test> db.foo.find({a: {$lt: "mmmmmmmm"}}).explain().queryPlanner.winningPlan.cardinalityEstimate;
      0
      

      The histogram is quite skewed, for which I will open a separate issue.

            Assignee:
            timour.katchaounov@mongodb.com Timour Katchaounov
            Reporter:
            philip.stoev@mongodb.com Philip Stoev
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: