Remove residual filter condition from IndexScanNode samplingCE input cardinality optimization

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      In IndexScanNode input cardinality estimation for samplingCE, the optimization that avoids calculating index keys is skipped for any index scan node that contains a residual filter.
      Logically, we could apply the optimization to these nodes as well: skip calculating index keys and use a MatchExpression to estimate the interval, with one minor difference in the calculation (the residual filter is excluded when computing the input CE).
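The two input-CE strategies described above can be sketched as a simplified Python model. This is not the server's C++ code: `estimate_keys_scanned`, `estimate_with_predicate`, the sample documents, and the residual filter on `b` are all hypothetical, and a plain Python predicate stands in for a MatchExpression. Input CE evaluates only the interval; output CE also applies the residual filter.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def estimate_keys_scanned(sample: list[Doc], field: str,
                          in_bounds: Callable[[Any], bool],
                          collection_size: int) -> float:
    """Hypothetical key-based input CE: generate the index keys for each
    sampled document and count the keys that fall inside the scan bounds."""
    keys_in_bounds = 0
    for doc in sample:
        value = doc.get(field)
        # An array value produces one (deduplicated) index key per element.
        keys = set(value) if isinstance(value, list) else {value}
        keys_in_bounds += sum(1 for k in keys if in_bounds(k))
    return keys_in_bounds * collection_size / len(sample)


def estimate_with_predicate(sample: list[Doc],
                            predicate: Callable[[Doc], bool],
                            collection_size: int) -> float:
    """Hypothetical document-based estimate: evaluate a predicate (standing in
    for a MatchExpression) on each sampled document and scale the match rate."""
    matches = sum(1 for doc in sample if predicate(doc))
    return matches * collection_size / len(sample)


# Usage: a 4-document sample from a hypothetical 100-document collection.
sample = [{"a": 1, "b": True}, {"a": 5, "b": False},
          {"a": 7, "b": True}, {"a": 20, "b": True}]


def in_bounds(v: Any) -> bool:
    # Index interval [0, 10] on field "a".
    return 0 <= v <= 10


def interval_only(doc: Doc) -> bool:
    return in_bounds(doc["a"])


def interval_and_residual(doc: Doc) -> bool:
    # Hypothetical residual filter: b == true.
    return in_bounds(doc["a"]) and doc["b"] is True


keys_ce = estimate_keys_scanned(sample, "a", in_bounds, 100)
input_ce = estimate_with_predicate(sample, interval_only, 100)
output_ce = estimate_with_predicate(sample, interval_and_residual, 100)
```

In this model, the key-based and predicate-based input CE agree (75.0) for a scalar field, while adding the residual filter lowers the estimate to the output CE (50.0).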
      We kept the condition because a set of tests fails without it.
      An example of such a test (along with some debug information):

      [js_test:plan_stability2] {">>>pipeline": [{"$match":{"$and":[{"field8_int_idx":{"$ne":353},"field45_bool":{"$type":"bool"}},{"$or":[{"field31_list_idx":{"$lte":220},"field25_str_idx":{"$eq":"Gl"}},{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"},"field24_mixed_idx":{"$all":[]}},{"field8_int_idx":{"$lt":831},"field28_datetime_idx":{"$eq":"2024-01-10T00:00:00.000Z"},"field23_dict_idx":{"$gte":{"c":1,"b":3}},"field6_mixed_idx":{"$in":[6,99562]}},{"field17_int_idx":{"$lte":239},"field44_int":{"$lte":7}}]},{"$or":[{"$or":[{"field16_str_idx":{"$ne":"Z"},"field5_dict_idx":{"$eq":{"b":1,"e":1}}},{"field45_bool":{"$lte":false},"field16_str_idx":{"$gte":"hS"}}]},{"field4_list_idx":{"$ne":21},"field35_int_idx":{"$eq":4539}}]}],"field21_Decimal128_idx":{"$type":"decimal"},"field26_int_idx":{"$lt":9043},"field19_datetime_idx":{"$ne":"2024-01-05T00:00:00.000Z"},"$nor":[{"field24_mixed_idx":["k","r","l","p"],"field31_list_idx":{"$ne":50}},{"field4_list_idx":["t","i","g","y","v"],"field18_bool_idx":{"$exists":true},"field47_Timestamp":{"$eq":{"$timestamp":{"t":1755595335,"i":0}}}}]}},{"$skip":52},{"$project":{"field38_Timestamp":1,"_id":0}}],
      
      Specifically, the following nodes produce different estimates than estimateKeysScanned:
      
      [j0] node->filter: { field21_Decimal128_idx: { $type: [ 19 ] } }
      [j0] node->bounds: field #0['field21_Decimal128_idx']: [inf, nan]
      [j0] est.outCE: { Cardinality: 0.0, Source: "Sampling" }
      [j0] estimateKeysScanned: { Cardinality: 100000.0, Source: "Sampling" }
      
      [j0] prefix: { field21_Decimal128_idx: [ "[inf, nan]" ] }
      [j0] isEqPrefix: 0
      [j0] ridsEstFunct(*prefix.eqPrefixPtr, nullptr) : { Cardinality: 0.0, Source: "Sampling" }
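The bounds in the debug output, [inf, nan], list the larger endpoint first: in BSON comparison order NaN sorts below every other number, including -inf. One possible explanation of the 0 vs. 100000 mismatch (a hypothesis, not confirmed in this ticket) is that the interval-to-predicate conversion assumes ascending endpoints, so a descending interval matches nothing, while the key scan honours the index direction. The sketch below models only that hypothesis; `bson_number_key` and both interval helpers are hypothetical, and plain floats stand in for Decimal128 values.

```python
import math


def bson_number_key(v: float) -> tuple[int, float]:
    # In BSON comparison order NaN sorts before every other number,
    # so map it to a bucket below all regular numbers (including -inf).
    return (0, 0.0) if math.isnan(v) else (1, v)


def in_interval_assuming_ascending(v: float, start: float, end: float) -> bool:
    # Hypothetical naive conversion: treats the interval as start <= v <= end,
    # ignoring the scan direction.
    return bson_number_key(start) <= bson_number_key(v) <= bson_number_key(end)


def in_interval_either_direction(v: float, start: float, end: float) -> bool:
    # Normalizes the endpoints first, so a descending interval such as
    # [inf, nan] covers the same keys as [nan, inf].
    lo, hi = sorted((start, end), key=bson_number_key)
    return bson_number_key(lo) <= bson_number_key(v) <= bson_number_key(hi)


values = [1.5, -3.0, 1e9, math.inf, -math.inf, math.nan]
naive = sum(in_interval_assuming_ascending(v, math.inf, math.nan) for v in values)
normalized = sum(in_interval_either_direction(v, math.inf, math.nan) for v in values)
```

Under this model, the naive conversion counts 0 of the 6 numeric values, while the direction-aware check counts all 6, mirroring the shape (not the actual code path) of the 0.0 vs. 100000.0 estimates above.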
      

      This ticket should re-evaluate this condition and address the problem.

            Assignee:
            Unassigned
            Reporter:
            Matt Olma
            Votes:
            0
            Watchers:
            1
