[SERVER-73882] Investigate unexpected CE test result Created: 10/Feb/23  Updated: 29/Jun/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Milena Ivanova Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File case1    
Assigned Teams:
Query Optimization
Participants:

 Description   

Case 1: The following query over the ce_data_1000 collection from the CE accuracy tests produces a very imprecise estimate:

Id: 6066: [ { "$match" : { "mixed_arr_str_70_30" : { "$gt" : "LeG7", "$lt" : "LgG7" } } } ], qtype: medium range, data type: array
cardinality: 126, Histogram estimation: 394.17, errors: { "absError" : 268.17, "relError" : 2.13, "selError" : 26.82 }
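
For reference, the three error values above can be reproduced from the actual cardinality and the estimate. This is a minimal sketch, not the test harness code: the function name `ce_errors` is hypothetical, the metric definitions are inferred from the reported numbers, and the collection size of 1000 is an assumption based on the collection name ce_data_1000.

```python
def ce_errors(actual, estimate, collection_size):
    """Recompute the error metrics reported by the CE accuracy test.

    Assumed definitions (inferred from the numbers in this ticket):
      absError = |estimate - actual|
      relError = absError / actual
      selError = absError as a percentage of the collection size
    """
    abs_error = abs(estimate - actual)
    rel_error = abs_error / actual
    sel_error = 100.0 * abs_error / collection_size
    return {
        "absError": round(abs_error, 2),
        "relError": round(rel_error, 2),
        "selError": round(sel_error, 2),
    }


# Query 6066: actual cardinality 126, histogram estimate 394.17,
# collection size assumed to be 1000 documents.
print(ce_errors(126, 394.17, 1000))
```

Under these assumptions the output agrees with the reported absError, relError, and selError, so the estimate is off by roughly 27% of the collection.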

The data has only 33 values and is completely represented in the histogram buckets.

If we apply the formula

Card(ArrayMin(a < valHigh)) - Card(ArrayMax(a < valLow))

we get 291 - 165 = 126, which is a much more precise estimate. Investigate why we get the value of 394.17.
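
The reasoning behind the formula can be checked on toy data. The sketch below is an illustration only, with several assumptions: the data and function names are made up, all array elements are non-empty strings, and the match semantics are those of a non-$elemMatch range predicate on an array field (some element is greater than the low bound and some element, possibly a different one, is less than the high bound). The lower boundary is handled with `<=` so that the identity is exact on the toy data; the ticket writes it as `a < valLow`.

```python
def brute_force_count(docs, low, high):
    """Count arrays with some element > low and some element < high."""
    return sum(
        1
        for arr in docs
        if any(x > low for x in arr) and any(x < high for x in arr)
    )


def formula_count(docs, low, high):
    """Card(ArrayMin < high) - Card(ArrayMax <= low).

    Every array whose maximum is <= low also has its minimum < high
    (given low < high), so subtracting those arrays from the
    'minimum < high' set leaves exactly the matching arrays.
    """
    card_min_below_high = sum(1 for arr in docs if min(arr) < high)
    card_max_at_most_low = sum(1 for arr in docs if max(arr) <= low)
    return card_min_below_high - card_max_at_most_low


# Hypothetical toy data: each inner list is one document's array field.
docs = [
    ["a", "b"],  # all elements <= low: no match
    ["a", "d"],  # match
    ["d", "e"],  # match
    ["g", "h"],  # all elements >= high: no match
    ["b", "g"],  # match, although no single element lies inside the range
    ["c"],       # equal to low: no match
    ["e", "z"],  # match
]

print(brute_force_count(docs, "c", "f"))  # 4
print(formula_count(docs, "c", "f"))      # 6 - 2 = 4
```

If the two min/max cardinalities are estimated accurately from the histogram, as the 291 and 165 above suggest, the difference should stay close to the true count, which is why the 394.17 result is surprising.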

