[SERVER-55215] Handle small measurement counts in buckets for ARHASH Created: 15/Mar/21  Updated: 29/Oct/23  Resolved: 09/Apr/21

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Eric Cox (Inactive) Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: unpack bucket perf.xlsx
Issue Links:
Problem/Incident
Backwards Compatibility: Fully Compatible
Sprint: Query Execution 2021-03-22, Query Execution 2021-04-05, Query Execution 2021-04-19
Participants:
Linked BF Score: 28

 Description   

When testing SERVER-54221 we realized that if gTimeseriesBucketMaxCount is significantly larger than the actual bucket counts, we will exhaust the kMaxAttempt limit for drawing a non-duplicate document and fail the query.
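
For context, a rough way to see why the attempt cap gets exhausted (a minimal standalone sketch; bucketMaxCount, actualCount, and kMaxAttempts below are illustrative stand-ins rather than the server's actual symbols, and the rejection model is an assumption based on the description above): if each sampling attempt picks a measurement slot uniformly out of gTimeseriesBucketMaxCount possibilities and rejects draws that do not land on a real, previously unseen measurement, the per-attempt acceptance probability is roughly actualCount / bucketMaxCount, so small buckets make exhausting a fixed attempt budget likely.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Illustrative numbers only: a bucket holding far fewer measurements
    // than the configured per-bucket maximum.
    const double bucketMaxCount = 1000.0;  // stand-in for gTimeseriesBucketMaxCount
    const double actualCount = 10.0;       // measurements actually in the bucket
    const int kMaxAttempts = 100;          // stand-in for the sampling attempt cap

    // If an attempt is accepted only when the random draw lands on a real
    // measurement, each attempt succeeds with probability
    // actualCount / bucketMaxCount.
    const double pAccept = actualCount / bucketMaxCount;

    // Probability that every one of kMaxAttempts attempts is rejected.
    const double pAllFail = std::pow(1.0 - pAccept, kMaxAttempts);

    std::printf("per-attempt acceptance: %.3f\n", pAccept);
    std::printf("chance all %d attempts fail: %.3f\n", kMaxAttempts, pAllFail);
    return 0;
}
```

With these illustrative numbers the per-attempt acceptance is about 1% and the chance of burning the entire attempt budget is roughly 37%, which matches the failure mode described above regardless of whether the exact rejection criterion is an out-of-range slot or a duplicate document.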

This behavior is not ideal, and we should implement a fallback mechanism for this case. Two ideas were discussed: figure out whether we can cheaply compute the maximum bucket count, or use a trial stage to see whether we are able to sample the collection and, if not, fall back to top-k sorting.
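
As a rough illustration of the second idea (a trial stage with a top-k fallback), the sketch below is a simplified standalone model, not the server's SBE plan-stage API; trySample and topK are hypothetical names. It runs a bounded sampling trial and switches to a sort-based plan when the trial cannot produce enough distinct results within its attempt budget.

```cpp
#include <algorithm>
#include <cstdio>
#include <optional>
#include <random>
#include <unordered_set>
#include <vector>

// Trial: try to draw `sampleSize` distinct elements from `data` within
// `maxAttempts` random draws. Returns nullopt if the budget is exhausted,
// signalling that the caller should fall back to another plan.
std::optional<std::vector<int>> trySample(const std::vector<int>& data,
                                          size_t sampleSize,
                                          int maxAttempts,
                                          std::mt19937& gen) {
    std::unordered_set<size_t> seen;
    std::uniform_int_distribution<size_t> dist(0, data.size() - 1);
    std::vector<int> out;
    for (int attempt = 0; attempt < maxAttempts && out.size() < sampleSize; ++attempt) {
        size_t idx = dist(gen);
        if (seen.insert(idx).second) {
            out.push_back(data[idx]);
        }
    }
    if (out.size() < sampleSize) {
        return std::nullopt;  // trial failed; caller falls back
    }
    return out;
}

// Fallback: a "top-k"-style pass that deterministically keeps the k smallest
// values (standing in for the top-k sorting plan mentioned above).
std::vector<int> topK(std::vector<int> data, size_t k) {
    const size_t n = std::min(k, data.size());
    std::partial_sort(data.begin(), data.begin() + n, data.end());
    data.resize(n);
    return data;
}

int main() {
    std::mt19937 gen(42);
    std::vector<int> data{5, 3, 9, 1, 7, 2, 8, 4, 6, 0};

    // Deliberately tight attempt budget so the trial can fail and exercise
    // the fallback path.
    auto sampled = trySample(data, /*sampleSize=*/5, /*maxAttempts=*/6, gen);
    std::vector<int> result = sampled ? *sampled : topK(data, 5);

    std::printf("used %s, produced %zu documents\n",
                sampled ? "sampling trial" : "top-k fallback", result.size());
    return 0;
}
```

The design point being modeled is that the trial is cheap and bounded, so a failed trial only costs the attempt budget before the query falls back to the deterministic plan.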



 Comments   
Comment by Githook User [ 09/Apr/21 ]

Author:

Eric Cox <eric.cox@mongodb.com> (ericox)

Message: SERVER-55215 Handle small measurement counts in buckets for ARHASH

Co-authored-by: David Storch <david.storch@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/26083aaf57fded55e2ca4f82a536b1b5b3a1e6f7
