[SERVER-84955] Investigate how index scan depends on number of selected documents Created: 03/Oct/22  Updated: 12/Jan/24  Resolved: 13/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Alexander Ignatyev Assignee: Ruoxin Xu
Resolution: Fixed Votes: 0
Labels: M6
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File IndexScan.html    
Issue Links:
Related
is related to SERVER-70500 Calibrate ABT nodes on smaller queries Closed
Sprint: QO 2022-10-17
Participants:

 Description   

It appears that Index Scan has a hidden initialization cost that is not revealed directly by our regression models but shows up indirectly as different values of the n_processed coefficient.

Design an experiment in which the queries select different numbers of documents (10k, 20k, 30k, ..., 150k), and try to keep the same number of queries for every point.
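
One possible way to lay out such an experiment is sketched below. It is only an illustration, not the calibration code used for this ticket. It assumes a hypothetical indexed integer field "a" holding unique values 0..collection_size-1, so that the width of a range predicate equals the number of selected documents; the helper name build_experiment and the collection size are made up for the example.

```python
import random


def build_experiment(points, queries_per_point, collection_size, seed=0):
    """Return (n_selected, filter) pairs with the same number of queries per point.

    Assumes a hypothetical indexed field "a" with unique integer values
    0..collection_size-1, so a half-open range of width n matches n documents.
    """
    rng = random.Random(seed)
    plan = []
    for n in points:
        if n > collection_size:
            raise ValueError("cannot select more documents than the collection holds")
        for _ in range(queries_per_point):
            # Random start so that queries at one point touch different key ranges.
            lo = rng.randint(0, collection_size - n)
            plan.append((n, {"a": {"$gte": lo, "$lt": lo + n}}))
    return plan


# 10k, 20k, ..., 150k selected documents, 20 queries for every point.
points = list(range(10_000, 150_001, 10_000))
plan = build_experiment(points, queries_per_point=20, collection_size=1_000_000)
```

Randomizing the start of each range keeps the queries within a point from hitting the same keys repeatedly, while the fixed width keeps n_processed constant for that point.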



 Comments   
Comment by Alexander Ignatyev [ 12/Oct/22 ]

Yes, we do. Per our offline discussion, we need at least to try calibrating on queries that have smaller values of n_processed. That is closer to our usual OLTP workflow.

Comment by Ruoxin Xu [ 12/Oct/22 ]

The experiments show that the more documents are returned (larger “n_processed”), the smaller the coefficient (average cost to process one document). This is as expected, given a potential hidden initialization cost. For example, as shown in the experiments, when n_processed is above 1e6, the cost is 0.0055, while when n_processed is smaller (below 5000), the cost is 0.0433. As discussed, we may want to use more selective queries in calibration?   Cc: alexander.ignatyev@mongodb.com 
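
The effect described above can be reproduced with a toy model. The sketch below assumes the true cost is linear, cost = startup + per_doc * n, and compares a regression forced through the origin with one that includes an intercept. The constants STARTUP and PER_DOC are synthetic values chosen for illustration only; they are not the measured numbers from the attached experiments.

```python
def fit_through_origin(ns, costs):
    """Least-squares slope for cost = coef * n (no intercept term)."""
    return sum(n * c for n, c in zip(ns, costs)) / sum(n * n for n in ns)


def fit_with_intercept(ns, costs):
    """Ordinary least squares for cost = intercept + slope * n."""
    k = len(ns)
    mean_n = sum(ns) / k
    mean_c = sum(costs) / k
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (c - mean_c) for n, c in zip(ns, costs))
    slope = sxy / sxx
    return mean_c - slope * mean_n, slope


# Synthetic, hypothetical constants (not taken from the experiments).
STARTUP = 100.0   # fixed initialization cost per index scan
PER_DOC = 0.005   # true marginal cost per processed document


def synthetic_cost(n):
    return STARTUP + PER_DOC * n


small = [1_000, 2_000, 3_000, 4_000, 5_000]
large = [1_000_000, 2_000_000, 3_000_000]

# Without an intercept, the fixed cost is folded into the per-document coefficient,
# so the coefficient looks much larger when n_processed is small.
coef_small = fit_through_origin(small, [synthetic_cost(n) for n in small])
coef_large = fit_through_origin(large, [synthetic_cost(n) for n in large])

# With an intercept, the startup cost and the marginal cost are separated.
intercept, slope = fit_with_intercept(small, [synthetic_cost(n) for n in small])
```

Under this assumption the through-origin coefficient is roughly PER_DOC plus the startup cost spread over the selected documents, which is why it shrinks as n_processed grows.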

Generated at Thu Feb 08 06:56:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.