[SERVER-64518] Last point TS Opt M3: Rewrite only when index is available Created: 15/Mar/22  Updated: 06/Apr/22  Resolved: 05/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Minor - P4
Reporter: Steve Tarzia
Assignee: Alya Berciu
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Participants:
Linked BF Score: 135

 Description   

alya.berciu saw a 60% performance drop on some last-point queries when the rewrite is applied but the proper index is not available. We should investigate whether we can skip the rewrite when we know it will be harmful.



 Comments   
Comment by Charlie Swanson [ 05/Apr/22 ]

Thanks for the writeup, alya.berciu. I think this is another good demonstration of why we want an optimizer that understands the entire query. It is also another ticket to consider as a proof point for the new optimizer once we reach the phase where it can understand both $group queries and time-series collections. cc steve.la and svilen.mihaylov

Comment by Alya Berciu [ 05/Apr/22 ]

After discussing with steve.tarzia, we decided not to go forward with this. Bypassing the rewrite would improve some cases (where the $match selectivity is 90%, the rewrite is ~60% worse) at the expense of other cases (where the $match selectivity is 10%, the rewrite is 85% better).

We do not currently have a way to differentiate the two queries in code and pick an appropriate plan for each case, so we have to prioritize one of them. We will prioritize the case with the smaller $match selectivity (10%), where the rewrite helps.
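For readers unfamiliar with the query shape under discussion, here is a minimal sketch of a last-point pipeline with a leading $match, written as plain Python data. The field names `meta` and `time`, the cutoff value, and the helper `last_point_pipeline` are hypothetical and are not taken from the Genny workload or the patch; the sketch only illustrates the general $match + $sort + $group form, not the exact queries that were benchmarked.

```python
from datetime import datetime, timezone


def last_point_pipeline(cutoff: datetime) -> list:
    """Build a last-point aggregation pipeline (hypothetical field names).

    The $match stage filters on time; how many documents it keeps is the
    "selectivity" discussed in this ticket. The $sort + $group stages then
    pick the most recent document for each meta value.
    """
    return [
        # Assumption: documents carry a `time` field to filter on.
        {"$match": {"time": {"$gte": cutoff}}},
        # Newest document first within each meta value.
        {"$sort": {"meta": 1, "time": -1}},
        # Keep the first (newest) document per meta value.
        {"$group": {"_id": "$meta", "last": {"$first": "$$ROOT"}}},
    ]


pipeline = last_point_pipeline(datetime(2022, 4, 1, tzinfo=timezone.utc))
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

Whether the rewrite helps for a pipeline like this depends on how many documents survive the $match, which is the information the planner cannot currently use.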

Here is a link to the relevant patch where I tested the effect of using a smaller selectivity in the query: https://spruce.mongodb.com/version/6246c4b0850e614ad49c68bf/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

With the feature flag (FF) on, average latency is about 85% lower than with the feature flag off:

Query                 Average Latency (FF off)   Average Latency (FF on)   % Difference
{time: 1, meta: 1}    6646829472.5               1003973422.8              -85%
{time: 1, meta: -1}   6632846559                 1004721137.6              -85%
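The % Difference column can be reproduced from the two latency columns. The following is a small sketch, assuming the column is the relative change from the FF-off latency to the FF-on latency; the function name `percent_difference` is illustrative and not from the patch.

```python
def percent_difference(baseline: float, candidate: float) -> float:
    """Relative change from baseline to candidate, in percent."""
    return (candidate - baseline) / baseline * 100.0


# (FF off, FF on) average latencies copied from the table above.
latencies = {
    "{time: 1, meta: 1}": (6646829472.5, 1003973422.8),
    "{time: 1, meta: -1}": (6632846559.0, 1004721137.6),
}

for query, (ff_off, ff_on) in latencies.items():
    print(f"{query}: {percent_difference(ff_off, ff_on):.0f}%")
```

Both rows round to -85%, matching the reported figures.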

I also created a separate ticket to update the Genny workload to test both cases: PERF-2919
