[SERVER-77949] Investigate generalisation of DISTINCT_SCAN for clustered collection _id subfields Created: 09/Jun/23  Updated: 22/Jun/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Jordi Olivares Provencio
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

To get a list of all pre-image collections present in config.system.preimages, we perform something similar to a DISTINCT_SCAN on _id.nsUUID.

The problem is that the query currently falls back to a full collection scan instead of using DISTINCT_SCAN, because no indexes are available for the collection and the field we are scanning is a subfield of _id.

We can work around this in our use case because we know the format of _id and that nsUUID is the first field of the identifier. As a result, we can safely enumerate the distinct values using a simple RecordCursor::seekNear. Ideally, we would prefer to use DISTINCT_SCAN.
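
The seek-based workaround described above can be illustrated with a small stand-alone sketch. This is not the server implementation and does not model RecordCursor::seekNear itself: it assumes the clustered keys can be represented as sorted (ns_uuid, ts) tuples, where ts is a hypothetical stand-in for the remaining _id fields, and it uses a binary search in place of a storage-engine seek. The function name distinct_prefixes is likewise made up for illustration.

```python
import bisect
import math


def distinct_prefixes(sorted_keys):
    """Return the distinct first components of sorted (ns_uuid, ts) keys.

    Instead of visiting every key, jump directly to the first key whose
    prefix is greater than the current one, which is the idea behind a
    distinct scan over the leading field of a sorted key.
    """
    result = []
    pos = 0
    while pos < len(sorted_keys):
        ns_uuid = sorted_keys[pos][0]
        result.append(ns_uuid)
        # "Seek" past all keys sharing this prefix: (ns_uuid, +inf) sorts
        # after every real (ns_uuid, ts) key and before the next prefix.
        pos = bisect.bisect_right(sorted_keys, (ns_uuid, math.inf), lo=pos)
    return result


# Hypothetical keys standing in for config.system.preimages record ids.
keys = sorted([
    ("uuid-a", 1), ("uuid-a", 2),
    ("uuid-b", 5),
    ("uuid-c", 1), ("uuid-c", 9),
])
print(distinct_prefixes(keys))
```

Each loop iteration performs one seek per distinct nsUUID, so the cost scales with the number of distinct prefixes rather than the number of documents. A generalised DISTINCT_SCAN over a clustered _id prefix would presumably aim for the same property.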



 Comments   
Comment by Kyle Suarez [ 14/Jun/23 ]

This is a request to improve the plan generated by the optimizer, so I am sending it to the Query Optimization team.
