[SERVER-68672] Query using shard key and document ID does not use available index Created: 09/Aug/22  Updated: 30/Aug/22  Resolved: 30/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.21
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Shachar Hochberg Assignee: Eric Sedor
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2022-08-09-13-42-22-395.png     PNG File image-2022-08-09-16-23-11-039.png     PNG File image-2022-08-09-16-23-38-511.png     PNG File image-2022-08-09-16-24-30-226.png     PNG File image-2022-08-09-16-32-04-424.png     PNG File image-2022-08-09-16-32-24-886.png     PNG File image-2022-08-09-16-32-59-723.png     PNG File image-2022-08-09-16-34-22-817.png     PNG File image-2022-08-09-16-36-06-516.png    
Issue Links:
Duplicate
duplicates SERVER-15225 CachedPlanStage should execute for tr... Closed
Operating System: ALL
Participants:

 Description   

I have a sharded collection with around ~ 200M documents, and an index on this collection: {datasetId: 1, _id: 1}, where datasetId is the shard key on the collection.

When I query on the collection with the following query:

 

find({datasetId: 'YYY', _id: ${gt: ObjectId('XXX')}}.sort({_id: 1})

The query does not use the aforementioned index, it uses the {_id: 1} index, and takes a very long time to run, while greatly increasing the mongo servers cpu usage.

 

Here are the query execution stats for a datasetId where ~ 40K documents satisfy the query, and an _id which is the oldest available that has specifed datasetId - which took 28.6 seconds:

If I hint the index on the same query it uses it and runs very quickly scanning only the needed documents:

Using any other operator than $gt /$gte works ($lt / $lte on the newest available ObjectId, $ne - all use the index with no need for a hint), the greater than operators are the only ones that do not use the index (example of a $lt query):

I cannot hint at the index each in query of this nature, Why do only the $gt and $gte operators not use the index? seems they should act like any other operator and use the index if it is available?



 Comments   
Comment by Eric Sedor [ 11/Aug/22 ]

Hi shachar.h@dataloop.ai, this looks very similar to SERVER-17260 which is considered a duplicate of SERVER-15225.

Can you revisit your use-case in the context of the configuration option and the workarounds discussed on SERVER-15225, and let us know how it goes?

Generated at Thu Feb 08 06:11:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.