[SERVER-57989] Unfortunate index selection leads to too many index/document scans Created: 22/Jun/21 Updated: 22/Jun/21 Resolved: 22/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Hegener | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | indexing, performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | Not easy, I suppose... |
||||||||
| Participants: | |||||||||
| Description |
|
We have a query:
Let's ignore the mind-boggling logic on the top part of the filter - the important thing to note is that there is a restriction on the "DeliveryId" field. Why is that of interest? We run this query against a collection that contains ~29m documents which are semantically split into individual "deliveries" using this "DeliveryId" field. We're talking roughly 20 deliveries or so, each of which consists of either ~6m documents or a substantially smaller number of documents, e.g. 150k. There is a bunch of indexes on the collection, too, one of which looks like { *DeliveryId: 1*, RecordId: 1, DataSource: 1, Revision: 1 }and, as per my understanding, should be helpful to avoid full scans of all documents whenever a query filters by "DeliveryId". What we are seeing in the logs sometimes is the below, however:
So, every now and again, MongoDB appears to choose to use two indexes:
The things that I would claim are dubious here are
I also wonder why the second index gets considered in the first place as the { _RecordId: 1, DataSource: 1_ }combination is already contained in the first index... Anyway, it would appear that once MongoDB decides to use a different index that does not contain the most important field, we end up somehow scanning the whole collection...?
Some additional business background: It's probably worth noting that the filter on the top
is a filter of very limited use as it won't remove any documents from the result set at all in a lot of cases. The above query gets generated based on user input and unless the user specifies some meaningful values for either "RecordId" or "DataSource", it will match on any document in the entire collection. The intention here is to allow a user to remove individual documents from the result based on their "RecordId" and/or "DataSource". Also, the next part
is an unusual filter as its purpose is to allow for a user to specify some values for "RecordId" and/or "DataSource" to add some records to a base set ("DataSource":{ "$not":"/.Base_DataSource./i" }) that shall remain included.
|
| Comments |
| Comment by Edwin Zhou [ 22/Jun/21 ] |
|
Thanks for letting us know! I will close this as a duplicate and the investigation will proceed over at Best, |
| Comment by Daniel Hegener [ 22/Jun/21 ] |
|
I'm sorry for creating this duplicate of |