[SERVER-56093] A distinct query on a nested field becomes very slow when adding a multikey index covering the query Created: 14/Apr/21 Updated: 17/May/21 Resolved: 17/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Cyril | Assignee: | Edwin Zhou |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
MongoDB v4.2.8 |
||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | I am not sure what would be the minimum scenario to reproduce, but it seems like
Could reproduce the problem |
| Participants: |
| Description |
|
We ran into an issue we can't explain, when adding a covering index on a distinct query caused huge performance issues, and we observed the CPU of our instances going to 100%. Amongst other metrics, we weren't able to find anything that seemed to be correlated to the problem (swap, page faults, etc seemed as usual, but maybe we weren't looking at the right things)
Here is a sample event document. Note the nested properties.id field that we are performing a distinct() on
of a collection of about 20 million documents We perform the following query
Previously, we had no covering index on this type of query, and the selected plan was
Using an index defined for other purposes (1)
But the request was still fast to the extent or retrieving documents in <1sec After creating the following index (2)
the winning plan becomes
And the request becomes very slow as shown in our graphs (the query can take up to several minutes when the server is loaded, and a dozen seconds when it is the only query) Furthermore, we tried creating another index that matched exactly the query (3)
The result was that this index was still not selected as a winning plan, and mongoDB kept using the previous index with the extra "time: -1" in the multikey definition (index (2)). After removing the index (2)
our new index (3) is selected with a winning plan involving a DISTINCT_SCAN and works fast as expected So several questions
|
| Comments |
| Comment by Edwin Zhou [ 17/May/21 ] | ||||
|
Hi CyrilDD, We haven’t heard back from you for some time, so I’m going to close this ticket. Please let us know if this is still an issue for you, or if you have any remaining questions and I can reopen this ticket! Best, | ||||
| Comment by Edwin Zhou [ 07/May/21 ] | ||||
|
Hi CyrilDD, We still need additional information to diagnose the problem. If this is still an issue for you, please let us know if my previous comment clarifies your reported behavior! Thanks, | ||||
| Comment by Edwin Zhou [ 21/Apr/21 ] | ||||
|
Hi CyrilDD, Thanks for your detailed report. I'd first like to clarify that a multikey index is not in use here because none of the fields in your example document is an array. You may be referring to indexing on an embedded field. It appears that you're attempting to use a compound index (2) that is incompatible with the query field in your distinct command. Note that "properties.id" is the field for which distinct values are returned, but is not a part of the query. The query in question:
can use compound indexes if the index is prefixed with name and user_id respectively. Both
are compatible since they begin with {"name": 1, "user_id": 1}. This index:
is incompatible because the prefix doesn't match the query field. You can read more about compound query prefixes in our documentation. Does this clarify your reported behavior? Best, |