[SERVER-55612] Improve handling of $geonear+$or when all $or clauses involve the same $near key Created: 30/Mar/21 Updated: 03/Feb/22 Resolved: 03/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Trivial - P5 |
| Reporter: | Al Han | Assignee: | Svilen Mihaylov (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | query-director-triage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
I'm using MongoDB 4.4 pipeline aggregate to retrieve documents sorted by distance from a collection of around 1.5M documents. While I'm only retrieving 12 documents at the time, I CANNOT limit the maxDistance used as I'm sorting and paging results. The following query is very slow (unless the second argument in the $or sub-query is removed). The culprit is clearly the $or sub-query. Running the $or sub-query independently yields very fast results so I suspect it has to do with the combination of $geonear and $or. I have created indices created for location, reference and _id of course. I also have composite indices for all three
I cannot understand why this query is slow when I have created the appropriate indexes. Using explain() shows that a composite index is being used but the query is still too slow. This was also report on https://stackoverflow.com/questions/66576357/mongodb-aggregate-geonear-function-very-slow-when-using-with-or-sub-query
|
| Comments |
| Comment by David Percy [ 03/Feb/22 ] | ||||||||||
|
As Eric pointed out, theoretically we could create two index scans and merge sort them by geo distance. But I think this would be hard to do, because $near and $nearSphere are treated as such a special case. (And semantically they really are a special case, because they look like a filter, but are actually filter+sort.) As a workaround, you could use $geoNear and $unionWith to get two index scans, and combine the results with $sort:
Today this will do a blocking sort, but if the predicates on '_id' and 'reference' are very selective then that may be fine. In the future we could optimize this to a merge sort: SERVER-48120. Also, if you have indexes on {_id: 1, location: '2dsphere'} and {reference: 1, location: '2dsphere'}, with the equality predicate before the geo range predicate, then each index scan should have good bounds. Closing for now, but we could reconsider if this becomes a common request. | ||||||||||
| Comment by Eric Sedor [ 19/Aug/21 ] | ||||||||||
|
To clarify my reasoning about | ||||||||||
| Comment by Al Han [ 28/Apr/21 ] | ||||||||||
|
Thanks Eric. I don't see how this is related to | ||||||||||
| Comment by Eric Sedor [ 27/Apr/21 ] | ||||||||||
|
Thanks for your patience aphtechme@gmail.com. It looks like this issue is related to | ||||||||||
| Comment by Al Han [ 31/Mar/21 ] | ||||||||||
|
Hi Eric, I've attached a full output of explain("executionStats"). Let me know if there's anything else need. Thank you for your assistance, Al | ||||||||||
| Comment by Eric Sedor [ 31/Mar/21 ] | ||||||||||
|
Hi aphtechme@gmail.com, we can take a look. Could you please go ahead and provide the full output of explain("executionStats")? | ||||||||||
| Comment by Al Han [ 31/Mar/21 ] | ||||||||||
|
Part of WinningPlan
| ||||||||||
| Comment by Al Han [ 30/Mar/21 ] | ||||||||||
|
Here are some output from explain(): "keysExamined":524854,"docsExamined":1037474,"cursorExhausted":true,"numYields":1048,"nreturned":9 Also seems like the correct index containing (location,_id,reference) is being used in winningPlan Reducing the distanceMultiplier to 1 has not performance improvement impact whatsoever. |