[SERVER-18426] $geoNear expands aggressively if the centroid is far from the dense data Created: 12/May/15  Updated: 04/Jan/24

Status: Backlog
Project: Core Server
Component/s: Geo
Affects Version/s: 3.0.2
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Andrey Hohutkin Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 0
Labels: qi-geo, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File around_test.7z    
Issue Links:
Related
related to SERVER-29228 geoNear scans too many records in ver... Backlog
is related to SERVER-26974 Poor 2dsphere / $near performance and... Closed
Assigned Teams:
Query Integration
Participants:

 Description   

I have a collection with 300,000 documents. The collection has a 2dsphere index on the 'geo' field.

Here is a query I run:
db.runCommand({
  geoNear: "around_test",
  near: [59.6048899000000030, 36.2774393999999990],
  num: 500,
  spherical: true,
  query: null
});

Stats of this query are weird:
stats: {
  "nscanned" : 103971,
  "objectsLoaded" : 80166,
  "avgDistance" : 0.1816846648955057,
  "maxDistance" : 0.1955302102158397,
  "time" : 1124
}

The query limits the result to 500 documents, but the stats show that MongoDB loads 80166 documents (objectsLoaded) from disk and only then truncates the result.
There should be no need to read the extra documents from disk when the result length is limited.



 Comments   
Comment by Siyuan Zhou [ 24/Jul/15 ]

As brandon.zhang explained above, this is different from SERVER-18056 as the centroid is far from the dense data. I'll update the title and reopen this issue.

Comment by Brandon Zhang [ 24/Jul/15 ]

This behavior is due to the way geoNear expands its search. geoNear works by searching for documents in distance intervals successively farther from the centroid. At each interval, it sorts the documents by distance and returns them to its parent stage. If an interval returns fewer than 300 documents, the next distance interval doubles its range.
The explain produced by this query will show the distance intervals and the number of documents returned in each one:
db.around_test.find({geo:{$nearSphere:[59.6048899000000030, 36.2774393999999990]}}).limit(500).explain("executionStats")
In this case, the centroid of the geoNear search seems to be far from a dense patch of points. Before the last interval, every interval that geoNear searched returned fewer than 300 documents. This means that the distance intervals kept doubling until one hit the dense patch, which returned over 80,000 documents. Unfortunately, there is currently no way for geoNear to account for this problem since it has no prior knowledge of the data distribution.
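
The doubling behavior described above can be illustrated with a small standalone simulation. This is only a sketch and not the server's implementation: the function name simulateNearSearch, the initial interval width, and the synthetic distances are assumptions made for illustration. Only the 300-document threshold and the doubling rule come from the explanation above.

```javascript
// Hypothetical simulation of the interval expansion described above.
// It is NOT MongoDB code; names and the initial width are assumptions.
// distances: distance of every document from the query centroid.
function simulateNearSearch(distances, limit, initialWidth, growThreshold = 300) {
  if (!(initialWidth > 0)) throw new Error("initialWidth must be positive");
  const sorted = [...distances].sort((a, b) => a - b);
  const farthest = sorted.length ? sorted[sorted.length - 1] : 0;
  const intervals = [];
  let inner = 0;            // inner bound of the current interval
  let width = initialWidth; // width of the current interval
  let examined = 0;         // documents loaded so far
  let idx = 0;              // first document not yet examined
  while (examined < limit && inner <= farthest) {
    const outer = inner + width;
    let count = 0;
    // Every document inside the interval is loaded before any is returned.
    while (idx < sorted.length && sorted[idx] < outer) { idx++; count++; }
    intervals.push({ inner, outer, count });
    examined += count;
    inner = outer;
    // A sparse interval makes the next interval twice as wide.
    if (count < growThreshold) width *= 2;
  }
  return { intervals, examined };
}

// Ten sparse documents near the centroid, then a dense patch far away.
const sparse = Array.from({ length: 10 }, (_, i) => i + 1);
const dense = Array.from({ length: 80000 }, (_, i) => 100 + i / 80000);
const farFromData = simulateNearSearch([...sparse, ...dense], 500, 1);
console.log(farFromData.intervals.length, farFromData.examined); // 7 80010

// The same number of documents spread evenly around the centroid.
const even = Array.from({ length: 80000 }, (_, i) => i / 800);
const insideData = simulateNearSearch(even, 500, 1);
console.log(insideData.intervals.length, insideData.examined); // 1 800
```

In the first case the interval width grows from 1 to 64 before any interval holds enough documents, so the final interval swallows the whole dense patch even though only 500 results were requested. In the second case the first interval already satisfies the limit.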

Comment by Sam Kleinman (Inactive) [ 20/May/15 ]

Thanks for this report.

Because geoNear returns sorted results, it starts from the geometry (i.e. point) specified in the query and selects all points that exist within a radius of the starting point. The operation sorts these results, and if the result set requires additional results, it fetches all documents within a larger additional covering (a donut-shaped surface). Because of the way that the geo indexes work, it is possible for the query to fetch and examine the same document more than once during a single query.
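
As a rough illustration of why ring-by-ring fetching still yields globally sorted output, here is a minimal sketch. It assumes planar distances and a fixed ring width, and the name nearestByRings is hypothetical; the real 2dsphere search works on index cells over a sphere and is not reproduced here.

```javascript
// Hypothetical sketch of a ring-by-ring nearest search (not MongoDB code).
// points: array of [x, y]; center: [x, y]; ringWidth: assumed fixed width.
function* nearestByRings(points, center, ringWidth) {
  if (!(ringWidth > 0)) throw new Error("ringWidth must be positive");
  const withDist = points.map((point) => ({
    point,
    dist: Math.hypot(point[0] - center[0], point[1] - center[1]),
  }));
  const maxDist = withDist.reduce((m, d) => Math.max(m, d.dist), 0);
  for (let inner = 0; inner <= maxDist; inner += ringWidth) {
    const outer = inner + ringWidth;
    // Fetch everything inside the current ring, then sort only that ring.
    const ring = withDist.filter((d) => d.dist >= inner && d.dist < outer);
    ring.sort((a, b) => a.dist - b.dist);
    // Rings are disjoint and visited nearest first, so the stream is sorted.
    yield* ring;
  }
}

const pts = [[0, 0], [3, 4], [1, 0], [0, 2], [6, 8]];
const dists = [...nearestByRings(pts, [0, 0], 2)].map((d) => d.dist);
console.log(dists); // [ 0, 1, 2, 5, 10 ]
```

Because the function is a generator, a consumer that needs only a few results can stop early, but each ring it does reach is always fetched in full before the first result from that ring is produced.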

We were able to reproduce your results, and while this behavior and performance are not desirable, they are expected given the current implementation. The work defined by SERVER-18056 should improve this behavior. As a result we are going to close this ticket; please tune in to SERVER-18506 for updates.

Regards,
sam
