[SERVER-9484] 2dsphere Index may not return all documents for $near and geoNear Created: 26/Apr/13  Updated: 11/Jul/16  Resolved: 02/May/13

Status: Closed
Project: Core Server
Component/s: Geo, Index Maintenance
Affects Version/s: 2.4.3
Fix Version/s: 2.4.4, 2.5.0

Type: Bug Priority: Major - P3
Reporter: Jim Dagg Assignee: hari.khalsa@10gen.com
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7 x64

Attachments: QueryDiff.js, diffs.zip, geoIntersects.kml, near.kml, out.json
Operating System: ALL
Steps To Reproduce:

Import the attached data file into MongoDB with mongoimport, using database "sparcTest" and collection "TirCSV". Ensure a 2dsphere index exists on the "geometry" field.
Run .count() on an unfiltered find. Note that it returns 3063 documents.
Run the $near query from the description and note that it misses some documents. Record the number of documents returned.
Run the $geoWithin query from the description and note that it returns all documents.
Reindex the TirCSV collection.
Run the $near query again and record the number of documents returned; it may differ from the first run.
Run the $geoWithin query again and note that it still returns all documents.

 Description   

When performing a $near or geoNear query on GeoJSON data indexed with a 2dsphere index, the query appears to miss data. This may be an issue with the 2dsphere index itself, because reindexing changes the number of documents returned by the $near query but not by a bounding-box $geoWithin query. In this case, all of the data lies within the polygon bounded by longitude -90 to -80 and latitude 35 to 45. The only secondary index on the collection is {"geometry" : "2dsphere"}.

> db.TirCSV.count({ "geometry" : { "$near" : { "$geometry" : { "type" : "Point", "coordinates" : [ -85.389 , 40.46 ] } } }, "$or" : [ { "data_type" : "tir" } ] })
3017
> db.TirCSV.find({geometry : {$geoWithin : {$geometry : {type : "Polygon", coordinates: [[[-90,45],[-90,35],[-80,35],[-80,45],[-90,45]]]}}}}).count()
3063
> db.TirCSV.count({ "data_type" : "tir" })
3063
> db.TirCSV.reIndex()
{
    "nIndexesWas" : 2,
    "msg" : "indexes dropped for collection",
    "nIndexes" : 2,
    "indexes" : [
        {
            "key" : { "_id" : 1 },
            "ns" : "sparcTest.TirCSV",
            "name" : "_id_"
        },
        {
            "key" : { "geometry" : "2dsphere" },
            "ns" : "sparcTest.TirCSV",
            "name" : "geometry_2dsphere"
        }
    ],
    "ok" : 1
}
> db.TirCSV.count({ "geometry" : { "$near" : { "$geometry" : { "type" : "Point", "coordinates" : [ -85.389 , 40.46 ] } } }, "$or" : [ { "data_type" : "tir" } ] })
3016
> db.TirCSV.find({geometry : {$geoWithin : {$geometry : {type : "Polygon", coordinates: [[[-90,45],[-90,35],[-80,35],[-80,45],[-90,45]]]}}}}).count()
3063



 Comments   
Comment by auto [ 20/May/13 ]

Author: Hari Khalsa (hkhalsa@10gen.com), 2013-04-29T19:22:03Z

Message: SERVER-9484 don't ignore geometries w/>1 covers if one of them isn't in the search annulus

Conflicts:

src/mongo/db/geo/s2nearcursor.cpp
Branch: v2.4
https://github.com/mongodb/mongo/commit/d1d2e9c89fe5f10ba8b56e46bd38b7634bb7afa1

Comment by auto [ 02/May/13 ]

Author: Hari Khalsa (hkhalsa@10gen.com), 2013-04-29T19:22:03Z

Message: SERVER-9484 don't ignore geometries w/>1 covers if one of them isn't in the search annulus
Branch: master
https://github.com/mongodb/mongo/commit/fc76b559266c4e2aee0707e6d640c08d20ff9e55

Comment by hari.khalsa@10gen.com [ 29/Apr/13 ]

Thank you for reporting it in such a helpful way! There will be a fix in the master branch shortly and it will be backported to the next release of 2.4.

Comment by Jim Dagg [ 29/Apr/13 ]

Thanks, Hari. I appreciate the help, and I'm glad we managed to figure out what was wrong. Looking forward to the fix!

Comment by hari.khalsa@10gen.com [ 29/Apr/13 ]

Hello! You found a bug. Congratulations and my apologies.

Let me tell you a bit about how the bug happened and how I will fix it.

When you add something to a 2dsphere index, MongoDB takes the geometry information and creates "geokeys" from it. A point has one geokey, and a polygon (or linestring) may have several since it's larger. The geokey, aside from being more compact than GeoJSON, also provides rough information about the location of the geometry.
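To make the "one geokey for a point, several for a polygon" idea concrete, here is a minimal sketch. It is an assumption-heavy stand-in: the real 2dsphere index covers geometries with S2 cells, while this sketch uses a hypothetical uniform 1-degree longitude/latitude grid and covers only the bounding box of a shape's vertices. The names cellOf, geokeys and CELL_DEG are illustrative, not MongoDB APIs, and only Point, LineString and Polygon are handled.

```javascript
// Hypothetical stand-in for geokey generation. MongoDB's 2dsphere index
// uses S2 cells; this sketch uses a uniform lng/lat grid instead.
const CELL_DEG = 1; // assumed cell size in degrees (illustrative only)

// Return the id of the grid cell containing a [lng, lat] position.
function cellOf([lng, lat]) {
  const i = Math.floor((lng + 180) / CELL_DEG);
  const j = Math.floor((lat + 90) / CELL_DEG);
  return `${i}:${j}`;
}

// Produce the "geokeys" for a Point, LineString or Polygon.
// A point gets exactly one key; a larger shape gets one key per grid cell
// touched by the bounding box of its vertices, so it may get several.
function geokeys(geometry) {
  if (geometry.type === "Point") return [cellOf(geometry.coordinates)];
  // LineString: array of positions; Polygon: array of rings of positions.
  const positions =
    geometry.type === "Polygon" ? geometry.coordinates.flat() : geometry.coordinates;
  const lngs = positions.map((p) => p[0]);
  const lats = positions.map((p) => p[1]);
  const [i0, j0] = cellOf([Math.min(...lngs), Math.min(...lats)]).split(":").map(Number);
  const [i1, j1] = cellOf([Math.max(...lngs), Math.max(...lats)]).split(":").map(Number);
  const keys = [];
  for (let i = i0; i <= i1; i++) {
    for (let j = j0; j <= j1; j++) keys.push(`${i}:${j}`);
  }
  return keys;
}

const pointGeom = { type: "Point", coordinates: [-85.389, 40.46] };
const polyGeom = {
  type: "Polygon",
  coordinates: [[[-85.8, 40.2], [-84.5, 40.2], [-84.5, 41.3], [-85.8, 41.3], [-85.8, 40.2]]],
};
console.log(geokeys(pointGeom)); // [ '94:130' ]
console.log(geokeys(polyGeom)); // [ '94:130', '94:131', '95:130', '95:131' ]
```

The polygon here straddles grid lines, so it ends up under four keys while the point gets one; that multi-key property is what matters for the bug described below.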

Near searches on a 2dsphere index work by looking at concentric rings around the start point. You find everything within a certain radius R0 first, then everything between R0 and R1, then everything between R1 and R2, and so on, where R0 < R1 < R2 < ...
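The ring-by-ring search can be sketched for plain points as follows. This is a simplified illustration under stated assumptions: the rings have a fixed, hypothetical width (stepKm), distances use the haversine formula on a sphere of mean radius 6371 km, and the helper names (distanceKm, ringSchedule, nearPoints) are made up for this sketch. MongoDB's actual ring sizes and distance code are not shown in this issue.

```javascript
// Simplified ring-by-ring near search over points (illustrative only;
// the fixed ring width is an assumption, not MongoDB's actual schedule).
const EARTH_RADIUS_KM = 6371; // mean Earth radius, assumed for this sketch

// Great-circle (haversine) distance in km between two [lng, lat] positions.
function distanceKm([lng1, lat1], [lng2, lat2]) {
  const rad = Math.PI / 180;
  const dLat = (lat2 - lat1) * rad;
  const dLng = (lng2 - lng1) * rad;
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Rings [0, R0), [R0, R1), ... up to maxKm, each stepKm wide.
function ringSchedule(maxKm, stepKm) {
  const rings = [];
  for (let inner = 0; inner < maxKm; inner += stepKm) {
    rings.push([inner, Math.min(inner + stepKm, maxKm)]);
  }
  return rings;
}

// Return point ids nearest-first: scan one ring at a time and sort only
// the hits that fall inside the current ring.
function nearPoints(points, center, maxKm, stepKm) {
  const results = [];
  for (const [inner, outer] of ringSchedule(maxKm, stepKm)) {
    const hits = points
      .map((p) => ({ id: p.id, dist: distanceKm(center, p.coordinates) }))
      .filter((h) => h.dist >= inner && h.dist < outer)
      .sort((a, b) => a.dist - b.dist);
    results.push(...hits.map((h) => h.id));
  }
  return results;
}

const pts = [
  { id: "a", coordinates: [1, 0] }, // about 111 km from [0, 0]
  { id: "b", coordinates: [0.1, 0] }, // about 11 km
  { id: "c", coordinates: [2, 0] }, // about 222 km
];
console.log(ringSchedule(100, 25)); // [ [ 0, 25 ], [ 25, 50 ], [ 50, 75 ], [ 75, 100 ] ]
console.log(nearPoints(pts, [0, 0], 300, 100)); // [ 'b', 'a', 'c' ]
```

Because each ring's results are sorted before moving outward, the overall output is nearest-first without ever sorting the whole collection at once.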

When MongoDB was looking at a certain "ring," it was trying to see if the geokey was inside the ring. If the geokey it was looking at wasn't inside the ring, it decided the whole object wasn't inside the ring and ignored it.

The bug is that, for objects that generate several geokeys, some of the geokeys may be inside the ring we're looking at and some may not, so we can't ignore an object just because one of its geokeys isn't inside the ring. The polygons in your example had this property.
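The difference between the old and fixed behavior can be modeled in a few lines. This is a hypothetical, simplified model, not the actual s2nearcursor.cpp code: each object carries made-up distances (km) of its geokeys from the query point, its own distance is assumed to be the smallest of those, the buggy check is modeled as rejecting the object as soon as any one geokey falls outside the current ring, and the fixed check keeps the object as a candidate if any geokey is inside the ring.

```javascript
// Simplified model of the near search (hypothetical; not the actual
// s2nearcursor.cpp logic). Each object lists the distances, in km, of its
// geokeys from the query point; its own distance is taken as the smallest.
function nearSearch(objects, rings, rule) {
  const results = [];
  for (const [inner, outer] of rings) {
    const inRing = (d) => d >= inner && d < outer;
    const hits = [];
    for (const obj of objects) {
      // Decide from the geokeys whether the object is a candidate in this ring.
      if (!rule(obj.keyDistances, inRing)) continue;
      const dist = Math.min(...obj.keyDistances);
      // Emit the object only in the ring that holds its own distance,
      // so a candidate is returned exactly once.
      if (inRing(dist)) hits.push({ id: obj.id, dist });
    }
    hits.sort((a, b) => a.dist - b.dist);
    results.push(...hits.map((h) => h.id));
  }
  return results;
}

// Buggy rule: any geokey outside the ring rejects the whole object.
const buggyRule = (keyDistances, inRing) => keyDistances.every(inRing);

// Fixed rule: the object stays a candidate if any of its geokeys is in the ring.
const fixedRule = (keyDistances, inRing) => keyDistances.some(inRing);

const rings = [[0, 25], [25, 50], [50, 75], [75, 100]];
const objects = [
  { id: "point", keyDistances: [10] },
  { id: "small-polygon", keyDistances: [30, 40] }, // both keys in one ring
  { id: "large-polygon", keyDistances: [20, 30] }, // keys span two rings
];

console.log(nearSearch(objects, rings, buggyRule)); // [ 'point', 'small-polygon' ]
console.log(nearSearch(objects, rings, fixedRule)); // [ 'point', 'large-polygon', 'small-polygon' ]
```

Under the buggy rule the polygon whose geokeys straddle a ring boundary never qualifies in any ring and silently disappears, which matches the report that only polygon objects were missing.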

Comment by hari.khalsa@10gen.com [ 29/Apr/13 ]

Thanks for the helpful details. I'll take a look soon.

Comment by Jim Dagg [ 26/Apr/13 ]

Attached a script that runs the $near query and the corresponding $geoIntersects query (using a 25-point polygon approximation of the $near radius) and displays the differences.
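For reference, a polygon approximation of a $near radius like the one described above can be built with the spherical destination-point formula. This is a sketch, not the attached script: it assumes a spherical Earth of mean radius 6371 km, interprets "25-point" as 25 distinct vertices (the ring is then closed by repeating the first vertex), and the function name circlePolygon is hypothetical. Negative bearings are used so the ring runs counterclockwise, the winding RFC 7946 recommends for exterior rings.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius; an assumption of this sketch

// Approximate a circle of radiusKm around center [lng, lat] with a GeoJSON
// Polygon of `vertices` distinct vertices, closed by repeating the first one.
function circlePolygon([lng, lat], radiusKm, vertices) {
  const rad = Math.PI / 180;
  const delta = radiusKm / EARTH_RADIUS_KM; // angular distance in radians
  const lat1 = lat * rad;
  const lng1 = lng * rad;
  const ring = [];
  for (let k = 0; k < vertices; k++) {
    // Negative bearing walks counterclockwise starting from due north.
    const bearing = (-2 * Math.PI * k) / vertices;
    const lat2 = Math.asin(
      Math.sin(lat1) * Math.cos(delta) + Math.cos(lat1) * Math.sin(delta) * Math.cos(bearing)
    );
    const lng2 =
      lng1 +
      Math.atan2(
        Math.sin(bearing) * Math.sin(delta) * Math.cos(lat1),
        Math.cos(delta) - Math.sin(lat1) * Math.sin(lat2)
      );
    ring.push([lng2 / rad, lat2 / rad]);
  }
  ring.push(ring[0]); // GeoJSON linear rings must be closed
  return { type: "Polygon", coordinates: [ring] };
}

const approx = circlePolygon([-85.389, 40.46], 25, 25);
console.log(approx.coordinates[0].length); // 26
```

The resulting object could then be passed as the $geometry of a $geoIntersects query; every vertex lies exactly on the circle, so the polygon slightly under-covers the true circle between vertices, which is consistent with differences showing up near the boundary.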

Comment by Jim Dagg [ 26/Apr/13 ]

Added a ZIP of the data missing from each of a $near search and a $geoIntersects search, for radii of 25, 50, 75, and 100 km. Blue elements are those returned by the $geoIntersects query (using a 25-point polygon approximation of the radius given to $near) but not by the $near query. Red elements are the reverse. (The red elements generally appear near the boundary, which is expected.)

Also note that all of the missing elements in the $near queries are polygon objects.
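The blue/red classification above amounts to two set differences over document ids. A minimal sketch, assuming the ids of each result set have already been collected into arrays (querying MongoDB is left out); diffResults is a hypothetical helper, not the attached script.

```javascript
// Hypothetical comparison of two result sets by _id, mirroring the
// blue/red classification above. Inputs are plain arrays of ids.
function diffResults(nearIds, intersectsIds) {
  const near = new Set(nearIds);
  const intersects = new Set(intersectsIds);
  return {
    // blue: returned by $geoIntersects but missing from $near
    blue: intersectsIds.filter((id) => !near.has(id)),
    // red: returned by $near but not by $geoIntersects
    red: nearIds.filter((id) => !intersects.has(id)),
  };
}

console.log(diffResults([1, 2, 3, 5], [2, 3, 4])); // { blue: [ 4 ], red: [ 1, 5 ] }
```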

Comment by Jim Dagg [ 26/Apr/13 ]

Added KML files showing query results. near.kml is the result of a $near search. geoIntersects.kml is the result of a $geoIntersects search which approximates the radius provided to $near with a 25-point polygon.

Generated at Thu Feb 08 03:20:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.