[SERVER-9484] 2dsphere Index may not return all documents for $near and geoNear Created: 26/Apr/13 Updated: 11/Jul/16 Resolved: 02/May/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Geo, Index Maintenance |
| Affects Version/s: | 2.4.3 |
| Fix Version/s: | 2.4.4, 2.5.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jim Dagg | Assignee: | hari.khalsa@10gen.com |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows 7 x64 |
||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Import the attached data file into MongoDB using mongoimport, into database "sparcTest" and collection "TirCSV". Ensure a 2dsphere index exists on the "geometry" field. |
| Participants: |
| Description |
|
When performing a $near or geoNear query on GeoJSON data indexed with a 2dsphere index, the query appears to miss data. This may be an issue with the 2dsphere index, as reindexing changes the number of data elements returned for a geoNear query, but not for a bounding-box $geoWithin query. In this case, all of the data is within the polygon bounded by longitude -90 to -80, and latitude 35 to 45. The only secondary index on the collection is {"geometry" : "2dsphere"}.
> db.TirCSV.count({ "geometry" : { "$near" : { "$geometry" : { "type" : "Point", "coordinates" : [ -85.389 , 40.46]}}} , "$or" : [ { "data_type" : "tir"}]}) |
| Comments |
| Comment by auto [ 20/May/13 ] |
|
Author: Hari Khalsa <hkhalsa@10gen.com> (2013-04-29T19:22:03Z) Message: Conflicts: src/mongo/db/geo/s2nearcursor.cpp |
| Comment by auto [ 02/May/13 ] |
|
Author: Hari Khalsa <hkhalsa@10gen.com> (2013-04-29T19:22:03Z) Message: |
| Comment by hari.khalsa@10gen.com [ 29/Apr/13 ] |
|
Thank you for reporting it in such a helpful way! There will be a fix in the master branch shortly and it will be backported to the next release of 2.4. |
| Comment by Jim Dagg [ 29/Apr/13 ] |
|
Thanks, Hari. I appreciate the help, and I'm glad we managed to figure out what was wrong. Looking forward to the fix! |
| Comment by hari.khalsa@10gen.com [ 29/Apr/13 ] |
|
Hello! You found a bug. Congratulations and my apologies. Let me tell you a bit about how the bug happened and how I will fix it. When you add something to a 2dsphere index, MongoDB takes the geometry information and creates "geokeys" from it. A point has one geokey, and a polygon (or linestring) may have several since it's larger. The geokey, aside from being more compact than GeoJSON, also provides rough information about the location of the geometry. Near searches on a 2dsphere index work by looking at concentric rings around the start point. You find everything within a certain radius R0 first, then everything between R0 and R1, then everything between R1 and R2, etc., where R0 < R1 < R2... When MongoDB was looking at a certain "ring," it was trying to see if the geokey was inside the ring. If the geokey it was looking at wasn't inside the ring, it decided the whole object wasn't inside the ring and ignored it. The bug is: for objects that generate several geokeys, some of the geokeys may be inside the ring we're looking at, and some may not; we can't ignore an object just because one of its geokeys is not inside the ring. The polygons in your example had this property. |
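The ring check described above can be illustrated with a toy model. This is a minimal sketch and not MongoDB's actual code: the entry layout, the `keyDist` field (standing in for "distance from the query point to a geokey"), and both function names are hypothetical. The ticket does not describe the actual patch, so `anyKeyRing` shows only one plausible correction.

```javascript
// Toy model (hypothetical): each index entry pairs one geokey with the
// document it came from. keyDist is an assumed stand-in for the distance
// from the query point to that geokey.
const indexEntries = [
  { docId: "point-A", keyDist: 5 },
  { docId: "polygon-B", keyDist: 40 }, // far geokey, scanned first
  { docId: "polygon-B", keyDist: 8 },  // near geokey of the same polygon
  { docId: "point-C", keyDist: 30 },
];

// A ring covers distances in the half-open interval [lo, hi).
const inRing = (d, lo, hi) => d >= lo && d < hi;

// Buggy check: the first geokey seen for a document decides the outcome
// for the whole document.
function firstKeyRing(entries, lo, hi) {
  const decided = new Set();
  const hits = [];
  for (const { docId, keyDist } of entries) {
    if (decided.has(docId)) continue;
    decided.add(docId);
    if (inRing(keyDist, lo, hi)) hits.push(docId);
  }
  return hits;
}

// One plausible correction: keep a document if any of its geokeys
// falls inside the ring.
function anyKeyRing(entries, lo, hi) {
  const hits = new Set();
  for (const { docId, keyDist } of entries) {
    if (inRing(keyDist, lo, hi)) hits.add(docId);
  }
  return [...hits];
}

const buggyHits = firstKeyRing(indexEntries, 0, 10);
const fixedHits = anyKeyRing(indexEntries, 0, 10);
console.log(buggyHits); // polygon-B is missing
console.log(fixedHits); // polygon-B is included
```

In this model only multi-key objects can be dropped, which is consistent with the observation in the ticket that all of the missing elements were polygons.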
| Comment by hari.khalsa@10gen.com [ 29/Apr/13 ] |
|
Thanks for the helpful details. I'll take a look soon. |
| Comment by Jim Dagg [ 26/Apr/13 ] |
|
Attached script to run the queries for the $near radius and 25-point polygon approximation and display the differences. |
| Comment by Jim Dagg [ 26/Apr/13 ] |
|
Added a ZIP of the data missing from each of a $near search and a $geoIntersects search, given a 25-, 50-, 75-, and 100-km radius. Blue elements are those represented in the $geoIntersects query for a 25-point polygon approximation of the radius provided to $near, but not in the result set of the $near query. The red elements are the reverse. (The red elements generally appear on the edge of the border, which is expected.) Also note that all of the missing elements in the $near queries are polygon objects. |
| Comment by Jim Dagg [ 26/Apr/13 ] |
|
Added KML files showing query results. near.kml is the result of a $near search. geoIntersects.kml is the result of a $geoIntersects search which approximates the radius provided to $near with a 25-point polygon. |
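For readers who want to reproduce the comparison, the 25-point polygon approximation of a $near radius could be built roughly as follows. This is a sketch under stated assumptions and not the attached script: the helper name `circlePolygon` and the Earth radius constant are hypothetical choices, and the vertices are placed with the standard great-circle destination formula.

```javascript
// Assumed mean Earth radius in kilometres (not taken from the ticket).
const EARTH_RADIUS_KM = 6371;

// Hypothetical helper: approximate a circle of radiusKm around [lon, lat]
// with an n-vertex GeoJSON Polygon.
function circlePolygon([lonDeg, latDeg], radiusKm, n = 25) {
  const toRad = (d) => (d * Math.PI) / 180;
  const toDeg = (r) => (r * 180) / Math.PI;
  const lat1 = toRad(latDeg);
  const lon1 = toRad(lonDeg);
  const delta = radiusKm / EARTH_RADIUS_KM; // angular distance in radians
  const ring = [];
  for (let i = 0; i < n; i++) {
    // Negative step so the ring runs counterclockwise, starting due north.
    const bearing = (-2 * Math.PI * i) / n;
    const lat2 = Math.asin(
      Math.sin(lat1) * Math.cos(delta) +
        Math.cos(lat1) * Math.sin(delta) * Math.cos(bearing)
    );
    const lon2 =
      lon1 +
      Math.atan2(
        Math.sin(bearing) * Math.sin(delta) * Math.cos(lat1),
        Math.cos(delta) - Math.sin(lat1) * Math.sin(lat2)
      );
    ring.push([toDeg(lon2), toDeg(lat2)]);
  }
  ring.push(ring[0]); // a GeoJSON ring must end on its first vertex
  return { type: "Polygon", coordinates: [ring] };
}

// Center point taken from the query in the description; 25 km radius.
const polygon = circlePolygon([-85.389, 40.46], 25);
console.log(polygon.coordinates[0].length); // 25 vertices plus the closing one
```

The resulting object can be used as the `$geometry` of a `$geoIntersects` query on the "geometry" field, and the returned documents compared against those of a `$near` query with the same center and radius.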