[SERVER-1392] Strange geolocation query slowness Created: 10/Jul/10 Updated: 12/Jul/16 Resolved: 18/Oct/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Geo |
| Affects Version/s: | 1.5.4 |
| Fix Version/s: | 1.6.4, 1.7.2 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Nolan Darilek | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux |
||
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
I'm noticing huge disparities between some geolocation queries. The queries don't seem all that different, yet there is a huge difference in the time taken to complete them: some reliably finish in under a second, while others reliably take several seconds or more. See, for instance, this session:

MongoDB shell version: 1.5.5-pre-
connecting to: hermes
> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29714, lon: -97.73608 }, { lat: 30.29914, lon: -97.73408000000001 }] } } }).explain()
{ "cursor" : "GeoBrowse-box", "nscanned" : 5, "nscannedObjects" : 5, "n" : 5, "millis" : 5021, "indexBounds" : [ ] }
> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29664, lon: -97.7364 }, { lat: 30.29864, lon: -97.73439999999999 }] } } }).explain()
{ "cursor" : "GeoBrowse-box", "nscanned" : 6, "nscannedObjects" : 6, "n" : 6, "millis" : 562, "indexBounds" : [ ] }

It appears that the second query scans and returns more objects, yet it takes about a tenth of the time. Note that although the first query takes roughly ten times longer to return, 5 seconds is actually short: I've had queries that took upwards of 30 seconds to return similar results. None of these areas are sparsely populated with points, so I don't think that the issue is that the query distances need to be limited.

You can find my dataset here: http://dl.dropbox.com/u/147071/dump.tar.bz2 It's 293M, and I apologize for that, but I don't know what specific aspects of my data may be causing this disparity in search times. Dropbox says that the file will finish uploading in an hour or so.
Also, because the above transcript didn't paste in such that the queries are cut-and-paste-friendly, here are the two that I am trying, slow query first:

db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29714, lon: -97.73608 }, { lat: 30.29914, lon: -97.73408000000001 }] } } }).explain()

db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29664, lon: -97.7364 }, { lat: 30.29864, lon: -97.73439999999999 }] } } }).explain() |
| Comments |
| Comment by Jacques Crocker [ 21/Oct/10 ] |
|
Working pretty awesome. Thanks for the backport! |
| Comment by Mathias Stearn [ 20/Oct/10 ] |
|
Backport done |
| Comment by auto [ 20/Oct/10 ] |
|
Author: {'login': 'RedBeard0531', 'name': 'Mathias Stearn', 'email': 'mathias@10gen.com'} Message: Faster $box queries |
| Comment by Jacques Crocker [ 20/Oct/10 ] |
|
This patch seems to apply cleanly to 1.6 branch. Any chance of a backport for a 1.6.x release? |
| Comment by auto [ 18/Oct/10 ] |
|
Author: {'login': 'RedBeard0531', 'name': 'Mathias Stearn', 'email': 'mathias@10gen.com'} Message: Faster $box queries |
| Comment by Richard Kreuter (Inactive) [ 16/Jul/10 ] |
|
AFAICT, there are 3 things going on here, but you're sort of running into limitations in the geospatial search algorithm we're using.

First, geohashing has the property that superficially similar queries will end up examining different boxes within the coordinate space. The faster query scans the rectangle with corners (30.2948, -97.7399), (30.3003, -97.7344), while the slower is scanning the rectangle with corners (29.5312, -98.4375), (30.9375, -97.0313).

Next, because geospatial search examines every point inside these boxes to see whether the points are inside the query box, if the data set is relatively dense, that makes for a lot of searching. This data set is fairly dense.

Finally, it turns out that explain() reports meaningless numbers of documents visited during a search. The fast query (the one that scans the smaller box) actually visits 900K documents, while the slow query (the one that scans the bigger bounding box) visits 8M documents. That explain() is misleading is a bug; I've created a case: http://jira.mongodb.org/browse/SERVER-1429

However, the performance profile you're experiencing is mostly a property of geohashing. |
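To make the first point concrete, here is a minimal sketch of how two nearly identical query boxes can land in enclosing cells of very different sizes. It is not the server's actual code. It assumes both axes span [-180, 180) and are halved once per bit, quadtree-style; `cellIndex` and `enclosingCell` are hypothetical helpers written for this illustration. Under those assumptions it reproduces the two scanned rectangles quoted above, to four decimal places.

```javascript
// Sketch only: NOT MongoDB's actual implementation.
// Assumption: both axes span [-180, 180) and are halved once per bit.
const MIN = -180;
const MAX = 180;

// Index of the slice containing `value` when an axis is cut into 2^bits slices.
function cellIndex(value, bits) {
  return Math.floor(((value - MIN) / (MAX - MIN)) * Math.pow(2, bits));
}

// Smallest single cell that contains both corners of the query box.
function enclosingCell(box, maxBits = 26) {
  const [a, b] = box;
  let bits = 0;
  // Keep subdividing while both corners still fall in the same cell.
  while (bits < maxBits &&
         cellIndex(a.lat, bits + 1) === cellIndex(b.lat, bits + 1) &&
         cellIndex(a.lon, bits + 1) === cellIndex(b.lon, bits + 1)) {
    bits += 1;
  }
  const size = (MAX - MIN) / Math.pow(2, bits); // cell side, in degrees
  const latMin = MIN + cellIndex(a.lat, bits) * size;
  const lonMin = MIN + cellIndex(a.lon, bits) * size;
  return { bits, size, latMin, lonMin, latMax: latMin + size, lonMax: lonMin + size };
}

// The two $box queries from the description, slow query first.
const slowBox = [{ lat: 30.29714, lon: -97.73608 }, { lat: 30.29914, lon: -97.73408000000001 }];
const fastBox = [{ lat: 30.29664, lon: -97.7364 }, { lat: 30.29864, lon: -97.73439999999999 }];

console.log(enclosingCell(slowBox)); // bits: 8
console.log(enclosingCell(fastBox)); // bits: 16
```

In this sketch the slow box straddles the cell boundary at lon = -97.734375, so its corners share only 8 bits per axis. Its enclosing cell is therefore 256 times wider on each side (65,536 times the area) than the 16-bit cell that contains the fast box, even though the two query boxes are the same size and almost overlap.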
| Comment by Richard Kreuter (Inactive) [ 16/Jul/10 ] |
|
Visualization of the data set. 18.9 million points here. |