[SERVER-1392] Strange geolocation query slowness
Created: 10/Jul/10  Updated: 12/Jul/16  Resolved: 18/Oct/10

Status: Closed
Project: Core Server
Component/s: Geo
Affects Version/s: 1.5.4
Fix Version/s: 1.6.4, 1.7.2

Type: Bug
Priority: Minor - P4
Reporter: Nolan Darilek
Assignee: Mathias Stearn
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux
Attachments: PNG File server-1392.png    
Operating System: Linux
Participants:

 Description   

I'm seeing huge disparities in the time taken by geolocation queries that don't seem all that different: some reliably finish in under a second, while others reliably take five seconds or more. See, for instance, this session:

MongoDB shell version: 1.5.5-pre-

connecting to: hermes

> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29714, lon: -97.73608 }, { lat: 30.29914, lon: -97.73408000000001 } ] } } }).explain()

{ "cursor" : "GeoBrowse-box", "nscanned" : 5, "nscannedObjects" : 5, "n" : 5, "millis" : 5021, "indexBounds" : [ ] }

> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29714, lon: -97.73608 }, { lat: 30.29914, lon: -97.73408000000001 } ] } } }).explain()

{ "cursor" : "GeoBrowse-box", "nscanned" : 5, "nscannedObjects" : 5, "n" : 5, "millis" : 5434, "indexBounds" : [ ] }

> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29714, lon: -97.73608 }, { lat: 30.29914, lon: -97.73408000000001 } ] } } }).explain()

{ "cursor" : "GeoBrowse-box", "nscanned" : 5, "nscannedObjects" : 5, "n" : 5, "millis" : 5516, "indexBounds" : [ ] }

> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29664, lon: -97.7364 }, { lat: 30.29864, lon: -97.73439999999999 } ] } } }).explain()

{ "cursor" : "GeoBrowse-box", "nscanned" : 6, "nscannedObjects" : 6, "n" : 6, "millis" : 562, "indexBounds" : [ ] }

> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29664, lon: -97.7364 }, { lat: 30.29864, lon: -97.73439999999999 } ] } } }).explain()

{ "cursor" : "GeoBrowse-box", "nscanned" : 6, "nscannedObjects" : 6, "n" : 6, "millis" : 559, "indexBounds" : [ ] }

> db.nodes.find({ loc: { $within: { $box: [ { lat: 30.29664, lon: -97.7364 }, { lat: 30.29864, lon: -97.73439999999999 } ] } } }).explain()

{ "cursor" : "GeoBrowse-box", "nscanned" : 6, "nscannedObjects" : 6, "n" : 6, "millis" : 563, "indexBounds" : [ ] }

The second query scans and returns more objects, yet it takes about a tenth of the time (roughly 560 ms versus 5,000 to 5,500 ms). Even the slow query here is relatively quick: I've had queries that took upwards of 30 seconds to return similar results.

None of these areas are sparsely populated with points, so I don't think that the issue is that the query distances need to be limited.

You can find my dataset here: http://dl.dropbox.com/u/147071/dump.tar.bz2 It's 293M, and I apologize for that, but I don't know what specific aspects of my data may be causing this disparity in search times. Dropbox says that the file will finish uploading in an hour or so.

Also, since the queries in the transcript above are not cut-and-paste-friendly, here are the two that I am trying, slow query first:

db.nodes.find({ loc: { $within: { $box: [

{ lat: 30.29714, lon: -97.73608 }

,

{ lat: 30.29914, lon: -97.73408000000001 }

] } } }).explain()

db.nodes.find({ loc: { $within: { $box: [

{ lat: 30.29664, lon: -97.7364 }

,

{ lat: 30.29864, lon: -97.73439999999999 }

] } } }).explain()



 Comments   
Comment by Jacques Crocker [ 21/Oct/10 ]

Working pretty awesome. Thanks for the backport!

Comment by Mathias Stearn [ 20/Oct/10 ]

Backport done

Comment by auto [ 20/Oct/10 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: Faster $box queries SERVER-1392
http://github.com/mongodb/mongo/commit/8d79226466632ca8a57a3b140cf3b0f0f2e60f19

Comment by Jacques Crocker [ 20/Oct/10 ]

This patch seems to apply cleanly to 1.6 branch. Any chance of a backport for a 1.6.x release?

Comment by auto [ 18/Oct/10 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: Faster $box queries SERVER-1392
http://github.com/mongodb/mongo/commit/b902ee404623407ba3eca55954882afbe389f9aa

Comment by Richard Kreuter (Inactive) [ 16/Jul/10 ]

AFAICT, there are 3 things going on here, but you're sort of running into limitations in the geospatial search algorithm we're using.

First, geohashing has the property that superficially similar queries will end up examining different boxes within the coordinate space. The faster query scans the rectangle with corners (30.2948, -97.7399), (30.3003, -97.7344), while the slower is scanning the rectangle with corners (29.5312, -98.4375), (30.9375, -97.0313).
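The effect of cell boundaries can be reproduced with a minimal sketch. It assumes both axes span [-180, 180) and are halved repeatedly, as in a quadtree-style geohash. The helper names `cellIndex` and `enclosingCell` are hypothetical, and this is not the server's actual search code. It finds the smallest grid cell that contains both corners of each query box:

```javascript
// Sketch only: assumes each axis covers [-180, 180) and is split into
// 2^level equal cells. These bounds are an assumption, not from the ticket.
const MIN = -180, MAX = 180;

// Index of the cell containing `value` when the axis has 2^level cells.
function cellIndex(value, level) {
  return Math.floor((value - MIN) / (MAX - MIN) * Math.pow(2, level));
}

// Deepest level at which both corners share a cell on both axes,
// plus the bounds of that cell.
function enclosingCell(box, maxLevel = 26) {
  const [a, b] = box;
  let level = 0;
  while (level < maxLevel &&
         cellIndex(a.lat, level + 1) === cellIndex(b.lat, level + 1) &&
         cellIndex(a.lon, level + 1) === cellIndex(b.lon, level + 1)) {
    level++;
  }
  const size = (MAX - MIN) / Math.pow(2, level);
  const latMin = MIN + cellIndex(a.lat, level) * size;
  const lonMin = MIN + cellIndex(a.lon, level) * size;
  return { level, size, latMin, lonMin, latMax: latMin + size, lonMax: lonMin + size };
}

const slow = enclosingCell([
  { lat: 30.29714, lon: -97.73608 },
  { lat: 30.29914, lon: -97.73408000000001 },
]);
const fast = enclosingCell([
  { lat: 30.29664, lon: -97.7364 },
  { lat: 30.29864, lon: -97.73439999999999 },
]);

console.log("slow:", slow);
console.log("fast:", fast);
```

Under those assumptions the slow box straddles a cell boundary at longitude -97.734375, so its smallest enclosing cell is at level 8 (about 1.4 degrees per side), while the fast box fits inside a level-16 cell (about 0.0055 degrees per side). Both cells agree with the rectangles quoted above.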

Next, because geospatial search examines every point inside these boxes to see whether the points are inside the query box, if the data set is relatively dense, that makes for a lot of searching. This data set is fairly dense.
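A rough illustration of that two-stage check, assuming a simple linear pass and made-up points (none of them come from the reporter's data set, and `scan` and `inRect` are hypothetical names). Every point inside the candidate cell is counted as visited, but only points inside the query box are returned:

```javascript
// True if point p lies inside the rectangle with corners lo and hi.
function inRect(p, lo, hi) {
  return p.lat >= lo.lat && p.lat <= hi.lat &&
         p.lon >= lo.lon && p.lon <= hi.lon;
}

// Visit every point in the candidate cell; keep only those in the query box.
// A real index would not iterate over points outside the cell at all.
function scan(points, cell, box) {
  let visited = 0;
  const matched = [];
  for (const p of points) {
    if (!inRect(p, cell[0], cell[1])) continue; // outside the cell: skipped
    visited++;                                  // inside the cell: examined
    if (inRect(p, box[0], box[1])) matched.push(p);
  }
  return { visited, matched };
}

// Made-up sample points for illustration only.
const points = [
  { lat: 30.2975, lon: -97.7355 }, // inside the query box
  { lat: 30.1, lon: -97.9 },       // inside the large cell only
  { lat: 30.8, lon: -97.1 },       // inside the large cell only
  { lat: 40.0, lon: -80.0 },       // outside both
];
const box = [
  { lat: 30.29714, lon: -97.73608 },
  { lat: 30.29914, lon: -97.73408000000001 },
];
const bigCell = [
  { lat: 29.53125, lon: -98.4375 },
  { lat: 30.9375, lon: -97.03125 },
];

const result = scan(points, bigCell, box);
console.log(result.visited, "visited,", result.matched.length, "matched");
```

The gap between the visited count and the matched count is the wasted work, and it grows with both the size of the cell and the density of the data.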

Finally, explain() reports meaningless numbers for documents visited during a search. The fast query (the one that scans the smaller box) actually visits 900K documents, while the slow query (the one that scans the bigger bounding box) visits 8M documents. That explain() is misleading is a bug; I've created a case:

http://jira.mongodb.org/browse/SERVER-1429

However, the performance profile you're experiencing is mostly a property of geohashing.
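As a back-of-the-envelope check, the visited counts above can be compared with the timings from the description. The counts are the rounded figures quoted here (900K and 8M), so the result is only approximate:

```javascript
// Timings are the "millis" values from the three explain() runs of each query.
// Visited counts are rounded figures, so treat the rates as rough estimates.
const runs = [
  { name: "fast", visited: 900e3, millis: [562, 559, 563] },
  { name: "slow", visited: 8e6, millis: [5021, 5434, 5516] },
];

// Arithmetic mean of a list of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Documents visited per millisecond for each query.
const rates = runs.map((r) => ({
  name: r.name,
  docsPerMs: r.visited / mean(r.millis),
}));

for (const r of rates) {
  console.log(r.name, Math.round(r.docsPerMs), "docs/ms");
}
```

Both queries work out to roughly 1,500 to 1,600 documents per millisecond, which is consistent with the run time being driven by the number of documents visited rather than by the nscanned figure that explain() reports.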

Comment by Richard Kreuter (Inactive) [ 16/Jul/10 ]

Visualization of the data set. 18.9 million points here.
