[SERVER-13568] Near search using find() with 2DSphere index is very slow vs. using a 2D index
Created: 13/Apr/14 | Updated: 09/Jul/16 | Resolved: 17/Sep/15

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Geo |
| Affects Version/s: | 2.4.6 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Abraham Lopez |
| Assignee: | Unassigned |
| Resolution: | Duplicate |
| Votes: | 4 |
| Labels: | geoNear |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Rackspace Performance cloud server with 4 GB RAM, 2 vCPUs and high-speed SSD data disk. |
| Operating System: | Linux |
| Steps To Reproduce: | See below. |
| Description |
|
I have a database with over 3 million documents. When running a find() on a GeoJSON Point field with a 2DSphere index, the query is very slow (12,000 ms), while the same find() using a 2D index is very fast (under 1 ms).
Steps to reproduce:
1. Create a collection named "objects" with more than 1 million documents.
2. Use the following simple schema:
3. Create a 2DSphere index on the location field: db.objects.ensureIndex( {"location": "2dsphere"});
4. Create a 2D index on the location.coordinates field: db.objects.ensureIndex( {"location.coordinates": "2d"});
5. Run this 2DSphere search query on the mongo client:
You'll notice a high number of scanned objects and a very high response time.
6. Run this 2D search query on the mongo client:
Now you'll get a very fast response time. Here's the output I get from explain:
7. You can use geoNear instead of near when searching on the 2DSphere-indexed field and you'll get the same huge response time. |
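As an illustration of steps 5 and 6, the two query shapes can be sketched as plain query documents. This is a hypothetical sketch, not the reporter's original queries: the helper names are made up, the sample coordinates are borrowed from the reproduction discussed in the comments, and $maxDistance is left out for simplicity.

```javascript
// Hypothetical helpers; names and sample coordinates are illustrative only.

// Query document for the 2dsphere-indexed "location" field:
// $near with a GeoJSON point.
function nearSphereIndexQuery(lng, lat) {
  return {
    location: {
      $near: { $geometry: { type: "Point", coordinates: [lng, lat] } }
    }
  };
}

// Query document for the 2d-indexed "location.coordinates" field:
// $near with a legacy [longitude, latitude] coordinate pair.
function near2dIndexQuery(lng, lat) {
  return { "location.coordinates": { $near: [lng, lat] } };
}

var sphereQuery = nearSphereIndexQuery(-118.50, 34.02);
var flatQuery = near2dIndexQuery(-118.50, 34.02);
```

In the mongo shell, either document could be passed to db.objects.find(...), optionally followed by .limit(n) or .explain(), to compare the two indexes.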
| Comments |
| Comment by Siyuan Zhou [ 17/Sep/15 ] |
|
This issue has been fixed by
Thanks, |
| Comment by Abraham Lopez [ 26/Jul/15 ] |
|
Awesome, I will install and try the new version as soon as I have the chance (I'm currently too busy to do this) and report back the results. |
| Comment by Siyuan Zhou [ 24/Jul/15 ] |
|
Hi aplimovil, We introduced several geo performance improvements in the latest development release, 3.1.6. From the symptoms, I believe this issue should be fixed by the recent changes. I would appreciate it if you could give it a try in your testing environment. 3.1.6 can be found on the download page. 3.1.6 introduces version 3 of the 2dsphere index. Old index versions are still supported, but the 2dsphere index must be rebuilt to get the most benefit. |
| Comment by Siyuan Zhou [ 25/Sep/14 ] |
|
If the stored data is a GeoJSON point, then |
| Comment by Abraham Lopez [ 25/Sep/14 ] |
|
Unfortunately, the data I used is privately owned by a client of mine, so I cannot share it. So the |
| Comment by Siyuan Zhou [ 24/Sep/14 ] |
|
aplimovil, thanks for your update. I believe
Thanks, |
| Comment by Abraham Lopez [ 24/Sep/14 ] |
|
Thanks Siyuan. I've added a comment to that ticket so you guys remember to let us know through this ticket when it's tackled, so we can test and confirm that the improvement fixes the performance issue originally reported. |
| Comment by Siyuan Zhou [ 23/Sep/14 ] |
|
heasleyb, thanks a lot for your sample dataset. I am able to reproduce the slow queries with $geoIntersects. It turns out that parsing a geometry takes most of the time, which includes the self-intersection test for polygons among other sanity checks and validations. This issue has been filed in
This ticket is for the performance of $near/$nearSphere; if you have any performance issue with near search, feel free to update this ticket. Thanks again! |
| Comment by Brian Heasley [ 12/Sep/14 ] |
|
We are using MongoDB v2.6.3 with the Maponics school data set (http://www.maponics.com/products/gis-map-data/school-boundaries/overview). I could give you the slow queries we are using, but I don't believe we could provide the actual data without more discussion, due to licensing concerns. If you want to go that route, feel free to email me directly and we can see if we can work it out. I appreciate you looking into this! |
| Comment by Siyuan Zhou [ 12/Sep/14 ] |
|
aplimovil, ldsenow and heasleyb, Thanks for your feedback. I am working on geo performance and looking into this problem. Could you please send us the real sample dataset and slow queries? You can attach the dataset to this ticket or just give us a link. If we are able to reproduce the exact same problem on our side, it will be very helpful for us to understand this issue and run CPU profiling. aplimovil's original comments and thomasr's reproduction are a good starting point, but we'd love to have more real data sets and queries. Also, which version are you using? Thanks, |
| Comment by Brian Heasley [ 12/Sep/14 ] |
|
We've had the same issue (very slow 2dSphere index queries) with Maponics school data, specifically in NYC where the density is high. Is there any update from the Geo engineer on whether this is a legitimate bug that might be addressed? |
| Comment by ldsenow [ 28/Jun/14 ] |
|
Hi Abraham, I am having exactly the same issue. Thanks to Thomas for pointing out the problem. I believe the performance is not acceptable, and I am forced to create an old 2d index. I hope they can patch it in 2.6.4, not 2.8. |
| Comment by Abraham Lopez [ 30/Apr/14 ] |
|
Hi Thomas, Yes, that's exactly my case. Our database has millions of records (locations) whose coordinates are very close together, as it is composed of thousands of renders of physical objects (buildings, roofs, trees, streets, etc.) in each major city in the USA. So it seems this is a bug with the 2DSphere index when points in the collection are very close to each other. As for radians, thanks for the tip; I was already aware of it. |
| Comment by Thomas Rueckstiess [ 30/Apr/14 ] |
|
Hi Abraham, I spent some time trying to reproduce the issue today. I finally saw a result similar to yours (2dsphere very slow compared to the 2d index), but only under a certain data distribution: specifically, when all the points in the collection were really close together (in my case, they were uniformly distributed around [-118.50, 34.02] with +/- 0.01 in each direction). For that particular case, I got a large discrepancy in performance: 42 seconds (2dsphere) vs. 12 milliseconds (2d). The commands and outputs are below.
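A clustered data set like the one described can be sketched as follows. This is a hypothetical generator, not the commands used in the reproduction: the function name, the document shape (a GeoJSON point in a location field) and the sample size are assumptions made for illustration.

```javascript
// Hypothetical helper: build n GeoJSON point documents spread uniformly
// within +/- spread degrees of a center given as [longitude, latitude].
function makeClusteredPoints(n, center, spread) {
  var docs = [];
  for (var i = 0; i < n; i++) {
    // Math.random() * 2 - 1 is uniform in [-1, 1), scaled by the window size.
    var lng = center[0] + (Math.random() * 2 - 1) * spread;
    var lat = center[1] + (Math.random() * 2 - 1) * spread;
    docs.push({ location: { type: "Point", coordinates: [lng, lat] } });
  }
  return docs;
}

// Tight cluster (+/- 0.01) versus a wider window (+/- 0.5).
var tight = makeClusteredPoints(1000, [-118.50, 34.02], 0.01);
var wide = makeClusteredPoints(1000, [-118.50, 34.02], 0.5);
```

In the mongo shell, each generated document could then be stored with db.objects.insert(doc) before building the indexes and timing the queries.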
However, when changing the data distribution to spread the points further apart, for example increasing the window to +/- 0.5 (instead of 0.01), and repeating the queries, both run really fast and there is almost no difference between the two indexes. Does your dataset contain a large number of points that are really close to each other? Another thing I want to point out: the $maxDistance value is treated differently for the different indexes / coordinate formats. For the 2d index and the legacy coordinate pair, $maxDistance is measured in radians, not meters, so you'd have to change that value for your tests. However, I adjusted for that, and even removed it entirely, and still saw the discrepancy in running time. I'm going to follow up with one of our Geo engineers to find out the reason for this behavior. In the meantime, if you can share any extra information about your data set or distribution, that would be very helpful. Regards, |
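The unit adjustment for $maxDistance mentioned above can be sketched as a simple conversion between meters and radians. The helper names are hypothetical, and the Earth radius of 6378.1 km is an approximation (the equatorial figure the MongoDB documentation uses for spherical distance calculations), so results are only approximate.

```javascript
// Approximate equatorial radius of the Earth in meters (assumed value).
var EARTH_RADIUS_METERS = 6378100;

// Convert a distance in meters to an angle in radians on the sphere.
function metersToRadians(meters) {
  return meters / EARTH_RADIUS_METERS;
}

// Convert an angle in radians back to a distance in meters.
function radiansToMeters(radians) {
  return radians * EARTH_RADIUS_METERS;
}

// Example: a 1 km search radius expressed in radians.
var oneKmInRadians = metersToRadians(1000);
```

Under the radian interpretation described in the comment, a value such as oneKmInRadians would be passed as $maxDistance with a legacy coordinate pair, while the GeoJSON query would use 1000 (meters) directly.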
| Comment by Abraham Lopez [ 28/Apr/14 ] |
|
Hi Thomas, Unfortunately, that's neither the real cause nor the solution. If you remove the explain() from the queries I included in my examples, you'll notice the MongoDB console takes exactly as long to respond as explain() reports, so the queries are indeed running slowly. Also, even if you run them with explain(), you'll find they are still as slow as without it. I actually discovered this issue while developing a Node.js API that connected to MongoDB (using the native driver with MongoJS, no Mongoose) and found that the API was ridiculously slow to respond. So I ran the query directly in the MongoDB console and confirmed MongoDB was the bottleneck. Can you try running the queries I mentioned without the explain() so you can see what I mean? This is a very weird issue, which I believe should be fixed, as I'm having to use 2D indexes rather than 2DSphere ones. |
| Comment by Thomas Rueckstiess [ 25/Apr/14 ] |
|
Hi Abraham, In 2.4.x, the explain output of GeoSearchCursors was not returning correct results, see
You should see that it had to scan the same large number of matches as your 2dsphere query, and that it took longer than 0ms. You can also wrap the query for the 2d index in
which will measure and print the wall-clock time, in seconds, that the query took. This was a display issue and was fixed in version 2.6.0. Regards, |
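The wall-clock timing approach mentioned in the comment above can be sketched with a small helper. This is an illustrative sketch, not the snippet from the original comment; the function name timeSeconds is hypothetical.

```javascript
// Hypothetical helper: run fn once and return the elapsed wall-clock
// time in seconds, measured with Date rather than explain().
function timeSeconds(fn) {
  var start = new Date().getTime();
  fn();
  return (new Date().getTime() - start) / 1000;
}
```

In the mongo shell this might be used as timeSeconds(function () { db.objects.find(query).toArray(); }), where query is the 2d or 2dsphere query document; calling toArray() forces the cursor to be fully read, so the measurement covers the whole query rather than just cursor creation.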