[SERVER-1982] fully support sharding on geo field Created: 20/Oct/10 Updated: 28/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Geo, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 53 |
| Labels: | qi-geo | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Query Integration
|
||||||||||||
| Participants: | |||||||||||||
| Case: | (copied to CRM) | ||||||||||||
| Description |
|
Currently "works" by routing geo queries to all shards. Correct results, but inefficient. |
| Comments |
| Comment by Xiaochen Wu [ 15/Dec/22 ] |
|
linked to INIT-5 and send it back to QE backlog. kateryna.kamenieva@mongodb.com kyle.suarez@mongodb.com please review. |
| Comment by Michael Kalish [ 28/Jul/14 ] |
|
Any update on when this might make it in? Seems like this would be vital for operating with multiple, geographically distant data centers. |
| Comment by Ron Waldon [ 20/Mar/13 ] |
|
Here's a scenario:
I realise you could accomplish this with separate MongoDB clusters, but then you have to maintain application-level logic to know which configurations to use and when. But this is a slippery slope: after all, I can use application-level logic to get sharding to work with MySQL. To me, the philosophy of MongoDB is to let the DB make intelligent choices about data storage and retrieval. This seems like an entirely reasonable request to increase the ability for MongoDB to make intelligent choices. |
| Comment by Scott Lowe [ 30/Mar/12 ] |
|
I was exited to see the geo indexing capabilities of mongo, but was dissapointed that this has not been properly implemented with regards to sharding. IMO this makes geo data a second class citizen in mongo. Are there plans for this to be implemented? |
| Comment by Leon van Zantvoort [ 01/Mar/12 ] |
|
@Eliot I don't have any performance issues at this moment. I was just checking the (theoretical) scalability properties of geo/sharding in case I do need to scale. Thanks for the response anyway! |
| Comment by Eliot Horowitz (Inactive) [ 01/Mar/12 ] |
|
Yes - but its unclear if that's good or bad in this case. |
| Comment by Leon van Zantvoort [ 01/Mar/12 ] |
|
You describe the benefits of sharding. I totally agree with you on those positive effects. In general, my statement boils down to: "Sharding on key 'A' and querying on (solely) key 'B' is not efficient. It is more efficient compared to a non-partitioned approach for the reasons you describe above, but fact remains that all shards are involved in a search if the shard key is not part of the query." As long as sharding on geo key is not fully supported, it looks like Mongo behaves like this. |
| Comment by Mathias Stearn [ 29/Feb/12 ] |
|
To be clear, you went from 30% to 20% load. Also, by spreading the data out you are able to keep more data in ram which is the #1 determiner of performance. Since having all data in ram is between 1000 (high-end SSD) to 1,000,000 (spinning disk) times faster so is much more important when it comes to scalability than reducing the processing that each server has to do. |
| Comment by Leon van Zantvoort [ 29/Feb/12 ] |
|
Again, taking O(log n) as an example. 1000 documents, 1 shard would be: 1000 documents equally distribibuted of 10 shards: I know there is more to it, but this is the point I'm trying to make. |
| Comment by Leon van Zantvoort [ 29/Feb/12 ] |
|
First of all, thanks for being responsive. This is appreciated! I'm not sure what the time complexity is of geo spatial queries, but for sake of simplicity say it's O(log n). If I apply this to your suggestion, we would get the following: Geo key is used for sharding: O(log n) This means that the more shards I add, the quicker each individual shard will find results (performance gain is sublinear), however this needs to be multiplied by the number of shards. So from a response time perspective (single user system, weakest link not taken into account), you are right about performance improvement using multiple shards. But from a scalability perspective, this approach will add more and more load on the system as a whole if more shards are added. |
| Comment by Eliot Horowitz (Inactive) [ 29/Feb/12 ] |
|
Sort of. If one shard is handling it, then it may have to look at 1000 documents. Pros and cons, but I would test sharding by a different key as well. |
| Comment by Leon van Zantvoort [ 29/Feb/12 ] |
|
Hi Eliot, I would like to use the geo key for sharding, so I can lookup documents near a given location efficiently (and in a scalable way). If I understand correctly, if sharding for geo keys would be fully supported, only a single shard is accessed for getting documents near a given location (maybe two shards in case the given location is close to a range boundary). For that reason, it would be a waste of resources, if other shards are accessed, as they don't have any documents near this location. Thanks, |
| Comment by Eliot Horowitz (Inactive) [ 29/Feb/12 ] |
|
A call to all shards scales pretty well, as each shard is doing 1/N the work. |
| Comment by Ignacio Lago [ 29/Feb/12 ] |
|
And it's critical since it is the common case scenario for a MongoDB setup. As an example: Foursquare uses MongoDB mainly because Sharding and Geospatial Index. (#ref http://www.10gen.com/customers/foursquare) It's ironic that both features aren't compatible yet. I mean: efficient and useful, not a call to all shards. |
| Comment by Leon van Zantvoort [ 29/Feb/12 ] |
|
Hi Mathias, Since this case is created over 16 months ago "Planning Bucket A" is not telling me if this will be picked up within the coming 16 months or later... Leon |
| Comment by Mathias Stearn [ 29/Feb/12 ] |
|
That is what this case is for. |
| Comment by Leon van Zantvoort [ 28/Feb/12 ] |
|
(Sorry for duplicating this comment, but I think this issue is better suited for it...) Are there any plans to support sharding on geo key? As long as this isn't supported, geospatial searching doesn't scale with Mongo, or am I missing something here? |