[SERVER-35581] Don't mandate the use of "distanceField" in $geoNear Created: 13/Jun/18  Updated: 03/Feb/24

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Kyle Suarez Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 0
Labels: neweng, qi-geo, qi-quick-win-candidate
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-33323 Refactor $mergeCursors stage to allow... Closed
Gantt Dependency
has to be done after SERVER-35043 Remove geoNear command Closed
Related
related to SERVER-58443 Allow $near/$nearSphere on a view Backlog
Assigned Teams:
Query Integration
Backwards Compatibility: Minor Change
Participants:

 Description   

The $geoNear stage requires that "distanceField" is specified. Presumably, this was required to allow $geoNear to work in a sharded cluster, as mongos must take the results from each shard and merge them in ascending order of distance. However, users who aren't interested in the actual distance must then project the field out of their results.

When SERVER-35043 is complete, DocumentSourceGeoNearCursor could ensure that the sort key is set to be the computed distance without requiring that "distanceField" is actually added to the output.
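
To make the proposal concrete, here is a minimal sketch of a result that always carries the computed distance as sort-key metadata but writes it into the document body only when a distance field is requested. The names GeoNearOutput and makeGeoNearOutput are hypothetical and are not part of the server code; the document body is simplified to a flat map of numeric fields, and dotted field paths are not handled.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a $geoNear result (not the server's Document type).
struct GeoNearOutput {
    std::map<std::string, double> fields;  // simplified document body
    double sortKey;                        // computed distance, kept as metadata
};

// Always records the distance as the sort key. Adds it to the body only
// when the caller supplied a distance field name.
GeoNearOutput makeGeoNearOutput(std::map<std::string, double> doc,
                                double distance,
                                const std::optional<std::string>& distanceField) {
    if (distanceField) {
        doc[*distanceField] = distance;
    }
    return GeoNearOutput{std::move(doc), distance};
}
```

Under these assumptions, a caller that passes std::nullopt gets the document back unchanged, while a merging node could still order results by sortKey.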



 Comments   
Comment by David Percy [ 12/Oct/21 ]

I attempted this as part of SERVER-58443.  (I had to revert that commit for performance reasons, but not because of distanceField.)

It looked like we were already setting some metadata (the sort key or maybe the geo distance), and mongos was using that to merge the results.

Comment by Kyle Suarez [ 26/Jul/18 ]

Yes, you're exactly correct. The special $sortKey field tells us the value that we should be sorting on, such that a sharded merge can merge sorted streams. Therefore, we don't need the "distanceField" at all. In a post-SERVER-33323 world, Charlie will change the cluster aggregation merging logic such that we only need to indicate the direction of the sort: either ascending or descending on the sort key. That will eliminate that DocumentSourceSort once and for all.
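
As an illustration of why the sort key alone is enough, the following sketch merges per-shard streams that are each already sorted by ascending sort key, without the distance ever appearing in the document body. ShardResult and mergePresorted are hypothetical names chosen for this example; this is a generic k-way merge, not the cluster aggregation merging logic itself.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a $geoNear result: the distance travels as
// metadata (sortKey) rather than as a field of the document body.
struct ShardResult {
    std::string body;  // the document as the user would see it
    double sortKey;    // computed distance, kept out of the body
};

// Merges streams that are each sorted by ascending sortKey into one
// stream sorted by ascending sortKey.
std::vector<ShardResult> mergePresorted(
    const std::vector<std::vector<ShardResult>>& shards) {
    // Heap entry: sort key, shard index, position within that shard.
    struct Entry {
        double key;
        std::size_t shard;
        std::size_t pos;
    };
    auto greater = [](const Entry& a, const Entry& b) { return a.key > b.key; };
    std::priority_queue<Entry, std::vector<Entry>, decltype(greater)> heap(greater);

    // Seed the heap with the first result of every non-empty shard.
    for (std::size_t s = 0; s < shards.size(); ++s) {
        if (!shards[s].empty()) {
            heap.push(Entry{shards[s][0].sortKey, s, 0});
        }
    }

    std::vector<ShardResult> merged;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        merged.push_back(shards[top.shard][top.pos]);
        // Advance within the shard that just produced a result.
        std::size_t next = top.pos + 1;
        if (next < shards[top.shard].size()) {
            heap.push(Entry{shards[top.shard][next].sortKey, top.shard, next});
        }
    }
    return merged;
}
```

The merge only ever reads sortKey, so whether the body also contains a distance field makes no difference to the resulting order.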

Comment by Lenny Khazan [ 25/Jul/18 ]

Thanks to you both for the info and feedback. I was under the impression that this chunk of code sets some metafield by which results can be merged across shards — does it serve some other purpose?

Comment by Kyle Suarez [ 25/Jul/18 ]

asya has reminded me that, in a sharded cluster, we must ensure that the merging shard correctly merges documents produced by $geoNear in ascending order of distance (even if the distance is not present in the document). That logic is encapsulated in DocumentSourceGeoNear::getMergeSources():

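std::list<boost::intrusive_ptr<DocumentSource>> DocumentSourceGeoNear::getMergeSources() {
    // Merge the shard results with a sort on the distance field, ascending (1).
    // "$mergePresorted" marks each shard's input as already sorted, so the
    // stage only needs to merge the streams.
    return {DocumentSourceSort::create(
        pExpCtx, BSON(distanceField->fullPath() << 1 << "$mergePresorted" << true))};
}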

However, in SERVER-33323, charlie.swanson is working on a different way to merge aggregations in a sharded cluster that will obviate the need for a DocumentSourceSort.

I've marked this ticket as depending on SERVER-33323. That work is in code review now and I anticipate it landing soon. I'd suggest holding off attempting this ticket until that one is committed, especially because I anticipate many confusing merge conflicts.

Cheers,
Kyle

Comment by Kyle Suarez [ 25/Jul/18 ]

Hi lennykhazan,

The intent behind DocumentSource::getOutputSorts() was to expose sort orders to the aggregation planner so that we could take advantage of sorted sequences. However, the planner currently does not use it at all – work to track that is in SERVER-22966, and right now we have no plans to schedule it in the near future.

As part of this ticket, it would be perfectly fine to simply delete DocumentSourceGeoNearCursor::getOutputSorts(); it won't affect the correctness of the system.

Thanks again for offering to help!
Kyle

Comment by Asya Kamsky [ 25/Jul/18 ]

lennykhazan 

Thank you for your offer to help. In order for us to accept a pull request we need a signed contributor agreement. You can see other contributor guidelines in our Wiki.

kyle.suarez will help you with your question about handling the distance field.

Feel free to join the MongoDB Developer Google Group if you also have more general questions about development of MongoDB server.

Asya

Comment by Lenny Khazan [ 22/Jul/18 ]

I can take a crack at this. Making distanceField optional seems straightforward enough, but it's not immediately clear to me how DocumentSourceGeoNearCursor::getOutputSorts is to be implemented if we are not necessarily saving the distance as a field on the output docs anymore. Any help would be appreciated!
