[SERVER-17851] $geoWithin and $geoIntersects do not match the query polygon's border Created: 01/Apr/15  Updated: 28/Dec/23

Status: Backlog
Project: Core Server
Component/s: Geo
Affects Version/s: 2.6.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alex Olieman Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 0
Labels: qi-geo, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-18186 $geoWithin edge problems Closed
Related
related to SERVER-5915 Polygon::contains() can return wrong ... Backlog
related to SERVER-13599 $geoIntersects does not return polygo... Backlog
related to DOCS-5193 Degeneracy of geometry relationship w... Closed
Assigned Teams:
Query Integration
Operating System: ALL
Steps To Reproduce:

1. insert linestring into test collection

db.geo_test.insert({
    "geometry": {
        "type": "LineString",
        "coordinates": [[2.0, 0.0], [1.0, 0.0]]
    }
})

2. create a 2dsphere index

db.geo_test.ensureIndex({'geometry': '2dsphere'})

3. query the index with a polygon

db.geo_test.find({
   'geometry': {
      $geoWithin: {
         $geometry: {
            'type': "Polygon",
            'coordinates': [[[1.0, 1.0], [2.0, 1.0], [2.0, 0.0], [1.0, 0.0], [1.0, 1.0]]]
         }
      }
   }
})

Participants:

 Description   

The $geoWithin and $geoIntersects operators do not retrieve (parts of) the border of the polygon that they are queried with. This doesn't make sense, and the docs seem to agree:

When determining inclusion, MongoDB considers the border of a shape to be part of the shape, subject to the precision of floating point numbers.

This issue presents itself when all coordinate pairs of a linestring intersect (i.e. are part of) the polygon border. When one of the line's points on the border is moved to the inside of the polygon, that line is retrieved by the query as expected.

I would be surprised if this issue is caused by floating point precision or the spherical reference system.



 Comments   
Comment by Alex Olieman [ 10/Apr/15 ]

Hi siyuan.zhou@10gen.com, thank you for your explanation of how this behavior originates from S2. It is correct that the behavior I'm asking for corresponds to DE-9IM "covers" rather than "contains". In that sense, "contains" seems to be working correctly in MongoDB, but the documentation erroneously explains it as the "covers" behavior.

As I understand, there is a fundamental problem with supporting intersection and containment of arbitrary edge sections with a polygon. It would probably involve some effort in adapting S2, which would be a bad investment considering that S2 hasn't been maintained since 2011. It seems, however, that the majority of real-world "shared edge" cases involve what you call exact edges. The main use case I'm referring to is to compute sensible unions of polygons, which is a common task in spatial computation (Davis, 2007). If I were using S2 directly, I would use the s2polygonbuilder for this task. But regardless of how any MongoDB user chooses to compute such unions, we need to be able to retrieve sensible "merge candidates" using an index. For the general case it would be sufficient to implement "intersects" as DE-9IM defines it, although more specific cases such as mine would benefit from the "covers" predicate.

To illustrate this with an OpenStreetMap example: the goal is to merge polygons A, B, and C, but to exclude polygons D and E from the union. We are looking for merge candidates with A, because it has a smaller ObjectID and we have no prior information on which polygons to merge. A query that complies with DE-9IM intersects() retrieves A, B, C, D, and E. This is a good start, but leaves further narrowing down up to the user. If we were also able to query with a $geoCovers operator, but which only matches on exact shared edges, we would find that A only shares an edge with B, and subsequently that B shares an edge with C. In this sense, matching only polygons that share an exact edge could be a feature, rather than a workaround.

Comment by Siyuan Zhou [ 09/Apr/15 ]

Hi AliOli, thanks a lot for your comment. Unfortunately, current implementation cannot guarantee a polygon contains or intersects its edge. We may cover the special case where the exact edge intersects with / is contained by the polygon, but it's hard to guarantee a part of edge has the same behavior. Geometry relationship testing is based on S2 library, which treats degenerate in a special way as explained in SERVER-13599. To support this desired behavior, we have to make some fundamental changes. Technically, instead of contains(), covers() is a better description of this behavior, according to DE-9IM.

Cleaning dataset is an interesting use case, but we still need to gather more user feedback to justify the design. I really appreciate your pointing out this issue. We'll update the documentation to clear up the confusion.

Comment by Alex Olieman [ 02/Apr/15 ]

Hi siyuan.zhou@10gen.com, of course: I'm working with an OpenStreetMap dataset which incorporates geographical features at a very detailed level. My use case is location detection in text, and visualizing the mentioned locations for a user. To do that, the level of detail in OSM is far to high. Most streets, for instance, consist of many segments (i.e. Ways in OSM), which have the same name but are not explicitly linked in the source data. To work with this data, it needs to be simplified (i.e. aggregated and abstracted where possible).

The $geoWithin operator can thus be used to check if a geographical feature is contained within another geographical feature. If that is the case, the larger geometry can be kept, and its constituent parts can be thrown out. Herald Square, for example, is found as two geographical features in OSM: the entire square, and the park area that it contains. For any application where such level of detail is not required, it would be wonderful if we could use MongoDB to find all geometry that is contained within the square, and remove it from the collection. In some countries it is very common to have a street surrounding the square with the same name as the square. The issue I've reported here prevents me from simplifying this kind of geometry without a workaround.

And in general: recognizing parts of a geometry as being "within" that geometry would be good for logical consistency. In the current situation, a polygon is retrieved in response to a $geoWithin query with its own geometry, but its borders are not. So, a polygon is within itself, but its edges are not? If there is a severe technical reason for this behavior to stay the same, I think it deserves a good explanation in the docs.

Comment by Siyuan Zhou [ 02/Apr/15 ]

Hi AliOli, 2dsphere index still uses geohashing and b-tree, however, this issue is more relevant to how MongoDB deals with degenerate cases rather than indexing. We're still investigating this issue. Could you tell us more about your use case where a polygon should contain its edges?

Comment by Alex Olieman [ 02/Apr/15 ]

Hi ramon.fernandez, thanks for confirming the bug.
I was able to use subsequence testing as a naive workaround, i.e. testing if the linestring (or its reverse) is a subsequence of the polygon coordinate list (or any of its parts). But that only works well for boolean testing of course, not so much for querying a large collection.

Does the 2dsphere index v2 still use geohashes in a b-tree? Or have you moved on to r-trees or similar to index the geometries? I'm willing to think along, but would need to know some more context.

Cheers,
Alex

Comment by Ramon Fernandez Marina [ 02/Apr/15 ]

Thanks for your report AliOli, we can reproduce the behavior you describe and are looking into it.

This is a very specific corner case, but before we jump into a naive fix we need to investigate the issue in detail and make sure we don't create assorted other bugs. Stay tuned.

Cheers,
Ramón.

Generated at Thu Feb 08 03:45:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.