[SERVER-29618] $geoWithin in aggregation pipeline after $lookup and $unwind returns incorrect results Created: 14/Jun/17  Updated: 30/Oct/23  Resolved: 27/Jun/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.4
Fix Version/s: 3.4.6, 3.5.10

Type: Bug Priority: Critical - P2
Reporter: Markus Penttilä Assignee: Kyle Suarez
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-14096 Query Planner Explain for Find Closed
is related to SERVER-21612 Combine post $lookup $match on looked... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Steps To Reproduce:

Steps to reproduce within mongo shell:

db.createCollection("items")
 
db.createCollection("locations")
 
db.locations.insert({"coordinates":[25.266, 60.36938],"id":42})
 
db.items.insert({"id":1, "location_id":42})
 
db.items.aggregate([{$match:{id: 1}}, {$lookup:{from:"locations",localField:"location_id",foreignField:"id",as:"location"}},{$unwind:"$location"},
{$match:{"location.coordinates":{$geoWithin:{"$geometry":{"type":"MultiPolygon","coordinates":[[[[20.0,70.0],[30.0,70.0],[30.0,50.0],[20.0,50.0],[20.0,70.0]]]]}}}}}])

This returns nothing in MongoDB 3.4.4. The expected result is (as in 3.2.13):

{ "_id" : ObjectId("594122edb3d185a5e6770c86"), "id" : 1, "location_id" : 42, "location" : { "_id" : ObjectId("594122bdb3d185a5e6770c85"), "coordinates" : [ 25.266, 60.36938 ], "id" : 42 } }

Sprint: Query 2017-07-10
Participants:

 Description   

Using $geoWithin in a $match stage after $lookup and $unwind in an aggregation pipeline fails to find results that should be found within the geometry. This works properly in 3.2.13.



 Comments   
Comment by Githook User [ 27/Jun/17 ]

Author:

{u'username': u'ksuarz', u'name': u'Kyle Suarez', u'email': u'kyle.suarez@mongodb.com'}

Message: SERVER-29618 fix serialization of geo match expressions

Explicitly blacklist lookup_absorb_match from aggregation_sharded_collections_passthrough.

(cherry picked from commit 18bc61d22123da5897d275eb92576522a1bab4de)
Branch: v3.4
https://github.com/mongodb/mongo/commit/c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5

Comment by Githook User [ 27/Jun/17 ]

Author:

{u'username': u'ksuarz', u'name': u'Kyle Suarez', u'email': u'kyle.suarez@mongodb.com'}

Message: SERVER-29618 fix serialization of geo match expressions
Branch: master
https://github.com/mongodb/mongo/commit/18bc61d22123da5897d275eb92576522a1bab4de

Comment by Kyle Suarez [ 22/Jun/17 ]

After some digging, the _rawObj held by the GeoMatchExpression isn't the actual raw object passed in by the user. It's actually a "cleaned-up" BSONObj that stores the true raw specification as a subobject of the original path, which happens at parse time. This seems really dubious to me, because if setPath() is ever called, _rawObj will never see the changes.

Perhaps it would be possible to keep the path and the raw spec separate, and combine them only when needed by the index layer?

Comment by Charlie Swanson [ 14/Jun/17 ]

I have confirmed that this was introduced by SERVER-21612, it works on 3.5.5 and not 3.5.6.

It looks like the issue is that we don't correctly descend the path on the $geoWithin predicate. There is code to convert a match like {"asPath.x": {$eq: 4}} into just {x: {$eq: 4}} so it can be absorbed into the foreign query, but it looks like it doesn't properly handle the $geoWithin predicate. You can see that when I explain a normal equality predicate on a sub-path of 'location', the $lookup stage in the explain output has removed the 'location' prefix of the path:

black-cherry(mongod-3.3.6) test> db.items.explain().aggregate([{$match:{id: 1}}, {$lookup:{from:"locations",localField:"location_id",foreignField:"id",as:"location"}},{$unwind:"$location"}, {$match:{"location.coordinates": 4}}])
{
  "waitedMS": NumberLong("0"),
  "stages": [
    {
      "$cursor": {
        "query": {
          "id": 1
        },
        "queryPlanner": {
          "plannerVersion": 1,
          "namespace": "test.items",
          "indexFilterSet": false,
          "parsedQuery": {
            "id": {
              "$eq": 1
            }
          },
          "winningPlan": {
            "stage": "COLLSCAN",
            "filter": {
              "id": {
                "$eq": 1
              }
            },
            "direction": "forward"
          },
          "rejectedPlans": [ ]
        }
      }
    },
    {
      "$lookup": {
        "from": "locations",
        "as": "location",
        "localField": "location_id",
        "foreignField": "id",
        "unwinding": {
          "preserveNullAndEmptyArrays": false
        },
        "matching": {
          "coordinates": {
            "$eq": 4
          }
        }
      }
    }
  ],
  "ok": 1
}

Contrast that to the explain output when I use a $geoWithin predicate:

{
  "waitedMS": NumberLong("0"),
  "stages": [
    {
      "$cursor": {
        "query": {
          "id": 1
        },
        "queryPlanner": {
          "plannerVersion": 1,
          "namespace": "test.items",
          "indexFilterSet": false,
          "parsedQuery": {
            "id": {
              "$eq": 1
            }
          },
          "winningPlan": {
            "stage": "COLLSCAN",
            "filter": {
              "id": {
                "$eq": 1
              }
            },
            "direction": "forward"
          },
          "rejectedPlans": [ ]
        }
      }
    },
    {
      "$lookup": {
        "from": "locations",
        "as": "location",
        "localField": "location_id",
        "foreignField": "id",
        "unwinding": {
          "preserveNullAndEmptyArrays": false
        },
        "matching": {
          "location.coordinates": {
            "$geoWithin": {
              "$geometry": {
                "type": "MultiPolygon",
                "coordinates": [
                  [
                    [
                      [
                        20,
                        70
                      ],
                      [
                        30,
                        70
                      ],
                      [
                        30,
                        50
                      ],
                      [
                        20,
                        50
                      ],
                      [
                        20,
                        70
                      ]
                    ]
                  ]
                ]
              }
            }
          }
        }
      }
    }
  ],
  "ok": 1
}

It looks like the problem is that the parsed GeoMatchExpression incorrectly implements serialize() as just appending the raw BSON that was input:

expression_geo.cpp(3.5.6)

372
void GeoMatchExpression::serialize(BSONObjBuilder* out) const {
373
    out->appendElements(_rawObj);
374
}

This should instead re-build the BSON object, or somehow take into account that the path might have been modified since its construction.

Comment by Charlie Swanson [ 14/Jun/17 ]

I can reproduce this. I have 3.2.11 and 3.4.4 installed, but the same problem exists:

// 3.2.11
black-cherry(mongod-3.2.11) test> db.createCollection("locations")
{
  "ok": 1
}
black-cherry(mongod-3.2.11) test> db.locations.insert({"coordinates":[25.266, 60.36938],"id":42})
Inserted 1 record(s) in 1ms
WriteResult({
  "nInserted": 1
})
black-cherry(mongod-3.2.11) test> db.items.insert({"id":1, "location_id":42})
Inserted 1 record(s) in 20ms
WriteResult({
  "nInserted": 1
})
black-cherry(mongod-3.2.11) test> db.items.aggregate([{$match:{id: 1}}, {$lookup:{from:"locations",localField:"location_id",foreignField:"id",as:"location"}},{$unwind:"$location"},
... {$match:{"location.coordinates":{$geoWithin:{"$geometry":{"type":"MultiPolygon","coordinates":[[[[20.0,70.0],[30.0,70.0],[30.0,50.0],[20.0,50.0],[20.0,70.0]]]]}}}}}])
{
  "waitedMS": NumberLong("0"),
  "cursor": {
    "id": NumberLong("0"),
    "ns": "test.items",
    "firstBatch": [
      {
        "_id": ObjectId("5941367b4e05dd2ff966b156"),
        "id": 1,
        "location_id": 42,
        "location": {
          "_id": ObjectId("5941367b4e05dd2ff966b155"),
          "coordinates": [
            25.266,
            60.36938
          ],
          "id": 42
        }
      }
    ]
  },
  "ok": 1
}
black-cherry(mongod-3.2.11) test> db.items.explain().aggregate([{$match:{id: 1}}, {$lookup:{from:"locations",localField:"location_id",foreignField:"id",as:"location"}},{$unwind:"$location"}, {$match:{"location.coordinates":{$geoWithin:{"$geometry":{"type":"MultiPolygon","coordinates":[[[[20.0,70.0],[30.0,70.0],[30.0,50.0],[20.0,50.0],[20.0,70.0]]]]}}}}}])
{
  "waitedMS": NumberLong("0"),
  "stages": [
    {
      "$cursor": {
        "query": {
          "id": 1
        },
        "queryPlanner": {
          "plannerVersion": 1,
          "namespace": "test.items",
          "indexFilterSet": false,
          "parsedQuery": {
            "id": {
              "$eq": 1
            }
          },
          "winningPlan": {
            "stage": "COLLSCAN",
            "filter": {
              "id": {
                "$eq": 1
              }
            },
            "direction": "forward"
          },
          "rejectedPlans": [ ]
        }
      }
    },
    {
      "$lookup": {
        "from": "locations",
        "as": "location",
        "localField": "location_id",
        "foreignField": "id",
        "unwinding": {
          "preserveNullAndEmptyArrays": false
        }
      }
    },
    {
      "$match": {
        "location.coordinates": {
          "$geoWithin": {
            "$geometry": {
              "type": "MultiPolygon",
              "coordinates": [
                [
                  [
                    [
                      20,
                      70
                    ],
                    [
                      30,
                      70
                    ],
                    [
                      30,
                      50
                    ],
                    [
                      20,
                      50
                    ],
                    [
                      20,
                      70
                    ]
                  ]
                ]
              ]
            }
          }
        }
      }
    }
  ],
  "ok": 1
}

// 3.4.4
black-cherry(mongod-3.4.4) test> db.createCollection("items")
{
  "ok": 1
}
black-cherry(mongod-3.4.4) test> db.createCollection("locations")
{
  "ok": 1
}
black-cherry(mongod-3.4.4) test> db.locations.insert({"coordinates":[25.266, 60.36938],"id":42})
Inserted 1 record(s) in 2ms
WriteResult({
  "nInserted": 1
})
black-cherry(mongod-3.4.4) test> db.items.insert({"id":1, "location_id":42})
Inserted 1 record(s) in 1ms
WriteResult({
  "nInserted": 1
})
black-cherry(mongod-3.4.4) test> db.items.aggregate([{$match:{id: 1}}, {$lookup:{from:"locations",localField:"location_id",foreignField:"id",as:"location"}},{$unwind:"$location"},
... {$match:{"location.coordinates":{$geoWithin:{"$geometry":{"type":"MultiPolygon","coordinates":[[[[20.0,70.0],[30.0,70.0],[30.0,50.0],[20.0,50.0],[20.0,70.0]]]]}}}}}])
{
  "cursor": {
    "id": NumberLong("0"),
    "ns": "test.items",
    "firstBatch": [ ]
  },
  "ok": 1
}
black-cherry(mongod-3.4.4) test> db.items.explain().aggregate([{$match:{id: 1}}, {$lookup:{from:"locations",localField:"location_id",foreignField:"id",as:"location"}},{$unwind:"$location"}, {$match:{"location.coordinates":{$geoWithin:{"$geometry":{"type":"MultiPolygon","coordinates":[[[[20.0,70.0],[30.0,70.0],[30.0,50.0],[20.0,50.0],[20.0,70.0]]]]}}}}}])
{
  "stages": [
    {
      "$cursor": {
        "query": {
          "id": 1
        },
        "queryPlanner": {
          "plannerVersion": 1,
          "namespace": "test.items",
          "indexFilterSet": false,
          "parsedQuery": {
            "id": {
              "$eq": 1
            }
          },
          "winningPlan": {
            "stage": "COLLSCAN",
            "filter": {
              "id": {
                "$eq": 1
              }
            },
            "direction": "forward"
          },
          "rejectedPlans": [ ]
        }
      }
    },
    {
      "$lookup": {
        "from": "locations",
        "as": "location",
        "localField": "location_id",
        "foreignField": "id",
        "unwinding": {
          "preserveNullAndEmptyArrays": false
        },
        "matching": {
          "location.coordinates": {
            "$geoWithin": {
              "$geometry": {
                "type": "MultiPolygon",
                "coordinates": [
                  [
                    [
                      [
                        20,
                        70
                      ],
                      [
                        30,
                        70
                      ],
                      [
                        30,
                        50
                      ],
                      [
                        20,
                        50
                      ],
                      [
                        20,
                        70
                      ]
                    ]
                  ]
                ]
              }
            }
          }
        }
      }
    }
  ],
  "ok": 1
}

You can see that in 3.4.4 we are absorbing the match into the $lookup stage, so my initial guess is that something about this process is going wrong (see SERVER-21612, fixed in 3.3.6). I'll check to see if this works on 3.3.5 but not 3.3.6.

Generated at Thu Feb 08 04:21:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.