[SERVER-79637] Incorrect query results in $lookup with TS foreign collection using a correlated predicate Created: 02/Aug/23  Updated: 24/Jan/24  Resolved: 19/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Irina Yatsenko (Inactive) Assignee: Naama Bareket
Resolution: Fixed Votes: 1
Labels: greenerbuild
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Query Integration
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.0, v5.0
Steps To Reproduce:

> db.local.insertMany([{_id: 0, key: 1},{_id: 1, key: 2}])
 
> db.createCollection("foreign", { timeseries: { timeField: "time", metaField: "meta", granularity: "minutes" }})
> db.foreign.insertMany([{time: new Date(), _id: 0, meta: 1, val1: 42, val2: 100},{time: new Date(), _id: 1, meta: 2, val1: 17, val2: 100}])
 
db.local.aggregate(
  {$lookup: {
    from: "foreign",
    let: {lkey: "$key"},
    pipeline: [
      {$match: {$expr: {$lt: ["$val1","$val2"]}}},
      {$match: {$expr: {$eq: ["$meta","$$lkey"]}}},
      {$project: {lkey: "$$lkey",fkey: "$meta",val: "$val1",_id: 0,}},
    ],
    as: "joined"
  }}
)
The result is:
{ "_id" : 0, "key" : 1, "joined" : [ { "lkey" : 1, "fkey" : 1, "val" : 42 } ] }
{ "_id" : 1, "key" : 2, "joined" : [ { "lkey" : 1, "fkey" : 1, "val" : 42 } ] }

The match on the local record with key=2 is wrong.

Sprint: QI 2023-10-16, QI 2023-10-30, QI 2023-11-13, QI 2023-11-27, QI 2023-12-11, QI 2023-12-25, QI 2024-01-08, QI 2024-01-22
Participants:
Case:

 Description   

The bug is a result of two different optimizations not being aware of each other. Basically, this is what's going on when there are multiple $match stages and the correlated one isn't the first:
1. TS-specific optimization of predicates replaces the correlated match with a filter on a collection access stage.
2. Lookup-specific optimization that is supposed to cache results from an uncorrelated collection access fails to notice that it cannot be applied here and caches the results of the collection access with the first seen key, which are then reused with the following keys.

Combining two $match stages into one makes it sufficiently complex to disable optimization #1.
Swapping the two $match stages allows optimization #2 to realize the cache should be abandoned.



 Comments   
Comment by Githook User [ 19/Jan/24 ]

Author:

{'name': 'naama-bareket', 'email': '85578126+naama-bareket@users.noreply.github.com', 'username': 'naama-bareket'}

Message: SERVER-79637: Incorrect query results in $lookup with TS foreign collection using a correlated predicate (#16989)

GitOrigin-RevId: f64818bae7a2d8251c3f7c6c84193052910ceaf5
Branch: master
https://github.com/mongodb/mongo/commit/22e9898c17d0e0bd811e7de1e3d69b424026e6fc

Generated at Thu Feb 08 06:41:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.