[SERVER-65884] $lookup from time-series can place $sequentialCache after correlated $match Created: 22/Apr/22  Updated: 29/Oct/23  Resolved: 22/Jul/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.7
Fix Version/s: 6.0.1, 5.0.11, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: David Percy Assignee: Ruslan Abdulkhalikov (Inactive)
Resolution: Fixed Votes: 0
Labels: quick-tech-debt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.js    
Issue Links:
Backports
Depends
Problem/Incident
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.3, v5.0
Sprint: QO 2022-05-16, QO 2022-05-30, QO 2022-06-13, QO 2022-06-27, QO 2022-07-11, QO 2022-07-25
Participants:
Linked BF Score: 205

 Description   

When $lookup's foreign collection is time-series, and the inner pipeline has a correlated $match stage, it can incorrectly place a $sequentialCache after the $match.

For example, suppose we start with:

db.someCollection.aggregate([
   {$lookup: {
    from: 'timeseriesColl',
    let: {myJoinVar: ...},
    pipeline: [
      {$match: ... myJoinVar ...}
    ]
  }}
])

When $lookup optimizes a pipeline, it first resolves the views and tentatively adds a $sequentialCache stage.

{$_internalUnpackBucket: ...}
{$match ... myJoinVar ...}
{$sequentialCache ...}

Then it calls optimizeAt left-to-right. Since the $match does not depend on any measurement fields in the inner pipeline, it correctly swaps with $_internalUnpackBucket:

{$match ... myJoinVar ...}
{$_internalUnpackBucket: ...}
{$sequentialCache ...}

Next we optimize the $_internalUnpackBucket stage. Like $lookup, this stage does something special by calling optimize recursively in its own optimize method. It zooms in on the pipeline suffix:

{$sequentialCache ...}

Now we optimize $sequentialCache, and it doesn't see the other stages, so it thinks this is an uncorrelated pipeline. It does not remove itself.

Once all the calls to optimize return we're left with this incorrect pipeline:

{$match ... myJoinVar ...}
{$_internalUnpackBucket: ...}
{$sequentialCache ...}

We should have either removed the $sequentialCache, or positioned it before the correlated $match stage.



 Comments   
Comment by Githook User [ 22/Jul/22 ]

Author:

{'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}

Message: SERVER-65884 $lookup from ts optimize $sequentialCache

(cherry picked from commit dfec48667e724500f878c83bb1a10a609b2da669)
Branch: v5.0
https://github.com/mongodb/mongo/commit/c75c69d234268c9b0758bc8ebb1057a73d1f64d4

Comment by Githook User [ 22/Jul/22 ]

Author:

{'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}

Message: SERVER-65884 $lookup from ts optimize $sequentialCache

(cherry picked from commit dfec48667e724500f878c83bb1a10a609b2da669)
Branch: v6.0
https://github.com/mongodb/mongo/commit/4bdceddd4655d954603442d4d5575f8d19a78d76

Comment by Githook User [ 22/Jul/22 ]

Author:

{'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}

Message: SERVER-65884 $lookup from ts optimize $sequentialCache
Branch: master
https://github.com/mongodb/mongo/commit/dfec48667e724500f878c83bb1a10a609b2da669

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: Revert "SERVER-65884 $lookup from ts optimize $sequentialCache"

This reverts commit a672dcd5658bdb7117de384e50e1747e610a5684.
Branch: master
https://github.com/mongodb/mongo/commit/b6f6bb81cb44597d222493ae9246ee5adb62f794

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: Revert "SERVER-65884 $lookup from ts optimize $sequentialCache"

This reverts commit 8d8d5077f1ab5d161919b3a3131ce10c8a96b5f9.
Branch: v5.3
https://github.com/mongodb/mongo/commit/16d4b450daec0f3fa71b5070995134dfabfde4be

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: Revert "SERVER-65884 $lookup from ts optimize $sequentialCache"

This reverts commit 252235fda4d45e85db342bb6437b1587b980b1ed.
Branch: v6.0
https://github.com/mongodb/mongo/commit/387db2d0f7b26b4ae6efdf46854e60f12f93d4d2

Comment by Steve Tarzia [ 17/May/22 ]

Re-opening this because we had to revert the patch. Seems like the prior solution caused this BF: https://jira.mongodb.org/browse/BF-25272

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}

Message: SERVER-65884 $lookup from ts optimize $sequentialCache

(cherry picked from commit a672dcd5658bdb7117de384e50e1747e610a5684)
Branch: v5.3
https://github.com/mongodb/mongo/commit/8d8d5077f1ab5d161919b3a3131ce10c8a96b5f9

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}

Message: SERVER-65884 $lookup from ts optimize $sequentialCache

(cherry picked from commit a672dcd5658bdb7117de384e50e1747e610a5684)
Branch: v6.0
https://github.com/mongodb/mongo/commit/252235fda4d45e85db342bb6437b1587b980b1ed

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}

Message: SERVER-65884 $lookup from ts optimize $sequentialCache
Branch: master
https://github.com/mongodb/mongo/commit/a672dcd5658bdb7117de384e50e1747e610a5684

Comment by David Percy [ 09/May/22 ]

Another solution could be to disable the $sequentialCache optimization when the foreign collection is timeseries. This would probably be a smaller code change and easier to backport, than changing how $sequentialCache works in general.

Comment by David Percy [ 22/Apr/22 ]

I'm curious why we position the $sequentialCache at the end in the first place. It seems like it would be more correct to put it at the beginning, so that we never have a correlated stage before it. Maybe it's because we want $sequentialCache to position after all the other stages have moved around, because we don't want it to interfere with other optimizations. One solution could be to have $lookup optimize the pipeline first, and then decide about $sequentialCache after that.

Generated at Thu Feb 08 06:03:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.