[SERVER-65884] $lookup from time-series can place $sequentialCache after correlated $match Created: 22/Apr/22 Updated: 29/Oct/23 Resolved: 22/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.0.7 |
| Fix Version/s: | 6.0.1, 5.0.11, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Percy | Assignee: | Ruslan Abdulkhalikov (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | quick-tech-debt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v6.0, v5.3, v5.0
|
||||||||||||||||
| Sprint: | QO 2022-05-16, QO 2022-05-30, QO 2022-06-13, QO 2022-06-27, QO 2022-07-11, QO 2022-07-25 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 205 | ||||||||||||||||
| Description |
|
When $lookup's foreign collection is time-series, and the inner pipeline has a correlated $match stage, it can incorrectly place a $sequentialCache after the $match. For example, suppose we start with:
When $lookup optimizes a pipeline, it first resolves the views and tentatively adds a $sequentialCache stage.
Then it calls optimizeAt left-to-right. Since the $match does not depend on any measurement fields in the inner pipeline, it correctly swaps with $_internalUnpackBucket:
Next we optimize the $_internalUnpackBucket stage. Like $lookup, this stage does something special by calling optimize recursively in its own optimize method. It zooms in on the pipeline suffix:
Now we optimize $sequentialCache, and it doesn't see the other stages, so it thinks this is an uncorrelated pipeline. It does not remove itself. Once all the calls to optimize return we're left with this incorrect pipeline:
We should have either removed the $sequentialCache, or positioned it before the correlated $match stage. |
| Comments |
| Comment by Githook User [ 22/Jul/22 ] |
|
Author: {'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}Message: (cherry picked from commit dfec48667e724500f878c83bb1a10a609b2da669) |
| Comment by Githook User [ 22/Jul/22 ] |
|
Author: {'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}Message: (cherry picked from commit dfec48667e724500f878c83bb1a10a609b2da669) |
| Comment by Githook User [ 22/Jul/22 ] |
|
Author: {'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}Message: |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: Revert " This reverts commit a672dcd5658bdb7117de384e50e1747e610a5684. |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: Revert " This reverts commit 8d8d5077f1ab5d161919b3a3131ce10c8a96b5f9. |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: Revert " This reverts commit 252235fda4d45e85db342bb6437b1587b980b1ed. |
| Comment by Steve Tarzia [ 17/May/22 ] |
|
Re-opening this because we had to revert the patch. Seems like the prior solution caused this BF: https://jira.mongodb.org/browse/BF-25272 |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}Message: (cherry picked from commit a672dcd5658bdb7117de384e50e1747e610a5684) |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}Message: (cherry picked from commit a672dcd5658bdb7117de384e50e1747e610a5684) |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'ruslan.abdulkhalikov', 'email': 'ruslan.abdulkhalikov@mongodb.com', 'username': 'rusabd1'}Message: |
| Comment by David Percy [ 09/May/22 ] |
|
Another solution could be to disable the $sequentialCache optimization when the foreign collection is timeseries. This would probably be a smaller code change and easier to backport, than changing how $sequentialCache works in general. |
| Comment by David Percy [ 22/Apr/22 ] |
|
I'm curious why we position the $sequentialCache at the end in the first place. It seems like it would be more correct to put it at the beginning, so that we never have a correlated stage before it. Maybe it's because we want $sequentialCache to position after all the other stages have moved around, because we don't want it to interfere with other optimizations. One solution could be to have $lookup optimize the pipeline first, and then decide about $sequentialCache after that. |