[SERVER-21284] $lookup should cache query results Created: 04/Nov/15 Updated: 02/Mar/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.2.0-rc1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Benjamin Appréderisse | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 11 |
| Labels: | optimization, performance, query-product-scope-1, query-product-urgency-2, query-product-value-2 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
If you are doing an aggregation on a large collection, and your aggregation contains a $lookup stage which queries a smaller collection (think <= 100 documents), the $lookup stage will issue a query for each document in the larger collection. The $lookup stage should cache the results of the query to avoid unnecessary work in situations like this. Original DescriptionI have tried the $lookup aggregator and the performance was not as expected. After an exchange of emails with Norberto : "it looks like it is not using either indexes, and therefore it's taking so long. " Please find above the link to test: |
| Comments |
| Comment by Xiaoqiang Zhou [ 18/Aug/22 ] |
|
Any progress on this? |
| Comment by Labros Papadopoulos [ 07/Feb/22 ] |
|
Are there any updates related to this issue? Our company have ran into this issue. Even though we have been using the embed documents approach, and the database works blazing fast when executing the standard CRUD operations, we can't avoid some of the lookup "join" operations that are required for the generation of certain reports. Even when using indexing to support those join operations, when joining tables that have 100K+ documents the operations take more than 30 secs! It's a petty that this issue hasn't yet been resolved, because aside of that, the MongoDb has done a stellar job! |
| Comment by Charlie Swanson [ 05/Nov/15 ] |
|
Ok great! I have filed SERVER-21312 if you want to track that ticket as well. |
| Comment by Norberto Fernando Rocha Leite (Inactive) [ 04/Nov/15 ] |
|
Hi Benjamin, He have done a bit more of drilling into the issue you raised and as charlie.swanson is mentioning the system is actually using the indexes accordingly. N. |
| Comment by Benjamin Appréderisse [ 04/Nov/15 ] |
|
Hi Charlie, Thanks for giving me these explanations. We can use this way. Benjamin |
| Comment by Charlie Swanson [ 04/Nov/15 ] |
|
Hi bappr, I have been in communication with Norberto, and it looks like the $lookup stage is indeed using an index on the foreign collection. It is not using one on the local collection, but that is expected, and probably the fastest way to proceed, as it is looking at all documents in the collection. So my impression is that $lookup is behaving exactly as expected. It is probably slow because it has to do 350K queries. If this is significantly slower than doing those queries on their own, that would indeed be unexpected, and a likely bug. That said, there are many potential optimizations that we are not yet taking advantage of for the $lookup stage. For instance, we could maintain a cache of the results of the queries, or batch the lookups to do fewer queries. These are things we have thought about doing, but do not have tickets in JIRA yet. If this is more what you meant by "performance is not as expected", we can morph this ticket into the request to cache the results, and open another ticket for batching the lookups. If that sounds like a good way to proceed, let me know and I'll go ahead with it. |