[SERVER-64021] Lookup stage poor performance
Created: 26/Feb/22  Updated: 17/Mar/22  Resolved: 17/Mar/22

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Performance
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Labros Papadopoulos
Assignee: Edwin Zhou
Resolution: Duplicate
Votes: 5
Labels: Aggregation, Lookup, Performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10
MongoDB C# Driver v2.14.1


Issue Links:
Duplicate
duplicates SERVER-21284 $lookup should cache query results Backlog
duplicates SERVER-21312 $lookup should batch query requests Backlog
Operating System: ALL
Steps To Reproduce:

Create a collection with 100K documents.

Create a collection with any number of documents.

Join the second collection on the first collection.
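
A minimal sketch of the join in the last step, written as the aggregation pipeline a driver would send. The collection and field names (`orders`, `customers`, `customerId`, `customer`) are hypothetical and do not come from the report; only the `$lookup` stage shape is standard.

```python
def build_lookup_pipeline(from_collection, local_field, foreign_field, as_field):
    """Return an aggregation pipeline with a single equality-match $lookup stage."""
    return [
        {
            "$lookup": {
                "from": from_collection,       # collection being joined in
                "localField": local_field,     # field on the input documents
                "foreignField": foreign_field, # field on the joined collection
                "as": as_field,                # output array field for the matches
            }
        }
    ]


# Hypothetical names, chosen only for illustration.
pipeline = build_lookup_pipeline("customers", "customerId", "_id", "customer")

# With PyMongo and a running server, the pipeline would be executed as:
#   results = list(db["orders"].aggregate(pipeline))
```

Timing that `aggregate` call against the 100K-document collection should show whether the slowdown reproduces in a given environment.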

Participants:

 Description   

We switched from a SQL database to MongoDB for our latest cloud application, and the performance benefits were spectacular. We also followed the MongoDB guidelines and used document embedding wherever possible. Currently none of our standard CRUD operations require any form of join, and all of them execute incredibly fast (our data sets range from 100K to 2M documents).

 

However, to generate certain reports we need to perform joins between collections. The MongoDB C# driver that we are using translates a LINQ join query to a $lookup stage. The $lookup stage performs a search of the joined collection for every document of the initial collection. We knew that join performance is a weak point of MongoDB and that joins should be avoided, but taking 20 seconds to join a collection of 200K documents with a collection of 100K documents was totally unexpected.
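
To illustrate why one search per input document becomes expensive, here is a toy in-memory model. It is not MongoDB's implementation: `lookup_scan` and `lookup_indexed` are hypothetical helpers, and the field names are assumptions. The first function does a full pass over the joined collection for every input document; the second builds a lookup table once and reuses it, which is roughly the kind of saving an index on the joined field is meant to provide.

```python
from collections import defaultdict


def lookup_scan(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Join by scanning every foreign document once per local document."""
    comparisons = 0
    joined = []
    for doc in local_docs:
        matches = []
        for other in foreign_docs:
            comparisons += 1
            if other.get(foreign_field) == doc.get(local_field):
                matches.append(other)
        joined.append({**doc, as_field: matches})
    return joined, comparisons


def lookup_indexed(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Join by building a key -> documents table once, then probing it."""
    index = defaultdict(list)
    for other in foreign_docs:
        index[other.get(foreign_field)].append(other)
    joined = [
        {**doc, as_field: list(index.get(doc.get(local_field), []))}
        for doc in local_docs
    ]
    # One step per foreign document to build, one probe per local document.
    steps = len(foreign_docs) + len(local_docs)
    return joined, steps
```

In this model the scan variant needs `len(local_docs) * len(foreign_docs)` comparisons, while the indexed variant needs `len(local_docs) + len(foreign_docs)` steps. For 200K and 100K documents that is the difference between 20 billion comparisons and 300 thousand steps, although real server behavior depends on indexes, document sizes, and the query plan.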

 

This is a known issue that the MongoDB team acknowledged 7 years ago, and two solutions were proposed:

SERVER-21312: $lookup should batch query requests

SERVER-21284: $lookup should cache query results

 

We would like to know whether you are planning to resolve this issue or whether it cannot be resolved, so that we can try to work around it by external means. Given that this issue has such a devastating performance impact and has not been resolved for almost 7 years, we are led to believe that it is the latter case.

 

Thanks in advance!



 Comments   
Comment by Edwin Zhou [ 17/Mar/22 ]

Hi paplabros@gmail.com,

Thank you for your report. I understand that the performance impact of $lookup can be devastating. As you mentioned, we currently are tracking improvements to $lookup through SERVER-21312 and SERVER-21284. The best place to track these improvements will be on those tickets and I encourage you to watch and vote on those tickets. I will close this ticket as a duplicate of SERVER-21312 and SERVER-21284.

If you need further assistance troubleshooting your performance issues, I encourage you to ask our community for help by posting on the MongoDB Developer Community Forums.

Best,
Edwin

Comment by Dimitris Kalantzis [ 01/Mar/22 ]

We had similar performance issues on a cloud project as well. We had to use a different DB for those kinds of procedures. We would love to go full Mongo. Watching this.
