Priority: Major - P3
Affects Version/s: 4.0.9
Fix Version/s: None
Component/s: Aggregation Framework
We've been working on modeling bidirectional relations in MongoDB. We use the common approach of storing ids of related documents in specific fields, e.g.:
We've created a GraphQL server that uses the Aggregation Framework to perform lookups and resolve e.g. such a query:
to such a response:
Our reasoning was that using a single aggregation pipeline even for complex, nested queries will easily outperform sequential finds on multiple collections with all their associated round-trips and subsequent result manipulation.
Using the example above, if we wanted to fetch some users along with all their items without leveraging the Aggregation Framework, we would have to first query the "users" collection and get the list of associated item ids, then query the "items" collection using those ids and finally connect users and items according to their relationships.
The Aggregation Framework allows us to define all those tasks in a single pipeline and delegate their completion to MongoDB. The only problem is that it's much slower than the find-based approach and the performance gap only grows as we add more complexity. Here's a link to a simple benchmark, executed in the MongoDB, console that illustrates the problem:
Sample results on my system are:
The performance characteristics are almost identical when we use Node.js or C++ drivers.
What could cause such discrepancies? Shouldn't a single, internal aggregation outperform a sequence of 3 find requests along with some JS data manipulation? Are there any issues in our pipeline that lead to slower execution?