Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-45326

Poor Aggregation Framework performance in relational queries

    • Type: Icon: Question Question
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.0.9
    • Component/s: Aggregation Framework
    • None

      We've been working on modeling bidirectional relations in MongoDB. We use the common approach of storing ids of related documents in specific fields, e.g.:

      "users"
      {_id: "user1", items: ["item1"], name: "John"}
      
      "items"
      {_id: "item1", owner: "user1", type: "blanket"}

      We've created a GraphQL server that uses the Aggregation Framework to perform lookups and resolve e.g. such a query:

       

      {
        users {
          id
          name
          items {
            id
            type
          }
        }
      }

      to such a response:

       

       

      {
        "users": {
          "id": "user1",
          "name": "John",
          "items": [
            {
              "id": "item1",
              "type": "blanket"
            }
          ]
        }
      }

       

       Our reasoning was that using a single aggregation pipeline even for complex, nested queries will easily outperform sequential finds on multiple collections with all their associated round-trips and subsequent result manipulation.

      Using the example above, if we wanted to fetch some users along with all their items without leveraging the Aggregation Framework, we would have to first query the "users" collection and get the list of associated item ids, then query the "items" collection using those ids and finally connect users and items according to their relationships.

      The Aggregation Framework allows us to define all those tasks in a single pipeline and delegate their completion to MongoDB. The only problem is that it's much slower than the find-based approach and the performance gap only grows as we add more complexity. Here's a link to a simple benchmark, executed in the MongoDB, console that illustrates the problem:

      https://gist.github.com/alethes/369faf2e1c5a83f4fbcaca9001dbd501

      Sample results on my system are:

      Aggregation 1: 387 ms
      Aggregation 2: 103 ms
      Find: 23 ms

      The performance characteristics are almost identical when we use Node.js or C++ drivers.

      What could cause such discrepancies? Shouldn't a single, internal aggregation outperform a sequence of 3 find requests along with some JS data manipulation? Are there any issues in our pipeline that lead to slower execution?

       

            Assignee:
            eric.sedor@mongodb.com Eric Sedor
            Reporter:
            james@togedo.com James Daab
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: