Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-14159

Aggregation framework performances drops significantly when projecting large sub documents

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.1
    • Component/s: Aggregation Framework
    • None
    • Environment:
      Windows Server 2008 R2

      Hi guys,

      I have a Mongo database that contains a collection where each document is built with 3 levels of objects. Each sub level, is in fact an array of object. Here is an example of what a document can look like:

      /* 0 */
      {
        "_id" : NumberLong(1),
        "NonIndexedField" : NumberLong(1),
        "DateTime" : ISODate("2014-06-02T19:59:09.836Z"),
        "Field1" : "72eda069-1737-4db6-9367-8399ebdc0f8e",
        "Field2" : "2e72ccdc-0e40-49b8-9d13-2b74855fd938",
        "ChildrenLevel1" : [{
            "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
            "Field1" : "a3669b4c-0fd4-4b2f-8e2e-2fbd0555e606",
            "Field2" : "1b6a0654-fbdb-4f97-8406-5218aa12d613",
            "ChildrenLevel2" : [{
                "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
                "Field1" : "5786231f-15b5-45ea-9b40-1678634025a9",
                "Field2" : "e4af859c-8135-4029-961b-e85b855249e7",
                "ChildrenLevel3" : [{
                    "Counter" : NumberLong(0),
                    "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
                    "Field1" : "16627323-44ea-4e33-b1d7-f6d52bb43493",
                  }, {
                    "Counter" : NumberLong(1),
                    "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
                    "Field1" : "c13b4ed7-09e1-459d-8d5d-5c6558af8eb3",
                  }]
      ...
      }
      

      In this structure, ChildrenLevel1, ChildrenLevel2 and ChildrenLevel3 are all arrays. In the prototype that I created, they all contain 10 elements.

      I load 10000 such documents in a Mongo database and I then try to run map reduce queries on it. In the end, I want to do a map reduce operation on ChildrenLevel1.ChildrenLevel2.ChildrenLevel3.Counter. My first step is to create a very simple query that will not unwind any array. I just want to be sure that I get the syntax right. Here is the query:

      	db.Demo.aggregate([
      	 {$match : {_id : {$lte : 5000}}},
      	 {$project: {
      	             _id : 0,
                       DateTime : 1
      	            }},
      	 {$group: {
      	           _id : 0, 
      	           max : {$max : "$DateTime"}
      	          }
      	 }
      	]).pretty()
      

      This works very fast. If I look at the mongo logs, I see that it takes only 26 ms to run.

         2014-06-02T16:25:56.267-0400 [conn1] command MongoProjectPerformanceProblem.$cmd command: aggregate { aggregate: "Demo", pipeline: [ { $match: { _id: { $lte: 5000.0 } } }, { $project: { _id: 0.0, DateTime: 1.0 } }, { $group: { _id: 0.0, max: { $max: "$DateTime" } } } ], cursor: {} } keyUpdates:0 numYields:0 locks(micros) r:20928 reslen:157 26ms
      

      I then try to make my query just a bit more complex toward my goal. I now project my ChildrenLevel1 array as well as the top level 'DateTime' property:

      	db.Demo.aggregate([
      	 {$match : {_id : {$lte : 5000}}},
      	 {$project: {
      	             _id : 0,
                       DateTime : 1,
                       ChildrenLevel1 : 1
      	            }},
      	 {$group: {
      	           _id : 0, 
      	           max : {$max : "$DateTime"}
      	          }
      	 }
      	]).pretty()
      

      This query is very slow comparatively to the previous one. This new query takes about 19s to run. This is more than 700 times slower than the first query. Here is what I see in the logs.

         2014-06-02T16:29:13.918-0400 [conn1] command MongoProjectPerformanceProblem.$cmd command: aggregate { aggregate: "Demo", pipeline: [ { $match: { _id: { $lte: 5000.0 } } }, { $project: { _id: 0.0, DateTime: 1.0, ChildrenLevel1: 1.0 } }, { $group: { _id: 0.0, max: { $max: "$DateTime" } } } ], cursor: {} } keyUpdates:0 numYields:1006 locks(micros) r:17008230 reslen:157 18963ms
      

      If I run it a second time, I have about the same result... even a little worse (20 s):

         2014-06-02T16:33:41.498-0400 [conn1] command MongoProjectPerformanceProblem.$cmd command: aggregate { aggregate: "Demo", pipeline: [ { $match: { _id: { $lte: 5000.0 } } }, { $project: { _id: 0.0, DateTime: 1.0, ChildrenLevel1: 1.0 } }, { $group: { _id: 0.0, max: { $max: "$DateTime" } } } ], cursor: {} } keyUpdates:0 numYields:1011 locks(micros) r:16189110 reslen:157 20435ms
      

      So the query is not slow because it is loading data in RAM. Note that my server still has 45 GB of RAM available.

      From what I understood of Mongo and the aggregation framwork, I had the impression that this would be super fast since all my data is already in memory and the aggregation framework simply converts my query into compiled code that runs very fast compared to the legacy map-reduce java script based technique.

      Is it something that has been observed and can I expect improvement on this side? It seems a bit weird that just projecting more data takes more time. I may be completely off but it smells like data is being copied and we are not simply using the data that is already in RAM...

      Note that if it can help, I can either provide a backup or a c# program that can generate the database that reproduces the problem.

      This experiment was to validate if our "classic" DWH in SQL Server and and OLAP Cube could both be replaced by a Mongo database using the aggregation framework to return the same data as what I currently do with my OLAP cube. With the performances that I have seen so far, it is just not possible to migrate from SQL Server to Mongo.

      Thanks for your help,
      Maxime

            Assignee:
            mathias@mongodb.com Mathias Stearn
            Reporter:
            mabead78 Maxime Beaudry
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: