Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.6.1
Component/s: Aggregation Framework
Labels:
None
Environment:
Windows Server 2008 R2

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Hi guys,

I have a Mongo database that contains a collection where each document is built with 3 levels of objects. Each sub level, is in fact an array of object. Here is an example of what a document can look like:

/* 0 */
{
  "_id" : NumberLong(1),
  "NonIndexedField" : NumberLong(1),
  "DateTime" : ISODate("2014-06-02T19:59:09.836Z"),
  "Field1" : "72eda069-1737-4db6-9367-8399ebdc0f8e",
  "Field2" : "2e72ccdc-0e40-49b8-9d13-2b74855fd938",
  "ChildrenLevel1" : [{
      "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
      "Field1" : "a3669b4c-0fd4-4b2f-8e2e-2fbd0555e606",
      "Field2" : "1b6a0654-fbdb-4f97-8406-5218aa12d613",
      "ChildrenLevel2" : [{
          "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
          "Field1" : "5786231f-15b5-45ea-9b40-1678634025a9",
          "Field2" : "e4af859c-8135-4029-961b-e85b855249e7",
          "ChildrenLevel3" : [{
              "Counter" : NumberLong(0),
              "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
              "Field1" : "16627323-44ea-4e33-b1d7-f6d52bb43493",
            }, {
              "Counter" : NumberLong(1),
              "DateTime" : ISODate("2014-06-02T19:59:09.852Z"),
              "Field1" : "c13b4ed7-09e1-459d-8d5d-5c6558af8eb3",
            }]
...
}

In this structure, ChildrenLevel1, ChildrenLevel2 and ChildrenLevel3 are all arrays. In the prototype that I created, they all contain 10 elements.

I load 10000 such documents in a Mongo database and I then try to run map reduce queries on it. In the end, I want to do a map reduce operation on ChildrenLevel1.ChildrenLevel2.ChildrenLevel3.Counter. My first step is to create a very simple query that will not unwind any array. I just want to be sure that I get the syntax right. Here is the query:

	db.Demo.aggregate([
	 {$match : {_id : {$lte : 5000}}},
	 {$project: {
	             _id : 0,
                 DateTime : 1
	            }},
	 {$group: {
	           _id : 0, 
	           max : {$max : "$DateTime"}
	          }
	 }
	]).pretty()

This works very fast. If I look at the mongo logs, I see that it takes only 26 ms to run.

   2014-06-02T16:25:56.267-0400 [conn1] command MongoProjectPerformanceProblem.$cmd command: aggregate { aggregate: "Demo", pipeline: [ { $match: { _id: { $lte: 5000.0 } } }, { $project: { _id: 0.0, DateTime: 1.0 } }, { $group: { _id: 0.0, max: { $max: "$DateTime" } } } ], cursor: {} } keyUpdates:0 numYields:0 locks(micros) r:20928 reslen:157 26ms

I then try to make my query just a bit more complex toward my goal. I now project my ChildrenLevel1 array as well as the top level 'DateTime' property:

	db.Demo.aggregate([
	 {$match : {_id : {$lte : 5000}}},
	 {$project: {
	             _id : 0,
                 DateTime : 1,
                 ChildrenLevel1 : 1
	            }},
	 {$group: {
	           _id : 0, 
	           max : {$max : "$DateTime"}
	          }
	 }
	]).pretty()

This query is very slow comparatively to the previous one. This new query takes about 19s to run. This is more than 700 times slower than the first query. Here is what I see in the logs.

   2014-06-02T16:29:13.918-0400 [conn1] command MongoProjectPerformanceProblem.$cmd command: aggregate { aggregate: "Demo", pipeline: [ { $match: { _id: { $lte: 5000.0 } } }, { $project: { _id: 0.0, DateTime: 1.0, ChildrenLevel1: 1.0 } }, { $group: { _id: 0.0, max: { $max: "$DateTime" } } } ], cursor: {} } keyUpdates:0 numYields:1006 locks(micros) r:17008230 reslen:157 18963ms

If I run it a second time, I have about the same result... even a little worse (20 s):

   2014-06-02T16:33:41.498-0400 [conn1] command MongoProjectPerformanceProblem.$cmd command: aggregate { aggregate: "Demo", pipeline: [ { $match: { _id: { $lte: 5000.0 } } }, { $project: { _id: 0.0, DateTime: 1.0, ChildrenLevel1: 1.0 } }, { $group: { _id: 0.0, max: { $max: "$DateTime" } } } ], cursor: {} } keyUpdates:0 numYields:1011 locks(micros) r:16189110 reslen:157 20435ms

So the query is not slow because it is loading data in RAM. Note that my server still has 45 GB of RAM available.

From what I understood of Mongo and the aggregation framwork, I had the impression that this would be super fast since all my data is already in memory and the aggregation framework simply converts my query into compiled code that runs very fast compared to the legacy map-reduce java script based technique.

Is it something that has been observed and can I expect improvement on this side? It seems a bit weird that just projecting more data takes more time. I may be completely off but it smells like data is being copied and we are not simply using the data that is already in RAM...

Note that if it can help, I can either provide a backup or a c# program that can generate the database that reproduces the problem.

This experiment was to validate if our "classic" DWH in SQL Server and and OLAP Cube could both be replaced by a Mongo database using the aggregation framework to return the same data as what I currently do with my OLAP cube. With the performances that I have seen so far, it is just not possible to migrate from SQL Server to Mongo.

Thanks for your help,
Maxime

duplicates

SERVER-13703 Presence of extraneous $project causes a less efficient plan to be used

Backlog

Assignee:: Mathias Stearn
Reporter:: Maxime Beaudry
Participants:: Ben Becker, Mathias Stearn, Maxime Beaudry, Ramon Fernandez Marina
Votes:: 0 Vote for this issue
Watchers:: 10 Start watching this issue

Created:: Jun 04 2014 11:00:23 AM UTC
Updated:: Dec 10 2014 11:06:00 PM UTC
Resolved:: Jul 24 2014 10:01:39 PM UTC

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates