[SERVER-14159] Aggregation framework performances drops significantly when projecting large sub documents Created: 04/Jun/14 Updated: 10/Dec/14 Resolved: 24/Jul/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Maxime Beaudry | Assignee: | Mathias Stearn |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows Server 2008 R2 |
||
| Description |
|
Hi guys, I have a Mongo database that contains a collection where each document is built with 3 levels of objects. Each sub-level is in fact an array of objects. Here is an example of what a document can look like:
In this structure, ChildrenLevel1, ChildrenLevel2 and ChildrenLevel3 are all arrays. In the prototype that I created, they all contain 10 elements. I load 10000 such documents in a Mongo database and I then try to run map reduce queries on it. In the end, I want to do a map reduce operation on ChildrenLevel1.ChildrenLevel2.ChildrenLevel3.Counter. My first step is to create a very simple query that will not unwind any array. I just want to be sure that I get the syntax right. Here is the query:
This works very fast. If I look at the mongo logs, I see that it takes only 26 ms to run.
I then try to make my query just a bit more complex toward my goal. I now project my ChildrenLevel1 array as well as the top level 'DateTime' property:
This query is very slow compared to the previous one. It takes about 19 s to run, which is more than 700 times slower than the first query. Here is what I see in the logs.
If I run it a second time, I have about the same result... even a little worse (20 s):
So the query is not slow because it is loading data into RAM. Note that my server still has 45 GB of RAM available. From what I understood of Mongo and the aggregation framework, I had the impression that this would be super fast, since all my data is already in memory and the aggregation framework simply converts my query into compiled code that runs very fast compared to the legacy JavaScript-based map-reduce technique.
Has this been observed before, and can I expect improvement on this side? It seems a bit weird that just projecting more data takes more time. I may be completely off, but it smells like data is being copied rather than simply used where it already sits in RAM.
Note that, if it helps, I can provide either a backup or a C# program that generates the database that reproduces the problem.
This experiment was meant to validate whether our "classic" DWH in SQL Server and our OLAP cube could both be replaced by a Mongo database, using the aggregation framework to return the same data that I currently get from my OLAP cube. With the performance that I have seen so far, it is just not possible to migrate from SQL Server to Mongo.
Thanks for your help, |
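To make the data volume in this report concrete, the prototype shape described above can be approximated in plain JavaScript. This is only a sketch, not the reporter's C# generator: the field names (`DateTime`, `ChildrenLevel1`, `ChildrenLevel2`, `ChildrenLevel3`, `Counter`) come from the description, while `makeDocument`, `countCounters`, and the placeholder values are hypothetical.

```javascript
// Hypothetical generator for one prototype-shaped document.
// Field names come from the ticket; the values are placeholders.
const FANOUT = 10; // elements per array level, as stated in the description

function makeDocument(id) {
  const level3 = () =>
    Array.from({ length: FANOUT }, (_, i) => ({ Counter: i }));
  const level2 = () =>
    Array.from({ length: FANOUT }, () => ({ ChildrenLevel3: level3() }));
  const level1 = () =>
    Array.from({ length: FANOUT }, () => ({ ChildrenLevel2: level2() }));
  return { _id: id, DateTime: new Date(0), ChildrenLevel1: level1() };
}

// Count the leaf Counter values reachable through the three array levels.
function countCounters(doc) {
  let n = 0;
  for (const c1 of doc.ChildrenLevel1) {
    for (const c2 of c1.ChildrenLevel2) {
      n += c2.ChildrenLevel3.length;
    }
  }
  return n;
}

const doc = makeDocument(1);
const perDocument = countCounters(doc); // 10 * 10 * 10
const total = perDocument * 10000;      // 10000 documents in the prototype
console.log(perDocument, total);
```

Under these assumptions each document carries 1000 `Counter` values, so a stage that projects `ChildrenLevel1` for all 10000 documents has to pass 10 million leaf values along. Whether that alone accounts for the measured slowdown is not established by this ticket.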
| Comments |
| Comment by Ramon Fernandez Marina [ 24/Jul/14 ] | ||||||||||||||||||||
|
mabead78, we're closing this ticket as a duplicate of SERVER-13703. Until this is implemented, the only option is to not project any subdocuments you do not need for your query. Regards, | ||||||||||||||||||||
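As an illustration of that advice, here is a minimal sketch of two `$project` stages written as plain JavaScript objects: one that carries the whole `ChildrenLevel1` array, and one that asks only for the leaf `Counter` values. The field names come from the description; the variable names, the helper `projectedPaths`, and the collection name `prototype` are assumptions. The ticket does not confirm how much the narrower stage helps on any given server version.

```javascript
// Pipeline stages as plain objects; nothing here talks to a server.
// Field names come from the ticket; the variable names are hypothetical.

// Carries the whole three-level ChildrenLevel1 array through the pipeline.
const wideProject = { $project: { DateTime: 1, ChildrenLevel1: 1 } };

// Keeps only the leaf Counter values under the nested arrays.
const narrowProject = {
  $project: {
    DateTime: 1,
    'ChildrenLevel1.ChildrenLevel2.ChildrenLevel3.Counter': 1,
  },
};

// List the field paths a $project stage asks for.
function projectedPaths(stage) {
  return Object.keys(stage.$project);
}

console.log(projectedPaths(wideProject));
console.log(projectedPaths(narrowProject));

// In the mongo shell, for a hypothetical collection named `prototype`:
//   db.prototype.aggregate([narrowProject])
```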
| Comment by Maxime Beaudry [ 04/Jun/14 ] | ||||||||||||||||||||
|
The results and logic of the queries are not important here. What I am trying to show with these two queries is that just projecting ChildrenLevel1 makes the performance significantly worse. This makes it easy to reproduce the problem. The query may not be logical, but it does show that performance degrades significantly when, in my Mongo newbie opinion, it should not. In my real use case (not the prototype that I showed), I need a project stage at the beginning and a group stage later on. My final real query (with real data and not prototype data) is:
| ||||||||||||||||||||
| Comment by Ben Becker [ 04/Jun/14 ] | ||||||||||||||||||||
|
Hi Maxime, Did you mean to $group by _id: 0? Should this be ChildrenLevel1.ChildrenLevel2.ChildrenLevel3.Counter? Also, it looks like the $project stage can go after the $group; e.g.:
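The general idea of grouping first and projecting afterwards can be sketched as follows. This is an illustrative pipeline built from the field names in the description, not the example originally posted in this comment: the grouping key, the `count` accumulator, the output field `counter`, and the collection name `prototype` are all assumptions.

```javascript
// Hypothetical pipeline: unwind the three array levels, group on the
// leaf Counter value, and only then reshape the (much smaller) result.
const counterPath = 'ChildrenLevel1.ChildrenLevel2.ChildrenLevel3.Counter';

const groupThenProject = [
  { $unwind: '$ChildrenLevel1' },
  { $unwind: '$ChildrenLevel1.ChildrenLevel2' },
  { $unwind: '$ChildrenLevel1.ChildrenLevel2.ChildrenLevel3' },
  { $group: { _id: '$' + counterPath, count: { $sum: 1 } } },
  // The $project runs on one document per group, not on every input document.
  { $project: { _id: 0, counter: '$_id', count: 1 } },
];

// Names of the stages in order, e.g. for a quick sanity check.
const stageOrder = groupThenProject.map((stage) => Object.keys(stage)[0]);
console.log(stageOrder.join(' -> '));

// In the mongo shell, for a hypothetical collection named `prototype`:
//   db.prototype.aggregate(groupThenProject)
```

With this ordering the `$project` stage only reshapes the grouped output, so it never has to carry the nested arrays itself.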
|