Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 3.3.9, 3.4.4, 3.5.11
Component/s: Aggregation Framework
Environment: Just a mid-2014 MacBook Pro
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
16GB RAM
APPLE SSD SM0256F
OS X 10.12.6
MongoDB 3.3.9 onwards introduced a very significant performance regression when running large aggregation queries. Query times now appear to grow quadratically with query size. I found the bug using the aggregation framework, but it may be a more general issue. I've only tested up to version 3.5.11.
We recently explored upgrading our current 3.0.11 deployment to 3.4.x. However, we found that the execution time of one of our queries, which averaged a respectable ~660 ms, had jumped roughly 60x to a staggering ~40,000 ms (and over 200x on some 3.3.x versions). I benchmarked against a few versions to narrow it down. Every query matched a single document.
version   time (ms)
3.0.15        633
3.2.16        730
3.3.6         687
3.3.8         661
3.3.9     132,089
3.3.10    134,875
3.3.11     38,787
3.4.0      38,117
3.4.4      38,348
3.4.7      38,097
3.5.11     41,480
It appears something changed in 3.3.9 that was then partially improved in 3.3.11. None of the JIRA issues across those versions stood out to me, so I assume it's a side effect of some other change.
Our aggregation query happens to be very long (~300K lines) and is auto-generated. The query essentially performs a long list of $max operations in the $group stage to compute the union of many HyperLogLog counters. A similarly long $project stage just renames the fields. The schema has ~75 counters with ~1K buckets per counter, each bucket having its own field. I've attached a sample generated query and a sample collection with a single corresponding document, as well as a screenshot to help visualise the schema.
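For illustration, here is a rough sketch of the overall shape of the generated pipeline. The collection name hll and the cN_bM field names are invented for this sketch, and the real query repeats the $max/rename pattern for every one of the ~75,000 bucket fields:

db.hll.aggregate([
  { $match: { _id: 1 } },
  { $group: {
      _id: null,
      c0_b0: { $max: "$c0_b0" },   // per-bucket max == union of the HLL registers
      c0_b1: { $max: "$c0_b1" },
      c1_b0: { $max: "$c1_b0" }
      // ... one $max per bucket of every counter ...
  } },
  { $project: {
      "counter0.bucket0": "$c0_b0",   // the $project stage only renames fields
      "counter0.bucket1": "$c0_b1",
      "counter1.bucket0": "$c1_b0"
      // ...
  } }
])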
I ran some benchmarks on 3.4.4, varying both the contents of the document and the number of counters queried in a single aggregation. The former had no effect. For the latter I took some sample times; a sketch of how such a benchmark can be reproduced is included after the table. Note that I used a different environment from the version comparison above, so the times won't match:
counters   time (ms)   ms/counter
1 100 100
2 182 91
3 272 91
4 367 92
5 495 99
6 637 106
7 840 120
8 965 121
9 1231 137
10 1400 140
11 1700 155
17 3933 231
25 8581 343
40 18605 465
50 34863 697
60 47746 796
75 67665 902
90 89121 990
100 145390 1454
Initially the query time looks linear, but it appears to grow quadratically as n becomes larger. I can only assume there's a linear scan in an inner loop somewhere.
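For reference, here is a rough sketch of how such a benchmark could be reproduced in the mongo shell. This is not our actual query generator; the collection name hll and the cN_bM field naming are placeholders rather than the attached sample:

function buildPipeline(nCounters, nBuckets) {
  var group = { _id: null };
  var project = { _id: 0 };
  for (var c = 0; c < nCounters; c++) {
    for (var b = 0; b < nBuckets; b++) {
      var f = "c" + c + "_b" + b;
      group[f] = { $max: "$" + f };     // union each bucket via $max in $group
      project["out_" + f] = "$" + f;    // follow-up $project just renames the field
    }
  }
  return [ { $group: group }, { $project: project } ];
}

[1, 5, 10, 25, 50, 75, 100].forEach(function (n) {
  var start = Date.now();
  db.hll.aggregate(buildPipeline(n, 1000)).toArray();   // force execution
  print(n + " counters: " + (Date.now() - start) + " ms");
});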
We currently rely on being able to query all counters within a few seconds at most, so this is blocking us from upgrading beyond 3.2. One temporary workaround would be to issue multiple smaller queries for subsets of the counters, but even then the query performance looks too poor for users. Perhaps there's a better schema or approach you could suggest for this type of problem: storing many HLLs over time and computing their unions at query time? I'd love a feature to perform element-wise operations on arrays in the $group stage, e.g. given two arrays [1,4,2] and [2,3,1], compute the element-wise max [2,4,2]; a sketch of an approximation of that idea is below.
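As a rough sketch of that idea on current versions (3.4+ only, since it uses $reduce and $zip): assuming the registers were stored as one array per document rather than one field per bucket, the per-counter union could be computed with an element-wise max inside the pipeline. The hll collection and the counter/buckets field names here are hypothetical:

db.hll.aggregate([
  { $match: { counter: "counter0" } },
  // collect every register array for this counter
  { $group: { _id: "$counter", all: { $push: "$buckets" } } },
  // element-wise max across the collected arrays == HLL union
  { $project: {
      union: {
        $reduce: {
          input: "$all",
          initialValue: { $arrayElemAt: [ "$all", 0 ] },
          in: {
            $map: {
              input: { $zip: { inputs: [ "$$value", "$$this" ] } },
              as: "pair",
              in: { $max: "$$pair" }
            }
          }
        }
      }
  } }
])

The $push does mean every register array is buffered in the $group stage, so this trades memory for avoiding the huge field list.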
The ideal outcome would be for performance to return to pre-3.3.9 levels.
If you need more information, please let me know. Also, thanks for regularly publishing Docker images; it made comparing versions a breeze.
- duplicates: SERVER-30449 ProjectionSpecValidator is O(N**2) in number of fields in the projection (Closed)
- related to: SERVER-18966 Allow exclusion in $project stage of aggregation pipeline (Closed)