[SERVER-25865] $group operation is slow since MongoDB 3.2 on Windows Created: 30/Aug/16  Updated: 17/Jan/17  Resolved: 09/Sep/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.2.9, 3.3.12
Fix Version/s: 3.2.12, 3.3.14

Type: Bug Priority: Major - P3
Reporter: Linda Qin Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File aggregation_3.2_windows.png     File diagnostic.data.tar     PNG File group.png    
Issue Links:
Backports
Related
related to SERVER-25868 Incomplete FTDC files on Windows Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.2
Sprint: Query 2016-09-19
Participants:

 Description   

The $group operation is much slower for MongoDB 3.2/3.3 comparing to MongoDB 3.0 on Windows. I don't see the issue on OSX or Linux.

  • MongoDB 3.0 on Windows:
    Run the following commands to create the collection, index and then run the aggregation.

    use test;
    db.collection.drop();
    for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
    db.collection.createIndex({x: 1});
    var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
    

    The aggregation is fast on 3.0:

    > db.collection.drop();
    false
    > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
    WriteResult({ "nInserted" : 1 })
    > db.collection.createIndex({x: 1});
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "ok" : 1 
    }
    > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
    44
    

  • MongoDB 3.2 on Windows:
    Run the same commands on a MongoDB 3.2 instance on Windows and it is much slower:

    > db.collection.drop();
    false
    > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
    WriteResult({ "nInserted" : 1 })
    > db.collection.createIndex({x: 1});
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "ok" : 1 
    }
    > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
    26587
    

From the diagnostic data, there is "cursor open pinned" while the aggregation command is run, but I don't see the same on OSX. Is this the cause of the slowness on Windows? Diagnostic data is attached.

We've also tested the same aggregation on MongoDB 3.2 with MMAP storage engine, it is also slow. So this issue doesn't seem to relate to the storage engine.

Also, if I change the data set from:

for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
                                                                                    ^^^^^^^

To:

for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*10000)});}
                                                                                    ^^^^^

The aggregation is faster on the second data set (both on MongoDB 3.2 on Windows):

  • First data set

    > db.collection.drop();
    false
    > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
    WriteResult({ "nInserted" : 1 })
    > db.collection.createIndex({x: 1});
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "ok" : 1 
    }
    > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
    26587
    

  • Second data set:

    > db.collection.drop();
    true
    > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*10000)});}
    WriteResult({ "nInserted" : 1 })
    > db.collection.createIndex({x: 1});
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "ok" : 1 
    }
    > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
    3020
    

It seems the $group operations would be slow if the result set is large, and this is more obvious on MongoDB 3.2 on Windows.



 Comments   
Comment by Githook User [ 17/Jan/17 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-25865 use boost::unordered_map in DocumentSourceGroup
Branch: v3.2
https://github.com/mongodb/mongo/commit/87fbfc958a5658bfaed9948db0fc113e9aeab3c9

Comment by Githook User [ 09/Sep/16 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-25865 stdx::unordered_multimap and stdx::unordered_multiset
Branch: master
https://github.com/mongodb/mongo/commit/bda317e9c852b27f0fe7d148e5c08499d2f8ec49

Comment by Githook User [ 09/Sep/16 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-25865 stdx::unordered_map and stdx::unordered_set

On Windows, these are aliases for boost containers. On
other platforms they are aliases for std containers.
Branch: master
https://github.com/mongodb/mongo/commit/0f695019bd0b736e0aac0c510290175f0ec8f274

Comment by Githook User [ 09/Sep/16 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-25865 stdx::unordered_map
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/0d6e11030d2f7e668c2a749183b69aa911262c33

Comment by Bruce Lucas (Inactive) [ 01/Sep/16 ]

Could this be affecting other uses of unordered_map on Windows as well?

Comment by David Storch [ 31/Aug/16 ]

I confirmed definitively that the switch to std::unordered_map caused the issue. The issue also continues to affect 3.3.x development versions, despite our upgrade to VS2015. 3.3.x versions compiled with VS2015, just like the 3.2.x versions as reported in this ticket, take more than 20 seconds to execute the $group. I applied the patch below to f41c549e10, a recent version of the master branch somewhere in between 3.3.12 and 3.3.13:

diff --git a/src/mongo/db/pipeline/document_source.h b/src/mongo/db/pipeline/document_source.h
index d8f26fa..70558b1 100644
--- a/src/mongo/db/pipeline/document_source.h
+++ b/src/mongo/db/pipeline/document_source.h
@@ -31,6 +31,7 @@
 #include "mongo/platform/basic.h"
 
 #include <boost/optional.hpp>
+#include <boost/unordered_map.hpp>
 #include <deque>
 #include <list>
 #include <string>
@@ -588,7 +589,8 @@ private:
 class DocumentSourceGroup final : public DocumentSource, public SplittableDocumentSource {
 public:
     using Accumulators = std::vector<boost::intrusive_ptr<Accumulator>>;
-    using GroupsMap = ValueUnorderedMap<Accumulators>;
+    using GroupsMap = boost::
+        unordered_map<Value, Accumulators, ValueComparator::Hasher, ValueComparator::EqualTo>;
 
     // Virtuals from DocumentSource.
     boost::intrusive_ptr<DocumentSource> optimize() final;
diff --git a/src/mongo/db/pipeline/document_source_group.cpp b/src/mongo/db/pipeline/document_source_group.cpp
index c8da8a7..4581bde 100644
--- a/src/mongo/db/pipeline/document_source_group.cpp
+++ b/src/mongo/db/pipeline/document_source_group.cpp
@@ -157,7 +157,8 @@ boost::optional<Document> DocumentSourceGroup::getNextStreaming() {
 
 void DocumentSourceGroup::dispose() {
     // Free our resources.
-    _groups = pExpCtx->getValueComparator().makeUnorderedValueMap<Accumulators>();
+    _groups = GroupsMap(
+        0, pExpCtx->getValueComparator().getHasher(), pExpCtx->getValueComparator().getEqualTo());
     _sorterIterator.reset();
 
     // Make us look done.
@@ -186,7 +187,8 @@ intrusive_ptr<DocumentSource> DocumentSourceGroup::optimize() {
 
 void DocumentSourceGroup::doInjectExpressionContext() {
     // Groups map must respect new comparator.
-    _groups = pExpCtx->getValueComparator().makeUnorderedValueMap<Accumulators>();
+    _groups = GroupsMap(
+        0, pExpCtx->getValueComparator().getHasher(), pExpCtx->getValueComparator().getEqualTo());
 
     for (auto&& idExpr : _idExpressions) {
         idExpr->injectExpressionContext(pExpCtx);
@@ -526,7 +528,9 @@ void DocumentSourceGroup::initialize() {
         }
 
         // We won't be using groups again so free its memory.
-        _groups = pExpCtx->getValueComparator().makeUnorderedValueMap<Accumulators>();
+        _groups = GroupsMap(0,
+                            pExpCtx->getValueComparator().getHasher(),
+                            pExpCtx->getValueComparator().getEqualTo());
 
         _sorterIterator.reset(Sorter<Value, Value>::Iterator::merge(
             sortedFiles, SortOptions(), SorterComparator(pExpCtx->getValueComparator())));

All this patch does is change std::unordered_map back to boost::unordered_map in DocumentSourceGroup. With these changes, the performance increases drastically, back to the sub-100ms times we were seeing on 3.0.

We suspect the root cause is that the VC implementation of std::unordered_map uses a naive policy for obtaining the hash table bucket ID from the hash itself, which leads to a large number of collisions.

Comment by John Murphy [ 31/Aug/16 ]

I have been able to confirm that the $group operation does indeed slow down between 3.1.5 and 3.1.6.

Specifically the build with git version 7e6df189868 ran the aforementioned group aggregation test in 60 ms while the build with git version 932e768dc26 took 23 secs.

Comment by David Storch [ 30/Aug/16 ]

The query team reviewed together and came up with a hypothesis: 932e768dc26 changed the type of the in-memory structure used to construct the groups from boost::unordered_map<> to std::unordered_map<> in 3.1.6. Perhaps boost::unordered_map is faster on Windows than std::unordered_map? In order to confirm we could test this commit and its parent commit in order to determine if this is where the regression was introduced.

Comment by Bruce Lucas (Inactive) [ 30/Aug/16 ]

Possibly of interest: in 3.3.9 and later (but not 3.3.8 and before) there are no FTDC samples collected for the duration of the slow $group operation. See SERVER-25868.

EDIT: the issue on SERVER-25868 is probably unrelated to the slow $group operation, and the coincidence in timing was likely an artifact of the data collection process.

Comment by Bruce Lucas (Inactive) [ 30/Aug/16 ]

Periodic stack trace samples show that during the time the aggregate command is executing, marked by the interval between the blue lines below, it is spending all its time in various places in unordered_map::_Try_emplace.

Generated at Thu Feb 08 04:10:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.