Core Server / SERVER-25865

$group operation is slow since MongoDB 3.2 on Windows

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.2.9, 3.3.12
    • Fix Version/s: 3.2.12, 3.3.14
    • Component/s: Aggregation Framework
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.2
    • Sprint:
      Query 2016-09-19

      Description

      The $group operation is much slower on MongoDB 3.2/3.3 than on MongoDB 3.0 when running on Windows. I don't see the issue on OSX or Linux.

      • MongoDB 3.0 on Windows:
        Run the following commands to create the collection, index and then run the aggregation.

        use test;
        db.collection.drop();
        for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
        db.collection.createIndex({x: 1});
        var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
        

        The aggregation is fast on 3.0:

        > db.collection.drop();
        false
        > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
        WriteResult({ "nInserted" : 1 })
        > db.collection.createIndex({x: 1});
        {
                "createdCollectionAutomatically" : false,
                "numIndexesBefore" : 1,
                "numIndexesAfter" : 2,
                "ok" : 1 
        }
        > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
        44
        

      • MongoDB 3.2 on Windows:
        Run the same commands on a MongoDB 3.2 instance on Windows and it is much slower:

        > db.collection.drop();
        false
        > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
        WriteResult({ "nInserted" : 1 })
        > db.collection.createIndex({x: 1});
        {
                "createdCollectionAutomatically" : false,
                "numIndexesBefore" : 1,
                "numIndexesAfter" : 2,
                "ok" : 1 
        }
        > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
        26587
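
The reproduction steps above can be wrapped in two small helpers so the same measurement is easy to repeat across versions and platforms. This is only a sketch: seedCollection and timeGroup are hypothetical names, not part of the shell, and it assumes the cursor returned by aggregate() supports itcount(). Unlike the one-liner in the report, it drains the cursor, so the timing covers every result batch.

```javascript
// Sketch only: `coll` is any object exposing the mongo shell collection
// methods used below (drop, insert, createIndex, aggregate).
// seedCollection and timeGroup are hypothetical helper names.

// Recreate the test collection with `count` documents whose x values are
// random integers in [0, range), then index x as in the report.
function seedCollection(coll, count, range) {
    coll.drop();
    for (var i = 0; i < count; ++i) {
        coll.insert({x: Math.floor(Math.random() * range)});
    }
    coll.createIndex({x: 1});
}

// Run the $group pipeline from the report and return the number of groups
// together with the elapsed wall-clock time in milliseconds.
function timeGroup(coll) {
    var start = new Date().getTime();
    var cursor = coll.aggregate([{$group: {_id: "$x", value: {$sum: 1}}}]);
    // Assumption: the returned cursor has itcount(); draining it makes sure
    // all result batches are included in the measurement.
    var groups = cursor.itcount();
    var end = new Date().getTime();
    return {groups: groups, millis: end - start};
}
```

In the shell this could be used as seedCollection(db.collection, 40000, 1000000) followed by timeGroup(db.collection).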
        

      The diagnostic data shows "cursor open pinned" while the aggregation command is running, which I don't see on OSX. Is this the cause of the slowness on Windows? Diagnostic data is attached.
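
To check the same counter on a live server instead of the attached diagnostic data, a helper along these lines could be used. It is a sketch under an assumption: that the "cursor open pinned" series corresponds to the metrics.cursor.open.pinned field of the serverStatus output. pinnedCursors is a hypothetical name.

```javascript
// Sketch only: pinnedCursors is a hypothetical helper name. `status` is a
// document shaped like the output of db.serverStatus().
// Assumption: the counter lives at metrics.cursor.open.pinned.
function pinnedCursors(status) {
    var open = status && status.metrics && status.metrics.cursor &&
               status.metrics.cursor.open;
    // Return null when the field is absent rather than guessing a value.
    if (!open || open.pinned === undefined) {
        return null;
    }
    return open.pinned;
}
```

In the shell, pinnedCursors(db.serverStatus()) can be sampled from a second connection while the aggregation is running.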

      We also tested the same aggregation on MongoDB 3.2 with the MMAP storage engine, and it is also slow, so the issue does not seem to be related to the storage engine.

      Also, if I change the data set from:

      for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
                                                                                          ^^^^^^^
      

      To:

      for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*10000)});}
                                                                                          ^^^^^
      

      The aggregation is faster on the second data set (both on MongoDB 3.2 on Windows):

      • First data set:

        > db.collection.drop();
        false
        > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*1000000)});}
        WriteResult({ "nInserted" : 1 })
        > db.collection.createIndex({x: 1});
        {
                "createdCollectionAutomatically" : false,
                "numIndexesBefore" : 1,
                "numIndexesAfter" : 2,
                "ok" : 1 
        }
        > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
        26587
        

      • Second data set:

        > db.collection.drop();
        true
        > for (var i = 0; i < 40000; ++i) { db.collection.insert({x: Math.floor(Math.random()*10000)});}
        WriteResult({ "nInserted" : 1 })
        > db.collection.createIndex({x: 1});
        {
                "createdCollectionAutomatically" : false,
                "numIndexesBefore" : 1,
                "numIndexesAfter" : 2,
                "ok" : 1 
        }
        > var start = new Date().getTime(); db.collection.aggregate( [{$group: {_id: "$x", value: {$sum: 1}}}] ); var end = new Date().getTime(); var time = end - start; print(time);
        3020
        

      It seems that $group becomes slow when it produces a large number of groups, and the effect is most pronounced on MongoDB 3.2 on Windows.
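
To put a number on "large", the expected count of distinct x values (and therefore of $group results) can be estimated for both data sets. The sketch below assumes Math.random() is uniform; expectedDistinct is a hypothetical helper name. It gives roughly 39,200 groups for the first data set and roughly 9,800 for the second, so the first produces about four times as many groups, while the single timing reported for it (26587 ms against 3020 ms) is about nine times longer.

```javascript
// Expected number of distinct values when drawing `count` integers uniformly
// at random from [0, range): range * (1 - (1 - 1/range)^count).
// expectedDistinct is a hypothetical helper name.
function expectedDistinct(count, range) {
    return range * (1 - Math.pow(1 - 1 / range, count));
}

// First data set: 40000 documents, x in [0, 1000000).
var firstGroups = expectedDistinct(40000, 1000000);
// Second data set: 40000 documents, x in [0, 10000).
var secondGroups = expectedDistinct(40000, 10000);
```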

      Attachments:

      • diagnostic.data.tar (25 kB, Linda Qin)
      • aggregation_3.2_windows.png (29 kB)
      • group.png (271 kB)

          Activity

          bruce.lucas Bruce Lucas added a comment -

          Could this be affecting other uses of unordered_map on Windows as well?

          xgen-internal-githook Githook User added a comment -

          Author: David Storch (dstorch) <david.storch@10gen.com>

          Message: SERVER-25865 stdx::unordered_map
          Branch: master
          https://github.com/10gen/mongo-enterprise-modules/commit/0d6e11030d2f7e668c2a749183b69aa911262c33

          xgen-internal-githook Githook User added a comment -

          Author: David Storch (dstorch) <david.storch@10gen.com>

          Message: SERVER-25865 stdx::unordered_map and stdx::unordered_set

          On Windows, these are aliases for boost containers. On
          other platforms they are aliases for std containers.
          Branch: master
          https://github.com/mongodb/mongo/commit/0f695019bd0b736e0aac0c510290175f0ec8f274

          xgen-internal-githook Githook User added a comment -

          Author: David Storch (dstorch) <david.storch@10gen.com>

          Message: SERVER-25865 stdx::unordered_multimap and stdx::unordered_multiset
          Branch: master
          https://github.com/mongodb/mongo/commit/bda317e9c852b27f0fe7d148e5c08499d2f8ec49

          xgen-internal-githook Githook User added a comment -

          Author: David Storch (dstorch) <david.storch@10gen.com>

          Message: SERVER-25865 use boost::unordered_map in DocumentSourceGroup
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/87fbfc958a5658bfaed9948db0fc113e9aeab3c9
