Core Server / SERVER-12878

Aggregations that do not return cursors need to clean up internal cursors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.6.0-rc0
    • Fix Version/s: 2.6.0-rc1
    • Component/s: Performance
    • Labels:
    • Environment:
      Ubuntu 13.10
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      Linux
    • Steps To Reproduce:
      • start a MongoDB 2.4.9 or 2.6.0-rc0 server
      • on the same server,
        • git clone https://github.com/tmcallaghan/sysbench-mongodb.git
        • cd sysbench-mongodb
        • edit run.simple.bash to modify the benchmark behavior
          • modify "export NUM_DOCUMENTS_PER_COLLECTION=10000000", by changing 10 million to "1000000". This will keep the loaded data to under 8GB.
        • Note: you must have Ant and Java 1.7 installed to run the benchmark application. If using Java 1.6, change target="1.7" to target="1.6" in build.xml. You also need the MongoDB Java driver on your CLASSPATH, as in "export CLASSPATH=/home/tcallaghan/java_goodies/mongo-2.11.2.jar:.".
        • ./run.simple.bash

      The benchmark runs in two stages:
      1. 16 collections are created, and each is loaded with 1 million documents. Eight loader threads run simultaneously, each with their own collection.
      2. 64 threads run the Sysbench workload

      Both stages output their respective performance.

      • I'm getting around 30K inserts per second for the loader.
      • For the mixed workload I'm getting 550 TPS on 2.4.9, but less than 10 TPS on 2.6.0-rc0.

      Description

      Aggregation commands that do not return a cursor need to release the ClientCursor object that they create to do their work. As of 2.6.0-rc0, they do not. This leads to increasing workload for every update and delete operation, as they must scan those zombie cursors and inform them of certain relevant events.
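
      As a concrete illustration, the C++ below is a minimal sketch with invented names (not the server's actual ClientCursor API) of why a leaked internal cursor taxes every subsequent write, and of the cleanup shape described in the fix commit message later in this ticket ("Delete ClientCursor in agg unless returning a cursor to user"):

          // Simplified, hypothetical sketch: every registered cursor must be
          // notified when a document is deleted, so per-write cost grows with
          // the number of live cursors left behind by aggregation commands.
          #include <memory>
          #include <string>
          #include <unordered_set>

          struct ClientCursorSketch {
              void aboutToDelete(const std::string& /*docId*/) {
                  // A real cursor would advance past or invalidate its position
                  // if it were sitting on the document being removed.
              }
          };

          class CursorRegistrySketch {
          public:
              void add(ClientCursorSketch* c) { cursors_.insert(c); }
              void remove(ClientCursorSketch* c) { cursors_.erase(c); }

              // Called for every delete (and document-moving update):
              // cost is O(number of registered cursors).
              void notifyDelete(const std::string& docId) {
                  for (ClientCursorSketch* c : cursors_) c->aboutToDelete(docId);
              }

          private:
              std::unordered_set<ClientCursorSketch*> cursors_;
          };

          // Hypothetical shape of the fix: destroy the internal cursor unless
          // the command actually hands it back to the client.
          void runAggregationSketch(CursorRegistrySketch& registry, bool returnCursorToUser) {
              auto cursor = std::make_unique<ClientCursorSketch>();
              registry.add(cursor.get());

              // ... execute the pipeline using the cursor ...

              if (returnCursorToUser) {
                  cursor.release();               // ownership passes to the getMore path
              } else {
                  registry.remove(cursor.get());  // the leak described in this ticket is
                  cursor.reset();                 // analogous to omitting this cleanup
              }
          }

      In this model, each aggregation in the mixed workload leaves another entry in the registry, so the cost of every later update and delete keeps climbing, which is consistent with the throughput drop reported below.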

      Original description / steps to reproduce:

      In running benchmarks comparing the performance of 2.4.9 vs. 2.6.0-rc0, I see a significant drop in my Sysbench workload.

      The performance of both versions is comparable for loading data (insert only), but in the mixed workload phase (reads, aggregation, update, delete, insert) the performance of 2.6.0-rc0 is 50x lower on an in-memory test. The performance drop is similar for larger-than-RAM testing as well.

      I have confirmed this on my desktop (single Corei7) and a server (dual Xeon 5560), both running Ubuntu 13.10, with read-ahead set to 32.

      I'm running with the Java driver, version 2.11.2.

      Attachments

      1. prof-0.svg
        108 kB
        Andy Schwerin
      2. prof-8.svg
        106 kB
        Andy Schwerin
      1. SERVER-12878.png
        212 kB
      2. SERVER-12878-249.png
        185 kB
      3. SERVER-12878-249-aggr.png
        58 kB
      4. SERVER-12878-249-qry.png
        40 kB
      5. SERVER-12878-249-readlock.png
        48 kB
      6. SERVER-12878-249-write.png
        152 kB
      7. SERVER-12878-249-writelock.png
        401 kB
      8. SERVER-12878-rc1-aggr.png
        125 kB
      9. SERVER-12878-rc1-qry.png
        148 kB
      10. SERVER-12878-rc1-readlock.png
        286 kB
      11. SERVER-12878-rc1-write.png
        108 kB
      12. SERVER-12878-rc1-writelock.png
        175 kB


          Activity

          Andy Schwerin added a comment -

          Tim, if so, someone must have done something since RC0 to improve the query planning problem you were seeing. Maybe Hari Khalsa's SERVER-12915, maybe some others as well. With the resource leak patched (commit coming), I see the same performance in 2.6.0-rc1-pre- and 2.4.9. I encourage you to run against RC1, which should be out this week, or off of master after this ticket is resolved.

          For reference this is what I see in 2.4.9 for the last hundred seconds on an Intel Core i7-2600 CPU @ 3.40GHz, with 16GB of RAM and data stored on a passable Intel SSD:

          Thread[Thread-66,5,main]510 seconds : cum tps=636.60 : int tps=706.11 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]520 seconds : cum tps=639.59 : int tps=791.79 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]530 seconds : cum tps=637.40 : int tps=523.68 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]540 seconds : cum tps=633.70 : int tps=437.45 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]550 seconds : cum tps=633.70 : int tps=633.99 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]560 seconds : cum tps=629.76 : int tps=412.91 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]570 seconds : cum tps=631.99 : int tps=756.72 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]580 seconds : cum tps=634.03 : int tps=751.69 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]590 seconds : cum tps=636.66 : int tps=789.27 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]600 seconds : cum tps=631.57 : int tps=330.94 : cum ips=0.00 : int ips=0.00

          And this is what I see for 2.6.0-RC1-pre-:

          Thread[Thread-66,5,main]510 seconds : cum tps=636.79 : int tps=712.50 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]520 seconds : cum tps=638.70 : int tps=735.96 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]530 seconds : cum tps=632.31 : int tps=300.56 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]540 seconds : cum tps=631.20 : int tps=572.19 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]550 seconds : cum tps=632.64 : int tps=710.50 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]560 seconds : cum tps=631.15 : int tps=549.06 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]570 seconds : cum tps=628.61 : int tps=485.06 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]580 seconds : cum tps=630.93 : int tps=763.01 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]590 seconds : cum tps=632.65 : int tps=732.39 : cum ips=0.00 : int ips=0.00
          Thread[Thread-66,5,main]600 seconds : cum tps=625.19 : int tps=185.55 : cum ips=0.00 : int ips=0.00

          I assume from all of the results (yours and mine) that the last 10 seconds might see some amount of cool-down behavior. I've typically been ignoring it.

          Githook User added a comment -

          Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

          Message: SERVER-12878 Delete ClientCursor in agg unless returning a cursor to user.
          Branch: master
          https://github.com/mongodb/mongo/commit/8d43b5cb9949c16452cb8d949c89d94cab9c8bad

          Tim Callaghan added a comment -

          Andy, I agree, and I think that's what Hari originally did for this ticket in this commit.

          https://github.com/mongodb/mongo/commit/a3d910925350d2f0204b41ea145e24f74e5c39ce

          Githook User added a comment -

          Author: Hari Khalsa (hkhalsa) <hkhalsa@10gen.com>

          Message: SERVER-10026 SERVER-13271 SERVER-12878 SERVER-2094 avoid surplus projections in distinct
          Branch: master
          https://github.com/mongodb/mongo/commit/267f56a7e0ce36eba21b4b2ef09e32a43370acbf

          Githook User added a comment -

          Author: Hari Khalsa (hkhalsa) <hkhalsa@10gen.com>

          Message: SERVER-10026 SERVER-13271 SERVER-12878 SERVER-2094 avoid surplus projections in distinct
          Branch: v2.6
          https://github.com/mongodb/mongo/commit/094f1565d6f82859bc38300b45564dd1ea9f070e


            People

              • Votes: 0
              • Watchers: 23
