Core Server / SERVER-12554

Count queries negatively impact concurrency

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.0-rc0
    • Affects Version/s: 2.5.5
    • Component/s: Concurrency, Querying
    • Labels:
    • Environment:
      Linux ip-10-232-37-146.ec2.internal 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
    • ALL

      My query concurrency tests showed a clear regression when different kinds of queries are executed concurrently.
      I used git bisect to find where the problem was introduced, starting from:

      Good revision: 06851f4859a61019a17147ea96c4e6d9462f354b
      Bad revision: f2ece186b189b2b9d09636c708f768fb3822b511
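
      Roughly, the bisect session looks like the sketch below; the rebuild/test step is a placeholder for the local concurrency test harness, not an actual command:

      git bisect start
      git bisect bad  f2ece186b189b2b9d09636c708f768fb3822b511   # regression reproduces here
      git bisect good 06851f4859a61019a17147ea96c4e6d9462f354b   # known-good baseline
      # at each step: rebuild mongod, rerun the concurrency test, then mark the
      # checked-out revision with "git bisect good" or "git bisect bad"
      git bisect reset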
      

      The commit that caused the regression is:

      # git bisect good 2bee3f018b8d4351f8261c405adcdff44c7f9a70
      3b3c25e571852893373ae2e2b361397b260687c9 is the first bad commit
      commit 3b3c25e571852893373ae2e2b361397b260687c9
      Author: Hari Khalsa <hkhalsa@10gen.com>
      Date:   Mon Jan 27 18:01:26 2014 -0500
      
          SERVER-12460 faster count for simple queries
      

      Looking more carefully at the server logs, I noticed something unusual:

      2014-01-30T14:48:52.824-0500 [conn133] command test1.$cmd command: { count: "QA408", query: { timestamp: { $gte: new Date(1388927893439), $lte: new Date(1388999113439) } }, limit: 1 } keyUpdates:0 numYields:22 locks(micros) r:5722 reslen:48 111ms
      2014-01-30T14:48:52.835-0500 [conn129] command test1.$cmd command: { count: "QA408", query: { timestamp: { $gte: new Date(1389166093427), $lte: new Date(1389327733427) } }, limit: 1 } keyUpdates:0 numYields:50 locks(micros) r:6720 reslen:48 129ms
      2014-01-30T14:48:52.847-0500 [conn119] command test1.$cmd command: { count: "QA408", query: { timestamp: { $gte: new Date(1390298233437), $lte: new Date(1390424353437) } }, limit: 1 } keyUpdates:0 numYields:40 locks(micros) r:6359 reslen:48 136ms
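
      For context, the count side of the workload is essentially many clients issuing commands like the ones logged above. A rough sketch of that part of the test is below (the real test mixes different query types; the client count and loop count are arbitrary, and the collection, field, and date values are taken from the first log line):

      # rough reproduction sketch: several concurrent clients issuing the same
      # count command shown in the log (values are illustrative)
      for i in $(seq 1 16); do
        mongo test1 --eval '
          for (var j = 0; j < 1000; j++) {
            db.runCommand({ count: "QA408",
                            query: { timestamp: { $gte: new Date(1388927893439),
                                                  $lte: new Date(1388999113439) } },
                            limit: 1 });
          }' &
      done
      wait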
      

      Lots of yields; apparently this didn't happen before the change.
      CPU profiling via gperftools confirms the suspicion, FWIW.

      (pprof) list mongo::PlanExecutor::getNext
      [...]
           .      .   79:         for (;;) {
           .      .   80:             // Yield, if we can yield ourselves.
          76    337   81:             if (NULL != _yieldPolicy.get() && _yieldPolicy->shouldYield()) {
           .    166   82:                 saveState();
          23   4884   83:                 _yieldPolicy->yield();
           4      8   84:                 if (_killed) { return Runner::RUNNER_DEAD; }
           .    765   85:                 restoreState();
           .      .   86:             }
      [...]
      

            Assignee:
            Davide Italiano
            Reporter:
            Davide Italiano
            Votes:
            0
            Watchers:
            6
