Core Server
  1. Core Server
  2. SERVER-5063

$in on first compound key element and limit efficiency

    Details

    • Type: Improvement Improvement
    • Status: Closed Closed
    • Priority: Major - P3 Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.1
    • Component/s: Querying
    • Labels:
      None
    • Backport:
      No
    • # Replies:
      7
    • Last comment by Customer:
      true
    • Documentation changes needed?:
      Yes

      Description

      If you have an index on ( a , b ) and do a query like find( { a :

      { $in : [ ... }

      } ).sort(

      { b : 1 }

      ).limit(10)
      an easy optimization is limiting the results for each $in element to the limit.

        Issue Links

          Activity

          Hide
          auto
          added a comment -

          Author:

          {u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

          Message: SERVER-5063 $in limit efficiency.
          Branch: master
          https://github.com/mongodb/mongo/commit/7cacde89bb37051549b38abaf4d6b591b199b243

          Show
          auto
          added a comment - Author: {u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'} Message: SERVER-5063 $in limit efficiency. Branch: master https://github.com/mongodb/mongo/commit/7cacde89bb37051549b38abaf4d6b591b199b243
          Hide
          James Smith
          added a comment -

          Out of interest, what was the speed difference for $in queries of this type after this change, do you have any benchmark results?

          Show
          James Smith
          added a comment - Out of interest, what was the speed difference for $in queries of this type after this change, do you have any benchmark results?
          Hide
          Aaron Staple (Inactive)
          added a comment -

          Hi James,

          The performance improvement you'll see will be very dependent on your data distribution, and it's possible to generate a benchmark with an arbitrary improvement multiple by varying the amount of data in your test. (For example I just ran a simple test for this optimization that showed a four order of magnitude performance improvement.)

          Is there a real world example data set you'd like to test?

          Show
          Aaron Staple (Inactive)
          added a comment - Hi James, The performance improvement you'll see will be very dependent on your data distribution, and it's possible to generate a benchmark with an arbitrary improvement multiple by varying the amount of data in your test. (For example I just ran a simple test for this optimization that showed a four order of magnitude performance improvement.) Is there a real world example data set you'd like to test?
          Hide
          Artur Rodrigues
          added a comment - - edited

          I would just like to share my experience, where I have the following indexes:

          "indexSizes" :

          { "_id_" : 9026304, "created_at_-1" : 6344576, "feelings_1_location.state_1_created_at_-1" : 10285408 }

          ,

          The query:
          db.dev06.find({
          'feelings':

          { $in: [ 'cansado', 'triste' ] }

          ,
          'location.state' :

          { $in: [ 'Minas Gerais', 'Rio de Janeiro' ] }

          }).sort(

          {'created_at': -1}

          ).limit(10).explain();

          In v2.0.6 the results were:
          http://pastie.org/private/frfkmfuli7uqi6fwiykag

          while in v2.1.2 there was a significant increase in the number of objects scanned:
          http://pastie.org/private/6dtbsxb6vofzh8yslmicca

          Even if I hint in v.2.0.6 the same index used in v2.1.2, it still scans less documents:
          http://pastie.org/private/ejgiavgf1reqlq5hwhgpw

          Show
          Artur Rodrigues
          added a comment - - edited I would just like to share my experience, where I have the following indexes: "indexSizes" : { "_id_" : 9026304, "created_at_-1" : 6344576, "feelings_1_location.state_1_created_at_-1" : 10285408 } , The query: db.dev06.find({ 'feelings': { $in: [ 'cansado', 'triste' ] } , 'location.state' : { $in: [ 'Minas Gerais', 'Rio de Janeiro' ] } }).sort( {'created_at': -1} ).limit(10).explain(); In v2.0.6 the results were: http://pastie.org/private/frfkmfuli7uqi6fwiykag while in v2.1.2 there was a significant increase in the number of objects scanned: http://pastie.org/private/6dtbsxb6vofzh8yslmicca Even if I hint in v.2.0.6 the same index used in v2.1.2, it still scans less documents: http://pastie.org/private/ejgiavgf1reqlq5hwhgpw
          Hide
          Aaron Staple (Inactive)
          added a comment - - edited

          Hi Artur,

          I think you may be encountering a difference the explain behavior between 2.0 and 2.1. In 2.1 the nscanned reported for the overall query includes the nscanned values of all candidate plans attempted. In 2.0 only the nscanned of the "winning" query plan is reported.

          Could you please send the output of:

          db.dev06.find( ... ).explain( true )
          db.dev06.find( ... ).hint(

          { feelings:1, 'location.state':1, created_at:-1 }

          ).explain( true )

          for both 2.0 and 2.1? Also, it would be helpful for clarity if you could include the command you enter into the shell with the shell output you post.

          Thanks,
          Aaron

          Show
          Aaron Staple (Inactive)
          added a comment - - edited Hi Artur, I think you may be encountering a difference the explain behavior between 2.0 and 2.1. In 2.1 the nscanned reported for the overall query includes the nscanned values of all candidate plans attempted. In 2.0 only the nscanned of the "winning" query plan is reported. Could you please send the output of: db.dev06.find( ... ).explain( true ) db.dev06.find( ... ).hint( { feelings:1, 'location.state':1, created_at:-1 } ).explain( true ) for both 2.0 and 2.1? Also, it would be helpful for clarity if you could include the command you enter into the shell with the shell output you post. Thanks, Aaron
          Hide
          Artur Rodrigues
          added a comment -

          Hi Aaron,

          Thanks for your prompt reply. Please find the information you asked for:
          v2.0.6
          http://pastie.org/private/dhsoanhnzlvgtkx71o9nq

          v2.1.2
          http://pastie.org/private/0a33rnyxxqeukdwbevurw

          Indeed, as you said, v2.1 is reporting the nscanned values of all candidates. My fault for not reading the changelog throughly.

          Anyways, I was expecting a performance boost in v2.1 for this specific query given the compound index, but I think I'll have to wait until SERVER-3310 is solved.

          Thank you once again!

          Show
          Artur Rodrigues
          added a comment - Hi Aaron, Thanks for your prompt reply. Please find the information you asked for: v2.0.6 http://pastie.org/private/dhsoanhnzlvgtkx71o9nq v2.1.2 http://pastie.org/private/0a33rnyxxqeukdwbevurw Indeed, as you said, v2.1 is reporting the nscanned values of all candidates. My fault for not reading the changelog throughly. Anyways, I was expecting a performance boost in v2.1 for this specific query given the compound index, but I think I'll have to wait until SERVER-3310 is solved. Thank you once again!
          Hide
          Aaron Staple (Inactive)
          added a comment -

          Hi Artur,

          No problem. It looks like your 'feelings_1_location.state_1_created_at_-1' index is multikey, which would explain why the optimization is not being used. (It is not supported for multikey indexes.)

          Show
          Aaron Staple (Inactive)
          added a comment - Hi Artur, No problem. It looks like your 'feelings_1_location.state_1_created_at_-1' index is multikey, which would explain why the optimization is not being used. (It is not supported for multikey indexes.)

            People

            • Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                1 year, 41 weeks, 6 days ago
                Date of 1st Reply: