Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-21628

mongos sort query using $in creating inefficient query plan compared to $or

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.3.4
    • Affects Version/s: None
    • Component/s: Querying, Sharding
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Hide

      Setup mongos with 2 shards and run the following in "test" database.

      for (var i = 1; i <= 1000; i++) { 
          db.items.insert({item: "a", i_type: "x", i_id: i, price: i * 50});
          db.items.insert({item: "a", i_type: "y", i_id: i, price: i * 50});
          db.items.insert({item: "b", i_type: "x", i_id: i, price: i * 50});
          db.items.insert({item: "b", i_type: "y", i_id: i, price: i * 50});
      }
      db.items.createIndex({item: 1, i_type: 1, i_id: 1}, {unique: true})
      db.items.createIndex({item: 1, i_type: 1, price: 1})
      sh.enableSharding("test")
      sh.shardCollection("test.items", {item: 1, i_type: 1})
      
      db.items.find(
          {item: "a", i_type: {$in: ["x", "y"]}},
          {_id: 0, item: 1, i_type: 1}
      ).sort({price: 1}).limit(10).explain("executionStats")
      
      db.items.find(
          {$or: [{item: "a", i_type: "x"}, {item: "a", i_type: "y"}]},
          {_id: 0, item: 1, i_type: 1}
      ).sort({price: 1}).limit(10).explain("executionStats")
      
      Show
      Setup mongos with 2 shards and run the following in "test" database. for ( var i = 1; i <= 1000; i++) { db.items.insert({item: "a" , i_type: "x" , i_id: i, price: i * 50}); db.items.insert({item: "a" , i_type: "y" , i_id: i, price: i * 50}); db.items.insert({item: "b" , i_type: "x" , i_id: i, price: i * 50}); db.items.insert({item: "b" , i_type: "y" , i_id: i, price: i * 50}); } db.items.createIndex({item: 1, i_type: 1, i_id: 1}, {unique: true }) db.items.createIndex({item: 1, i_type: 1, price: 1}) sh.enableSharding( "test" ) sh.shardCollection( "test.items" , {item: 1, i_type: 1}) db.items.find( {item: "a" , i_type: {$in: [ "x" , "y" ]}}, {_id: 0, item: 1, i_type: 1} ).sort({price: 1}).limit(10).explain( "executionStats" ) db.items.find( {$or: [{item: "a" , i_type: "x" }, {item: "a" , i_type: "y" }]}, {_id: 0, item: 1, i_type: 1} ).sort({price: 1}).limit(10).explain( "executionStats" )
    • QuInt E (01/11/16), Query F (02/01/16), Query 10 (02/22/16), Query 12 (04/04/16)

      $in queries are performing an unnecessary/inefficient SORT stage compared to $or using a SORT_MERGE for a logically equivalent query in mongos.

      With $in examines 2000 keys.

      db.items.find(
          {item: "a", i_type: {$in: ["x", "y"]}},
          {_id: 0, item: 1, i_type: 1, price: 1}
      ).sort({price: 1}).limit(10).explain("executionStats")
      

      With $or examines 11 keys.

      db.items.find(
          {$or: [{item: "a", i_type: "x"}, {item: "a", i_type: "y"}]},
          {_id: 0, item: 1, i_type: 1, price: 1}
      ).sort({price: 1}).limit(10).explain("executionStats")
      

      Additional queries I've tested:

      Without sort examines 10 keys.

      db.items.find(
          {item: "a", i_type: {$in: ["x", "y"]}},
          {_id: 0, item: 1, i_type: 1, price: 1}
      ).limit(10).explain("executionStats")
      

      With only one i_type examines 10 keys.

      db.items.find(
          {item: "a", i_type: {$in: ["x"]}},
          {_id: 0, item: 1, i_type: 1, price: 1}
      ).sort({price: 1}).limit(10).explain("executionStats")
      

      Run directly on replica set examines 11 keys.

      db.items.find(
          {item: "a", i_type: {$in: ["x", "y"]}},
          {_id: 0, item: 1, i_type: 1, price: 1}
      ).sort({price: 1}).limit(10).explain("executionStats")
      

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            axom Brian Riley
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: