Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20768

Mongos find query including upper bound X of a chunk also targets the shard with chunk having lower bound = X

    • Type: Icon: Bug Bug
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.2.6, 3.0.6, 3.1.8
    • Component/s: Querying, Sharding
    • None
    • Query Optimization
    • ALL
    • Hide

      Start a ShardingTest as:

      // sharding test MUST be started with noAutoSplit
      var st = new ShardingTest({ mongos : 2, shards : 3, rs : { nodes : 3 }, other : { mongosOptions : { noAutoSplit: "" } } })
      

      Then connect to a mongos in the cluster and run the following:

      var db = db.getSiblingDB("test");
      
      // insert some data
      for (var i=0; i<1000; i++) {
          db.foo.insert({ _id : i });
      }
      
      // enable sharding
      db.adminCommand({ enableSharding: db.getName() });
      db.adminCommand({ shardCollection: "test.foo", key: { _id: 1 } });
      
      // create two chunks
      db.adminCommand({ split : "test.foo", middle: { _id : 1000 } });
      
      // put the two chunks on separate shards
      db.adminCommand({ moveChunk : "test.foo", find : { _id : 0 }, to : "test-rs0" });
      db.adminCommand({ moveChunk : "test.foo", find : { _id : 1001 }, to : "test-rs1" });
      
      // shows that both test-rs0 and test-rs1 are targeted
      // even though the chunks are [-inf, 1000) and [1000, +inf)
      printjson(db.foo.find({ _id : { $lt : 1000 } }).explain());
      
      Show
      Start a ShardingTest as: // sharding test MUST be started with noAutoSplit var st = new ShardingTest({ mongos : 2, shards : 3, rs : { nodes : 3 }, other : { mongosOptions : { noAutoSplit: "" } } }) Then connect to a mongos in the cluster and run the following: var db = db.getSiblingDB( "test" ); // insert some data for ( var i=0; i<1000; i++) { db.foo.insert({ _id : i }); } // enable sharding db.adminCommand({ enableSharding: db.getName() }); db.adminCommand({ shardCollection: "test.foo" , key: { _id: 1 } }); // create two chunks db.adminCommand({ split : "test.foo" , middle: { _id : 1000 } }); // put the two chunks on separate shards db.adminCommand({ moveChunk : "test.foo" , find : { _id : 0 }, to : "test-rs0" }); db.adminCommand({ moveChunk : "test.foo" , find : { _id : 1001 }, to : "test-rs1" }); // shows that both test-rs0 and test-rs1 are targeted // even though the chunks are [-inf, 1000) and [1000, +inf) printjson(db.foo.find({ _id : { $lt : 1000 } }).explain());

      Chunks are inclusive at the lower bound and exclusive at the upper bound.

      However, find queries over a range of the form

      { $lt : X }

      where X is the upper bound of a chunk also targets the shard containing the chunk whose lower bound is X (at least according to find().explain()).

      Note that a point query for X will only (and correctly) target the shard with the chunk whose lower bound is X.
      Similarly, a query of the form

      { $lte : X }

      will (correctly) target the shard for both chunks.

      This is undesirable both from a performance perspective, since an additional shard is unnecessarily targeted in this situation, and a testing perspective, since .explain() cannot be used to verify that all documents within a chunk's range lie only on the shard the chunk is expected to be on.

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Votes:
            1 Vote for this issue
            Watchers:
            19 Start watching this issue

              Created:
              Updated: