Core Server
SERVER-13487

Incorrect mongos shard routing for some range based finds on single shard key

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0-rc2
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      • Create a sharded DB environment - 3 shards (one mongod per shard), 1 config server and 1 mongos. It is important to have more than 2 shards so the erroneous 2-shard routing behaviour can easily be distinguished from a general scatter-gather to all 3 shards. Using the mongo shell via mongos, ensure you have called addShard() for each of the 3 shards
      • Load up a database (e.g. 'test') with a collection of randomly generated documents - 2 million small docs should be sufficient to allow some chunking to occur.

       sh.enableSharding('test')
       use test
       db.rawdata.ensureIndex({myfield: 1})
       sh.shardCollection('test.rawdata', {myfield: 1})
       for (var i = 0; i < 2000000; ++i) {
         db.rawdata.insert({myfield: Math.floor(Math.random()*10000)});
       }

      • See what chunks are on what shards

      sh.status()

      • Run a query on a subset of data that lives on just one shard (e.g. choose one chunk's start and end values from the chunk list above and query just that range). IMPORTANT - choose a range where the next chunk after it lives on a different shard <-- THIS IS KEY

         // Change the 2 values as appropriate
         db.rawdata.find({"myfield" : {"$gte" : 80, "$lt" : 1059}}).explain()

      • Notice in the explain output that 2 shards are being targeted when only one should need to be targeted, because the shard key range used in the find() belongs to just one shard
      • Modify the find to use $lte and decrement the value by 1

         // Change the 2 values as appropriate
         db.rawdata.find({"myfield" : {"$gte" : 80, "$lte" : 1058}}).explain()

      • This time notice that now only one shard is correctly being targeted in the explain plan
      • The aggregation framework exhibits the same behaviour.

         // Change the 2 values as appropriate
         var pipeline = [{"$match" : {"myfield" : {"$gte" : 80, "$lt" : 1059}}}]
         db.rawdata.aggregate(pipeline, {explain: true});

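The two-vs-one shard comparison in the steps above can also be checked programmatically. A minimal shell-side sketch, assuming the 2.6-era sharded explain format in which each targeted shard appears as a key in a top-level "shards" document (the helper name is hypothetical):

```javascript
// Hypothetical helper: count how many shards an explain plan targeted.
// Assumes the 2.6-style sharded explain output, where each targeted
// shard appears as a key in the top-level "shards" document.
function shardsTargeted(explainOutput) {
  return Object.keys(explainOutput.shards || {}).length;
}
```

With the reproduction above, `shardsTargeted(db.rawdata.find({myfield: {$gte: 80, $lt: 1059}}).explain())` would return 2 under the bug, where 1 is expected.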

      Description

      MongoDB version: 2.6.0rc2
      OS: Linux x86-64

      When running a query against a sharded collection using a range in the find() criteria based on values in the shard key, depending on the range of values used, the mongos / query optimiser is incorrectly targeting 2 shards when it only needs to target 1 shard. Specifically, the problem is for a sharded collection that has a single (non-compound) shard key, and $lte (or $gte) is used in the range query. The exact behaviour can be seen when running .explain().

      Say I have a sharded collection on shard key "myfield" and in shard s1 a chunk exists for {"myfield":80}-->>{"myfield":1059} and then in shard s0 a chunk exists for {"myfield":1059}-->>{"myfield":2492}, then when I issue a find() with criteria "$lt: 1059", both shards s1 and s0 are routed to when only one shard (s1) needs to be routed to.

      As a workaround, if I modify the criteria to be "$lte: 1058", then only one shard (s1) is correctly routed to. HOWEVER, this means the generic query code in my more real-world application has to be explicitly aware of field types (there is no schema) and know that the field actually contains an integer that can safely be decremented by 1, just to optimise the sharded query.
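The integer-only workaround can be captured in one small helper. This is a sketch of the transformation described above (the function name is hypothetical), valid only when the field is known to hold integers:

```javascript
// Hypothetical helper sketching the workaround: rewrite an exclusive
// integer upper bound ($lt) into the equivalent inclusive one ($lte),
// so the range no longer touches the next chunk's boundary value.
// Only correct when the field is known to contain integers.
function exclusiveToInclusiveUpperBound(range) {
  var out = Object.assign({}, range);
  if (Number.isInteger(out.$lt)) {
    out.$lte = out.$lt - 1;
    delete out.$lt;
  }
  return out;
}
```

For example, `exclusiveToInclusiveUpperBound({$gte: 80, $lt: 1059})` yields `{$gte: 80, $lte: 1058}`, the form that routes to a single shard in the steps above.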

        Issue Links

          Activity

          adamc Adam Comerford (Inactive) added a comment - edited

          I've reproduced this; the root of the problem seems to be that the index bounds produced by $lt and $lte (and $gt, $gte) are identical.

          For example, if you specify criteria {$gte : 2000, $lte : 3000} you get index bounds of [2000, 3000]. However, if you specify criteria {$gt : 2000, $lt : 3000} you get the identical index bounds of [2000, 3000] rather than the expected [2001, 2999] based on the previous result. Hence when querying for ranges that coincide with chunk boundaries, you can hit more chunks (and hence shards) than expected.
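The effect of dropping the inclusive/exclusive flag can be illustrated with a toy model of chunk targeting (not the mongos source; the chunk ranges and shard names are taken from the description above):

```javascript
// Toy model of chunk targeting. Chunks own half-open key ranges
// [min, max). A query range hits a chunk iff the intervals overlap;
// whether the query's upper bound is inclusive decides if the boundary
// value 1059 reaches into the next chunk.
var chunks = [
  { shard: "s1", min: 80,   max: 1059 },
  { shard: "s0", min: 1059, max: 2492 },
];

function targetedShards(gte, upper, upperInclusive) {
  return chunks
    .filter(function (c) {
      var overlapsLow = c.max > gte;
      var overlapsHigh = upperInclusive ? upper >= c.min : upper > c.min;
      return overlapsLow && overlapsHigh;
    })
    .map(function (c) { return c.shard; });
}
```

Respecting exclusivity, {$gte: 80, $lt: 1059} corresponds to `targetedShards(80, 1059, false)` and hits only s1; if $lt is treated as inclusive (the behaviour reported here), `targetedShards(80, 1059, true)` also pulls in s0.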

          siyuan.zhou Siyuan Zhou added a comment

          Yes, it is because the post-processing of the index bounds doesn't respect inclusive/exclusive bounds. This is a known issue. I am closing it as a duplicate.


            People

            • Votes: 0
            • Watchers: 9

            Dates

            • Created:
            • Updated:
            • Resolved: