  1. Core Server
  2. SERVER-14052

With only 2 distinct key values, splitVector returns numSplits: 1 but no split is done

    • Fully Compatible
    • ALL
    • Steps to reproduce:

      Start a test cluster, small chunk size:

      cluster = new ShardingTest({shards: 2, chunksize: 1});
      

      Connect, create a test database, insert data, manually split to create the "problem" chunk:

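      sh.enableSharding("splitTest");
      sh.shardCollection("splitTest.bar", {a : 1});
      // insert into the sharded collection explicitly, rather than relying on the shell's current db
      var coll = db.getSiblingDB("splitTest").bar;
      for (var i = 0; i < 1000; i++) { coll.insert({a : i}); }
      // this assumes that the initial $minKey --> $maxKey chunk was split at the mid-point of zero
      sh.splitAt("splitTest.bar", {a : 2});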
      

      Now a chunk should exist with just two distinct shard key values (0 and 1). No matter how many documents I insert into that chunk, it is never split again (I inserted millions of docs).
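The load that grows the chunk is not shown in the report. A minimal sketch of one way to do it, assuming documents that alternate between a = 0 and a = 1; the helper name, the `pad` field, and the batch sizes are hypothetical choices for illustration:

```javascript
// Hypothetical helper: builds `count` documents whose shard key `a` is only ever 0 or 1.
function makeTwoValueDocs(count, padBytes) {
  // Filler string so the chunk passes the 1 MB chunk size sooner.
  var pad = new Array(padBytes + 1).join("x");
  var docs = [];
  for (var i = 0; i < count; i++) {
    docs.push({ a: i % 2, pad: pad });
  }
  return docs;
}

// Only runs inside the mongo shell (connected to mongos), where `db` is defined.
if (typeof db !== "undefined") {
  var target = db.getSiblingDB("splitTest").bar;
  for (var batch = 0; batch < 100; batch++) {
    target.insert(makeTwoValueDocs(1000, 100)); // batch insert of 1000 docs
  }
}
```

Every document lands in the { a : 0 } -->> { a : 2 } chunk, so this is the chunk the autosplitter would be expected to look at.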

      The cluster logs (mongos on m30999, shards on m30000 and m30001) for the splitVector look something like this (repeated multiple times):

      m30999| 2014-05-26T16:55:14.857+0100 [Balancer] distributed lock 'balancer/adamc-mbp.local:30999:1401115446:16807' unlocked. 
       m30001| 2014-05-26T16:55:17.665+0100 [conn2] request split points lookup for chunk splitTest.bar { : 0.0 } -->> { : 2.0 }
       m30001| 2014-05-26T16:55:19.798+0100 [conn2] warning: chunk is larger than 1048576 bytes because of key { a: 0.0 }
       m30001| 2014-05-26T16:55:19.798+0100 [conn2] warning: Finding the split vector for splitTest.bar over { a: 1.0 } keyCount: 10922 numSplits: 1 lookedAt: 0 took 2133ms
       m30001| 2014-05-26T16:55:19.798+0100 [conn2] command admin.$cmd command: splitVector { splitVector: "splitTest.bar", keyPattern: { a: 1.0 }, min: { a: 0.0 }, max: { a: 2.0 }, maxChunkSizeBytes: 1048576, maxSplitPoints: 2, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 numYields:2 locks(micros) r:3526041 reslen:88 2133ms
       m30999| 2014-05-26T16:55:20.859+0100 [Balancer] distributed lock 'balancer/adamc-mbp.local:30999:1401115446:16807' acquired, ts : 538363e855d6538b0901712c
       m30000| 2014-05-26T16:55:20.860+0100 [conn9] CMD fsync: sync:1 lock:0
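The splitVector command from the log can also be issued by hand, directly against the shard that owns the chunk (m30001 here), to inspect the split keys it returns. A sketch only: the parameter values are copied from the log line above, and the helper name is hypothetical.

```javascript
// Hypothetical helper: builds the same splitVector command document that appears in the log.
function buildSplitVectorCmd(ns, keyPattern, minKey, maxKey, maxChunkSizeBytes) {
  return {
    splitVector: ns, // the command name must be the first field
    keyPattern: keyPattern,
    min: minKey,
    max: maxKey,
    maxChunkSizeBytes: maxChunkSizeBytes
  };
}

var cmd = buildSplitVectorCmd("splitTest.bar", { a: 1 }, { a: 0 }, { a: 2 }, 1048576);

// Only runs inside a mongo shell connected to the shard's mongod (not to mongos).
if (typeof db !== "undefined") {
  printjson(db.getSiblingDB("admin").runCommand(cmd));
}
```

Comparing the returned split keys with the chunk bounds should show whether the single split point reported as numSplits: 1 is one the autosplitter can actually use.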
      

      Here's sh.status():

      	splitTest.foobar
      			shard key: { "a" : 1 }
      			chunks:
      				shard0000	1
      				shard0001	2
      			{ "a" : { "$minKey" : 1 } } -->> { "a" : 0 } on : shard0000 Timestamp(2, 0) 
      			{ "a" : 0 } -->> { "a" : 2 } on : shard0001 Timestamp(2, 2) 
      			{ "a" : 2 } -->> { "a" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 3) 
      

      A manual split will succeed and create the final possible 2 chunks:

      sh.splitAt("splitTest.foobar", {a : 1})
      { "ok" : 1 }
      splitTest.foobar
      			shard key: { "a" : 1 }
      			chunks:
      				shard0000	2
      				shard0001	2
      			{ "a" : { "$minKey" : 1 } } -->> { "a" : 0 } on : shard0000 Timestamp(2, 0) 
      			{ "a" : 0 } -->> { "a" : 1 } on : shard0001 Timestamp(3, 1) 
      			{ "a" : 1 } -->> { "a" : 2 } on : shard0000 Timestamp(3, 0) 
      			{ "a" : 2 } -->> { "a" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 3) 
      
    • Sharding 2019-12-30, Sharding 2020-01-13

      This is something of an edge case, and fixing it doesn't gain very much, but I figured it is technically still a bug.

      While checking some related logic, I realized that once I created a chunk with just 2 distinct values (0 and 1 in the test case), no splits occurred even though one more split should be possible. A manual splitAt() is successful, so the split is still possible, but the autosplitter never seems to attempt it.
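To illustrate why exactly one more split should be possible: a split point has to fall strictly between the chunk's lower and upper bounds, so a chunk whose only key values are 0 and 1 has a single candidate, 1. The function below is a hypothetical illustration of that reasoning for numeric keys, not the server's splitVector algorithm:

```javascript
// Hypothetical illustration: candidate split points for a chunk [min, max)
// are the distinct key values strictly greater than min and less than max.
function possibleSplitPoints(distinctValues, min, max) {
  return distinctValues
    .filter(function (v) { return v > min && v < max; })
    .sort(function (x, y) { return x - y; });
}

var before = possibleSplitPoints([0, 1], 0, 2);   // chunk { a : 0 } -->> { a : 2 }
var afterLow = possibleSplitPoints([0], 0, 1);    // chunk { a : 0 } -->> { a : 1 }
var afterHigh = possibleSplitPoints([1], 1, 2);   // chunk { a : 1 } -->> { a : 2 }
```

Here `before` is [1], matching the manual splitAt() at { a : 1 }, and both `afterLow` and `afterHigh` are empty, since a chunk holding a single key value cannot be split further.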

            Assignee:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Reporter:
            adam@comerford.net Adam Comerford
            Votes:
            0
            Watchers:
            11
