Core Server / SERVER-16582

Chunk Migration Failing Repeatedly on Initial Balancing Round

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.8.0-rc2
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    Steps to reproduce:
      • At bash prompt:
        mlaunch init --sharded 3 --replicaset
        mongo
        

        Note: I'm using mlaunch version 1.1.6

      • At the mongo shell:
        // Insert 1,000,000 documents in 1000 batches of 1000 (one batch per value of i)
        for (i = 1; i <= 1000; i++) {
            x = [];
            for (j = 1; j <= 1000; j++) {
                x.push({ a : i, b : j, c : 1000 * i + j, _id : 1000 * i + j });
            }
            db.foo.insert(x);
        }
        
        db.foo.ensureIndex( { a : 1, b : 1 }, { name : "first" } )
        db.foo.ensureIndex( { b : 1 }, { name : "second" } )
        sh.enableSharding("test")
        sh.shardCollection("test.foo", { b : 1 } )
        
      • Wait a minute or two, and then run sh.status().

      Expected result: Data is in several chunks, and the load is balanced with no errors.

      Actual result: Migration failures as the balancer starts: "Failed with error 'chunk too big to move', from shard01 to shard03" and "Failed with error 'chunk too big to move', from shard01 to shard02", though the chunks do seem to eventually get where they need to go.

      Here is the output of sh.status() partway through:

      mongos> sh.status()
      --- Sharding Status ---
        sharding version: {
      	"_id" : 1,
      	"minCompatibleVersion" : 5,
      	"currentVersion" : 6,
      	"clusterId" : ObjectId("5492159f53be077898567039")
      }
        shards:
      	{  "_id" : "shard01",  "host" : "shard01/cross-mb-air.local:27018,cross-mb-air.local:27019,cross-mb-air.local:27020" }
      	{  "_id" : "shard02",  "host" : "shard02/cross-mb-air.local:27021,cross-mb-air.local:27022,cross-mb-air.local:27023" }
      	{  "_id" : "shard03",  "host" : "shard03/cross-mb-air.local:27024,cross-mb-air.local:27025,cross-mb-air.local:27026" }
        balancer:
      	Currently enabled:  yes
      	Currently running:  yes
      		Balancer lock taken at undefined by undefined
      	Collections with active migrations:
      		test.foo started at Wed Dec 17 2014 19:05:28 GMT-0500 (EST)
      	Failed balancer rounds in last 5 attempts:  0
      	Migration Results for the last 24 hours:
      		3 : Success
      		2 : Failed with error 'chunk too big to move', from shard01 to shard03
      		1 : Failed with error 'chunk too big to move', from shard01 to shard02
        databases:
      	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
      	{  "_id" : "test",  "partitioned" : true,  "primary" : "shard01" }
      		test.foo
      			shard key: { "b" : 1 }
      			chunks:
      				shard01	4
      				shard02	2
      				shard03	1
      			{ "b" : { "$minKey" : 1 } } -->> { "b" : 1 } on : shard02 Timestamp(2, 0)
      			{ "b" : 1 } -->> { "b" : 150 } on : shard03 Timestamp(3, 0)
      			{ "b" : 150 } -->> { "b" : 300 } on : shard02 Timestamp(4, 0)
      			{ "b" : 300 } -->> { "b" : 450 } on : shard01 Timestamp(4, 2)
      			{ "b" : 450 } -->> { "b" : 600 } on : shard01 Timestamp(4, 3)
      			{ "b" : 600 } -->> { "b" : 899 } on : shard01 Timestamp(1, 2)
      			{ "b" : 899 } -->> { "b" : { "$maxKey" : 1 } } on : shard01 Timestamp(1, 3)
      
      mongos>
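      A quick sanity check on why the migrator might complain (plain Python, not part of the repro): the insert loop writes every value of b exactly 1000 times (once per value of i), so the chunk ranges shard01 is left holding each contain well over a hundred thousand documents — large enough that the migrator's size estimate can plausibly trip the 'chunk too big to move' check.

```python
# Each b value in 1..1000 is written once per outer-loop iteration i,
# so every b value occurs exactly 1000 times in test.foo.
DOCS_PER_B_VALUE = 1000

# Chunk ranges reported by sh.status() for shard01's chunks
# (lower bound inclusive, upper bound exclusive, per the -->> notation).
chunk_ranges = {
    "300 -->> 450": (300, 450),
    "450 -->> 600": (450, 600),
    "600 -->> 899": (600, 899),
}

doc_counts = {name: (hi - lo) * DOCS_PER_B_VALUE
              for name, (lo, hi) in chunk_ranges.items()}

for name, docs in doc_counts.items():
    print(f"b: {name}: {docs} documents")
# → b: 300 -->> 450: 150000 documents
# → b: 450 -->> 600: 150000 documents
# → b: 600 -->> 899: 299000 documents
```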
      

      The balancer is getting stuck during its initial balancing round.
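      Not a fix, but the usual workaround while this is investigated is to pre-split the wide ranges by hand, or to lower the maximum chunk size so the balancer splits chunks before it tries to move them. A sketch at the mongos shell (the split points below are illustrative assumptions, not values taken from this ticket):

```javascript
// Manually split the wide 600 -->> 899 chunk so each piece is small
// enough to migrate. Split points here are illustrative assumptions.
sh.splitAt("test.foo", { b: 700 });
sh.splitAt("test.foo", { b: 800 });

// Or lower the cluster-wide max chunk size (in MB) so the balancer
// splits chunks more aggressively before attempting to move them.
db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 32 });
```

      Both run against the mongos; sh.status() afterwards should show the wide range split into movable pieces.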

        1. config_dump.zip
          9 kB
        2. mongos.log
          37 kB

            Assignee:
            Unassigned
            Reporter:
            William Cross (william.cross)
            Votes:
            0
            Watchers:
            4
