Core Server / SERVER-9646

MongoS checkVersion fails if any shard is down

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.5.5
    • Affects Version/s: 2.4.3
    • Component/s: Sharding
    • Labels: None
    • Environment:
      3x Single MongoD shards
      1x Mongos
    • Operating System: ALL
      Steps to Reproduce:
      1. Create sharded cluster
      2. Insert data, query as normal.
      3. Kill one shard
      4. Close shell connection
      5. Start new Shell connection
      6. Run a query which will not hit the down shard - this will fail
      7. Run the same query again - this will succeed

      Example:

      sh.enableSharding("test");
      sh.shardCollection("test.test", {_id:1});
      db.getSiblingDB("test").test.insert({_id:1})
      db.getSiblingDB("test").test.insert({_id:2})
      db.getSiblingDB("test").test.insert({_id:3})
      sh.splitAt("test.test", {_id:2});
      sh.splitAt("test.test", {_id:3});
      
      MongoDB 2.4.3> sh.status();
      --- Sharding Status ---
        sharding version: {
      	"_id" : 1,
      	"version" : 3,
      	"minCompatibleVersion" : 3,
      	"currentVersion" : 4,
      	"clusterId" : ObjectId("518c69e46e078ae26603eff2")
      }
        shards:
      	{  "_id" : "shard0000",  "host" : "pixl:28000" }
      	{  "_id" : "shard0001",  "host" : "pixl:28002" }
      	{  "_id" : "shard0002",  "host" : "pixl:28001" }
        databases:
      	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
      	{  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }
      		test.test
      			shard key: { "_id" : 1 }
      			chunks:
      				shard0001	1
      				shard0002	1
      				shard0000	1
      			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : shard0001 { "t" : 2, "i" : 0 }
      			{ "_id" : 2 } -->> { "_id" : 3 } on : shard0002 { "t" : 3, "i" : 0 }
      			{ "_id" : 3 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 { "t" : 3, "i" : 1 }
      

      Query (assuming shard0000 is down)

      db.getSiblingDB("test").test.find({_id:2})
      

      Example Output

      [1] 14:31:28 PRIMARY:test@Pixl.local:28100>
      MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
      error: {
      	"$err" : "socket exception [CONNECT_ERROR] for pixl:28000",
      	"code" : 11002,
      	"shard" : "shard0002"
      }
      
      [2] 14:31:40 PRIMARY:test@Pixl.local:28100>
      MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
      { "_id" : 2 }
      

      Currently, when you create a new connection to a mongos, it spawns its own connections to each mongod. The first query you issue to the cluster triggers an initial checkVersion against every shard, and this fails if any one of the shards is down, even when the query does not target the down shard. All subsequent queries on that connection work.
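      Until this is addressed in mongos, the behaviour described above suggests a client-side workaround: since only the first query on a fresh connection fails, a single retry is enough. A minimal sketch (the `retryOnce` helper and the simulated query below are hypothetical illustrations, not part of any MongoDB driver):

      ```javascript
      // Hypothetical helper: run a query, retrying once on failure. Per the
      // report, only the first query on a new mongos connection fails its
      // initial shard-version check; the retry then succeeds.
      function retryOnce(queryFn) {
        try {
          return queryFn();
        } catch (e) {
          // Assume the failure was the initial checkVersion; retry once.
          return queryFn();
        }
      }

      // Simulated query that fails on its first invocation, mimicking the
      // "socket exception [CONNECT_ERROR]" shown in the example output.
      let attempts = 0;
      function simulatedFind() {
        attempts++;
        if (attempts === 1) {
          throw new Error("socket exception [CONNECT_ERROR] for pixl:28000");
        }
        return { _id: 2 };
      }

      console.log(retryOnce(simulatedFind)); // { _id: 2 }
      ```

      This mirrors the shell session above, where the same `find({_id:2})` fails on the first attempt and succeeds on the second.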

      Here is an excerpt from a mongos log when this error occurs:

      Fri May 10 13:59:24.321 [Balancer] caught exception while doing balance: socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.522 [WriteBackListener-pixl:28000] WriteBackListener exception : socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: problem while initially checking shard versions on shard0000 :: caused by :: socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
      

            Assignee:
            Unassigned
            Reporter:
            David Hows
            Votes:
            2
            Watchers:
            3