  1. Core Server
  2. SERVER-9646

MongoS checkVersion fails if any shard is down


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 2.5.5
    • 2.4.3
    • Sharding
    • None
    • 3x Single MongoD shards
      1x Mongos
    • ALL
    • Steps to reproduce:
      1. Create sharded cluster
      2. Insert data, query as normal.
      3. Kill one shard
      4. Close shell connection
      5. Start new Shell connection
      6. Run a query that does not target the down shard - the first attempt fails
      7. Run the same query again - this attempt succeeds

      Example:

      sh.enableSharding("test");
      sh.shardCollection("test.test", {_id:1});
      db.getSiblingDB("test").test.insert({_id:1})
      db.getSiblingDB("test").test.insert({_id:2})
      db.getSiblingDB("test").test.insert({_id:3})
      sh.splitAt("test.test", {_id:2});
      sh.splitAt("test.test", {_id:3});
       
      MongoDB 2.4.3> sh.status();
      --- Sharding Status ---
        sharding version: {
      	"_id" : 1,
      	"version" : 3,
      	"minCompatibleVersion" : 3,
      	"currentVersion" : 4,
      	"clusterId" : ObjectId("518c69e46e078ae26603eff2")
      }
        shards:
      	{  "_id" : "shard0000",  "host" : "pixl:28000" }
      	{  "_id" : "shard0001",  "host" : "pixl:28002" }
      	{  "_id" : "shard0002",  "host" : "pixl:28001" }
        databases:
      	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
      	{  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }
      		test.test
      			shard key: { "_id" : 1 }
      			chunks:
      				shard0001	1
      				shard0002	1
      				shard0000	1
      			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : shard0001 { "t" : 2, "i" : 0 }
      			{ "_id" : 2 } -->> { "_id" : 3 } on : shard0002 { "t" : 3, "i" : 0 }
      			{ "_id" : 3 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 { "t" : 3, "i" : 1 }

      Query (assuming shard0000 is down)

      db.getSiblingDB("test").test.find({_id:2})

      Example Output

       
      [1] 14:31:28 PRIMARY:test@Pixl.local:28100>
      MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
      error: {
      	"$err" : "socket exception [CONNECT_ERROR] for pixl:28000",
      	"code" : 11002,
      	"shard" : "shard0002"
      }
       
      [2] 14:31:40 PRIMARY:test@Pixl.local:28100>
      MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
      { "_id" : 2 }
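To make the routing expectation explicit, here is a minimal sketch in plain JavaScript (not mongos code) that maps an `_id` to its owning shard using the chunk ranges from the `sh.status()` output above. The helper name `shardForId` and the use of `-Infinity` / `Infinity` in place of `$minKey` / `$maxKey` are assumptions made for illustration only.

```javascript
// Chunk ranges copied from the sh.status() output above.
// Assumption: -Infinity / Infinity stand in for $minKey / $maxKey.
const chunks = [
  { min: -Infinity, max: 2, shard: "shard0001" },
  { min: 2, max: 3, shard: "shard0002" },
  { min: 3, max: Infinity, shard: "shard0000" },
];

// Hypothetical helper: return the shard whose chunk range [min, max)
// contains the given _id, or null if no chunk matches.
function shardForId(chunkList, id) {
  const chunk = chunkList.find((c) => id >= c.min && id < c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardForId(chunks, 2)); // shard0002
```

Under these ranges the query `{_id:2}` belongs to shard0002 only, so the down shard (shard0000) should not need to be contacted to answer it.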


    Description

      Currently, when you create a new connection to a MongoS, it spawns its own connections to the MongoD shards. The first query issued to the cluster over that connection triggers an initial checkVersion, which fails if any one of the shards is down, even when the query does not target that shard. All subsequent queries on the connection work.

      Here is an excerpt from the MongoS log when this error occurs:
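The fail-once-then-succeed pattern can be simulated, together with a possible client-side retry, in a few lines of plain JavaScript. This is only a sketch under stated assumptions: `makeFlakyFind` is a hypothetical stand-in for a query sent through a new MongoS connection, `withRetry` is a hypothetical wrapper, and neither is part of the MongoDB shell or drivers. The error text and code 11002 are taken from the report.

```javascript
// Hypothetical stand-in for a query through mongos: the first call on a new
// connection throws the error from the report, later calls succeed.
function makeFlakyFind() {
  let calls = 0;
  return function find(query) {
    calls += 1;
    if (calls === 1) {
      const err = new Error("socket exception [CONNECT_ERROR] for pixl:28000");
      err.code = 11002;
      throw err;
    }
    return [{ _id: query._id }];
  };
}

// Hypothetical wrapper: retry a failed call up to `retries` times, but only
// for the socket-exception code seen in the report (11002).
function withRetry(fn, retries) {
  return function (...args) {
    for (let attempt = 0; ; attempt += 1) {
      try {
        return fn(...args);
      } catch (err) {
        if (err.code !== 11002 || attempt >= retries) throw err;
      }
    }
  };
}

const find = withRetry(makeFlakyFind(), 1);
console.log(JSON.stringify(find({ _id: 2 }))); // [{"_id":2}]
```

A retry like this only masks the symptom on the client side; the report itself is about the initial checkVersion in MongoS.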

      Fri May 10 13:59:24.321 [Balancer] caught exception while doing balance: socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.522 [WriteBackListener-pixl:28000] WriteBackListener exception : socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: problem while initially checking shard versions on shard0000 :: caused by :: socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
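The key detail in the excerpt is that the shard being initialized and the host that could not be reached are different. A small sketch in plain JavaScript can pull both out of a warning line; `parseInitWarning` is a hypothetical helper, the regular expressions are written only against the line format shown above, and the connection-state portion of the sample line is abbreviated as `{ ... }`.

```javascript
// Sample line from the excerpt above, with the connection state abbreviated.
const line =
  "Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on " +
  "shard0002:pixl:28001, current connection state is { ... } :: caused by :: 11002 " +
  "socket exception [6] server [pixl:28000] mongos shardconnection connectionpool " +
  "error: couldn't connect to server pixl:28000";

// Hypothetical helper: extract the shard being initialized and the host that
// could not be reached. Returns null if the line does not match this format.
function parseInitWarning(logLine) {
  const target = /initializing on (\w+):(\S+?),/.exec(logLine);
  const failed = /couldn't connect to server (\S+)/.exec(logLine);
  if (!target || !failed) return null;
  return { shard: target[1], shardHost: target[2], failedHost: failed[1] };
}

const info = parseInitWarning(line);
// The shard being initialized is up; the unreachable host is a different one.
console.log(info.shard, info.shardHost, info.failedHost);
console.log(info.shardHost !== info.failedHost); // true
```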

          People

            Assignee: Unassigned
            Reporter: David Hows (david.hows)