  1. Core Server
  2. SERVER-9646

MongoS checkVersion fails if any shard is down

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 2.5.5
    • Component/s: Sharding
    • Labels:
      None
    • Environment:
      3x Single MongoD shards
      1x Mongos
    • Operating System:
      ALL
    • Steps To Reproduce:
      1. Create sharded cluster
      2. Insert data, query as normal.
      3. Kill one shard
      4. Close shell connection
      5. Start new Shell connection
      6. Run a query that does not target the down shard - this will fail
      7. Run the same query again - this will succeed

      Example:

      sh.enableSharding("test");
      sh.shardCollection("test.test", {_id:1});
      db.getSiblingDB("test").test.insert({_id:1})
      db.getSiblingDB("test").test.insert({_id:2})
      db.getSiblingDB("test").test.insert({_id:3})
      sh.splitAt("test.test", {_id:2});
      sh.splitAt("test.test", {_id:3});
       
      MongoDB 2.4.3> sh.status();
      --- Sharding Status ---
        sharding version: {
      	"_id" : 1,
      	"version" : 3,
      	"minCompatibleVersion" : 3,
      	"currentVersion" : 4,
      	"clusterId" : ObjectId("518c69e46e078ae26603eff2")
      }
        shards:
      	{  "_id" : "shard0000",  "host" : "pixl:28000" }
      	{  "_id" : "shard0001",  "host" : "pixl:28002" }
      	{  "_id" : "shard0002",  "host" : "pixl:28001" }
        databases:
      	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
      	{  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }
      		test.test
      			shard key: { "_id" : 1 }
      			chunks:
      				shard0001	1
      				shard0002	1
      				shard0000	1
      			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : shard0001 { "t" : 2, "i" : 0 }
      			{ "_id" : 2 } -->> { "_id" : 3 } on : shard0002 { "t" : 3, "i" : 0 }
      			{ "_id" : 3 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 { "t" : 3, "i" : 1 }

      Query (assuming shard0000 is down)

      db.getSiblingDB("test").test.find({_id:2})

      Example Output

       
      [1] 14:31:28 PRIMARY:test@Pixl.local:28100>
      MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
      error: {
      	"$err" : "socket exception [CONNECT_ERROR] for pixl:28000",
      	"code" : 11002,
      	"shard" : "shard0002"
      }
       
      [2] 14:31:40 PRIMARY:test@Pixl.local:28100>
      MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
      { "_id" : 2 }
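
      To check that the query above should not need the down shard, the chunk ranges from the sh.status() output can be looked up by hand. The following is a minimal sketch in plain JavaScript, not mongos routing code: `chunks` and `ownerOf` are hypothetical names, and using -Infinity / Infinity for $minKey / $maxKey is an assumption that only works for numeric _id values.

```javascript
// Chunk ranges copied from the sh.status() output in this ticket.
// -Infinity / Infinity stand in for $minKey / $maxKey (assumption: numeric _id only).
const chunks = [
  { min: -Infinity, max: 2, shard: "shard0001" },
  { min: 2, max: 3, shard: "shard0002" },
  { min: 3, max: Infinity, shard: "shard0000" },
];

// Hypothetical helper: return the shard whose [min, max) range contains id.
// Chunk ranges include the lower bound and exclude the upper bound.
function ownerOf(id) {
  const chunk = chunks.find((c) => id >= c.min && id < c.max);
  return chunk ? chunk.shard : null;
}

console.log(ownerOf(1)); // shard0001
console.log(ownerOf(2)); // shard0002
console.log(ownerOf(3)); // shard0000
```

      With shard0000 (pixl:28000) down, {_id:2} falls in the chunk on shard0002, so the connect error for pixl:28000 on the first attempt is not explained by the query's target.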


      Description

      Currently, when you create a new connection on a MongoS, it spawns its own connections to the MongoD shards. The first query you issue to the cluster on that connection performs an initial checkVersion, which fails if any one of the shards is down, even when the query does not target that shard. All subsequent queries will work.
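
      Because only the first query on a new connection fails, a client could in principle mask the error by retrying once. The sketch below is a client-side illustration of that observation, not the fix that shipped for this ticket: `withOneRetry` and `makeFlakyQuery` are hypothetical helpers, and the assumption that the error surfaces as a thrown object with a `code` property depends on the driver in use.

```javascript
// Error code reported by mongos in this ticket for the failed connect.
const SOCKET_EXCEPTION = 11002;

// Hypothetical helper: run `query` and retry exactly once if it throws an
// error whose `code` is 11002. Any other error is rethrown unchanged.
function withOneRetry(query) {
  try {
    return query();
  } catch (err) {
    if (err && err.code === SOCKET_EXCEPTION) {
      return query();
    }
    throw err;
  }
}

// Stand-in for a query on a fresh mongos connection (no real server involved):
// the first call fails the way the ticket describes, later calls succeed.
function makeFlakyQuery() {
  let calls = 0;
  return function () {
    calls += 1;
    if (calls === 1) {
      const err = new Error("socket exception [CONNECT_ERROR] for pixl:28000");
      err.code = SOCKET_EXCEPTION;
      throw err;
    }
    return { _id: 2 };
  };
}

console.log(withOneRetry(makeFlakyQuery())); // { _id: 2 }
```

      A single retry is deliberate here: if the second attempt also fails, the query probably does depend on the down shard and the error should reach the caller.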

      Here is an excerpt from the MongoS log when this error occurs:

      Fri May 10 13:59:24.321 [Balancer] caught exception while doing balance: socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.522 [WriteBackListener-pixl:28000] WriteBackListener exception : socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: problem while initially checking shard versions on shard0000 :: caused by :: socket exception [CONNECT_ERROR] for pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
      Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
