[SERVER-9646] MongoS checkVersion fails if any shard is down Created: 10/May/13  Updated: 11/Jul/16  Resolved: 07/Mar/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.3
Fix Version/s: 2.5.5

Type: Bug Priority: Major - P3
Reporter: David Hows Assignee: Unassigned
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

3x Single MongoD shards
1x Mongos


Issue Links:
Related
related to SERVER-5625 New sharded connections to a namespac... Closed
Operating System: ALL
Steps To Reproduce:
  1. Create sharded cluster
  2. Insert data, query as normal.
  3. Kill one shard
  4. Close shell connection
  5. Start new Shell connection
  6. Run a query that does not target the down shard - this will fail
  7. Run the same query again - this will succeed

Example:

sh.enableSharding("test");
sh.shardCollection("test.test", {_id:1});
db.getSiblingDB("test").test.insert({_id:1})
db.getSiblingDB("test").test.insert({_id:2})
db.getSiblingDB("test").test.insert({_id:3})
sh.splitAt("test.test", {_id:2});
sh.splitAt("test.test", {_id:3});
 
MongoDB 2.4.3> sh.status();
--- Sharding Status ---
  sharding version: {
	"_id" : 1,
	"version" : 3,
	"minCompatibleVersion" : 3,
	"currentVersion" : 4,
	"clusterId" : ObjectId("518c69e46e078ae26603eff2")
}
  shards:
	{  "_id" : "shard0000",  "host" : "pixl:28000" }
	{  "_id" : "shard0001",  "host" : "pixl:28002" }
	{  "_id" : "shard0002",  "host" : "pixl:28001" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }
		test.test
			shard key: { "_id" : 1 }
			chunks:
				shard0001	1
				shard0002	1
				shard0000	1
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : shard0001 { "t" : 2, "i" : 0 }
			{ "_id" : 2 } -->> { "_id" : 3 } on : shard0002 { "t" : 3, "i" : 0 }
			{ "_id" : 3 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 { "t" : 3, "i" : 1 }
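To see why `find({_id:2})` should never need shard0000, the chunk table above can be modelled as half-open ranges `[min, max)` on `_id`. This is a simplified sketch in plain JavaScript (runnable under Node.js), not mongos's actual routing code. `-Infinity`/`Infinity` stand in for `$minKey`/`$maxKey`, and the names `chunks` and `ownerOf` are hypothetical.

```javascript
// Chunk ranges for test.test, copied from the sh.status() output above.
// Each chunk owns the half-open range [min, max) of the shard key _id.
// -Infinity / Infinity stand in for $minKey / $maxKey (sketch assumption).
const chunks = [
  { min: -Infinity, max: 2, shard: "shard0001" },
  { min: 2, max: 3, shard: "shard0002" },
  { min: 3, max: Infinity, shard: "shard0000" },
];

// Return the shard that owns a single shard-key value.
function ownerOf(id) {
  const chunk = chunks.find((c) => id >= c.min && id < c.max);
  if (!chunk) throw new Error(`no chunk covers _id ${id}`);
  return chunk.shard;
}

// An equality query on the shard key maps to exactly one chunk, hence one shard.
console.log(ownerOf(2)); // shard0002
```

Only `_id` values of 3 and above live on shard0000, so with that shard down the `{_id:2}` query in this report could in principle be answered by shard0002 alone.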

Query (assuming shard0000 is down)

db.getSiblingDB("test").test.find({_id:2})

Example Output

 
[1] 14:31:28 PRIMARY:test@Pixl.local:28100>
MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
error: {
	"$err" : "socket exception [CONNECT_ERROR] for pixl:28000",
	"code" : 11002,
	"shard" : "shard0002"
}
 
[2] 14:31:40 PRIMARY:test@Pixl.local:28100>
MongoDB 2.4.3> db.getSiblingDB("test").test.find({_id:2})
{ "_id" : 2 }


 Description   

Currently, when a client opens a new connection to a MongoS, the MongoS opens its own connections to the MongoD shards. The first query issued on that connection triggers an initial checkVersion against the shards, and this check fails if any one of the shards is down, even when the query does not target that shard. All subsequent queries on the same connection succeed.
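The symptom (first query on a fresh connection fails, identical retry succeeds) can be reproduced with a toy model. This is NOT the mongos implementation. It assumes that the first operation on a new connection eagerly contacts every shard, and that the connection is marked as initialized even when that check throws, so later operations skip it. `ToyMongosConnection` and its `shardUp` map are hypothetical names, and shard names replace host names in the error text.

```javascript
// Toy model of the symptom in this report; NOT the real mongos code.
// Assumption: the first operation on a connection checks ALL shards, and the
// "initialized" flag is set before that check, so a failed check is not retried.
class ToyMongosConnection {
  constructor(shardUp) {
    this.shardUp = shardUp; // e.g. { shard0000: false, shard0001: true, ... }
    this.initialized = false;
  }

  // Simulated connect: throws if the shard is down.
  connect(shard) {
    if (!this.shardUp[shard]) {
      throw new Error(`socket exception [CONNECT_ERROR] for ${shard}`);
    }
  }

  query(targetShard) {
    if (!this.initialized) {
      this.initialized = true; // set first, so a failure here is not repeated
      // Eager initial version check against every shard, not just the target.
      for (const shard of Object.keys(this.shardUp)) this.connect(shard);
    }
    this.connect(targetShard); // the normal, targeted query
    return `ok from ${targetShard}`;
  }
}

const conn = new ToyMongosConnection({ shard0000: false, shard0001: true, shard0002: true });
for (let i = 0; i < 2; i++) {
  try {
    console.log(conn.query("shard0002"));
  } catch (e) {
    console.log(`error: ${e.message}`);
  }
}
// error: socket exception [CONNECT_ERROR] for shard0000
// ok from shard0002
```

Under these assumptions the model matches steps 6 and 7 of the report: the query fails once although its target shard is up, and the same query then succeeds.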

Here is an excerpt from the MongoS log when this error occurs:

Fri May 10 13:59:24.321 [Balancer] caught exception while doing balance: socket exception [CONNECT_ERROR] for pixl:28000
Fri May 10 13:59:26.522 [WriteBackListener-pixl:28000] WriteBackListener exception : socket exception [CONNECT_ERROR] for pixl:28000
Fri May 10 13:59:26.800 [conn5] warning: problem while initially checking shard versions on shard0000 :: caused by :: socket exception [CONNECT_ERROR] for pixl:28000
Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000
Fri May 10 13:59:26.800 [conn5] warning: socket exception when initializing on shard0002:pixl:28001, current connection state is { state: { conn: "", vinfo: "MongoPingDB.MongoPingDB @ 3|1||518c6a8b6e078ae26603f011", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 11002 socket exception [6] server [pixl:28000] mongos shardconnection connectionpool error: couldn't connect to server pixl:28000



 Comments   
Comment by Greg Studer [ 07/Mar/14 ]

Fixed by SERVER-5625

Comment by Richard Cresswell [ 21/Aug/13 ]

Queries that do not include the shard key, and therefore have to be sent to all shards, also fail. It would be nice if such queries returned the results from the shards that are still up.
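The behaviour requested in this comment can be sketched as a scatter-gather that collects documents from the shards that answer and records the ones that do not. This is an illustration only: `scatterGather`, the `shards` map of per-shard query functions, the `allowPartial` flag and the `{ docs, failedShards }` result shape are all hypothetical, not a MongoDB API.

```javascript
// Hypothetical scatter-gather that can tolerate unreachable shards.
// `shards` maps a shard name to a function that returns that shard's matching
// documents, or throws if the shard cannot be reached.
function scatterGather(shards, { allowPartial = false } = {}) {
  const docs = [];
  const failedShards = [];
  for (const [name, runQuery] of Object.entries(shards)) {
    try {
      docs.push(...runQuery());
    } catch (e) {
      if (!allowPartial) throw e; // default: any shard failure fails the whole query
      failedShards.push(name); // otherwise remember which shard was skipped
    }
  }
  return { docs, failedShards };
}

const down = () => {
  throw new Error("socket exception [CONNECT_ERROR]");
};
const result = scatterGather(
  { shard0000: down, shard0001: () => [{ _id: 1 }], shard0002: () => [{ _id: 2 }] },
  { allowPartial: true },
);
console.log(JSON.stringify(result));
// {"docs":[{"_id":1},{"_id":2}],"failedShards":["shard0000"]}
```

Defaulting `allowPartial` to `false` keeps the all-or-nothing behaviour described in this comment; a caller has to opt in to possibly incomplete results and can inspect `failedShards` to see what is missing.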

Generated at Thu Feb 08 03:21:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.