[SERVER-7297] Queries against sharded collection return "all servers down" Created: 08/Oct/12 Updated: 15/Feb/13 Resolved: 02/Jan/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Stan McQueen | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu Linux 12.04. All servers are at the 2.2.0 version. |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
Two replication sets, each with 3 servers. Three mongod config servers. Two mongos application servers. I created a sharded configuration with the two replication sets and sharded some collections. After bouncing all the servers, queries in the shell against non-sharded collections succeed, but queries against sharded collections return: MongoDB shell version: 2.2.0 ", |
| Comments |
| Comment by xiaxiang lin [ 24/Dec/12 ] |
|
I fixed it by following your hints. Thanks a lot! |
| Comment by Stan McQueen [ 23/Dec/12 ] |
|
My issue turned out to be a missing host file entry. Each server was accessible from every other server, but not under the host names that the replication set was created with. Check your mongo logs carefully to see if there is a "failure to connect" message in one or more of them. That is what eventually led to my finding the resolution of the problem. |
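The log check described above can be sketched as a small scan. This is only an illustration: the function name and the default phrases are assumptions (the two phrases are the ones quoted in this ticket), and the exact wording in your own mongod or mongos logs may differ, so adjust the patterns to match what you see.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: these phrases are taken from this ticket's comments and log
# excerpt; real log wording can vary between MongoDB versions.
DEFAULT_PATTERNS = ("failure to connect", "all servers down")


def find_connection_errors(
    lines: Iterable[str], patterns: Tuple[str, ...] = DEFAULT_PATTERNS
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing one of the phrases."""
    # Escape each phrase so it is matched literally, case-insensitively.
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, pass an open log file, for example `find_connection_errors(open(path))`, where `path` is whatever log location your deployment uses, and repeat for each mongod and mongos log.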
| Comment by xiaxiang lin [ 23/Dec/12 ] |
|
I am encountering the same problem. My mongo cluster's settings are almost the same as Stan's. The connections between the mongos servers, config servers, and replica sets have been checked, and there seems to be no problem. |
| Comment by Stan McQueen [ 01/Nov/12 ] |
|
While I was preparing the logs to include, I did notice a "failure to connect" message. Upon looking into it, I found that I had omitted a host entry for the mongod servers. Interestingly, addShard and enableSharding worked and did not cause a problem with queries; but when shardCollection was executed, the command succeeded and sh.status seemed to indicate that all was well even though queries would fail. Now that I have added the missing host entries, all is working as expected. Thanks for your help and I apologize for entering a bug that was actually all my own fault. |
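A missing host entry of this kind can be spotted by checking that every member name in a shard's host string resolves on each machine. The following is a minimal sketch, not an official tool: the helper names are hypothetical, the input format is assumed to be the `setname/host:port,host:port` string that sh.status() prints, and the default port of 27017 is applied only when a member has no explicit port.

```python
import socket
from typing import Callable, List, Tuple


def parse_shard_host(shard_host: str) -> List[Tuple[str, int]]:
    """Split 'setname/host1:port,host2:port' into (host, port) pairs."""
    # Everything after the last '/' is the member list; a bare host list
    # without a set name is accepted as well.
    _, _, members = shard_host.rpartition("/")
    pairs = []
    for member in members.split(","):
        host, _, port = member.strip().partition(":")
        pairs.append((host, int(port) if port else 27017))
    return pairs


def unresolved_hosts(
    shard_host: str, resolve: Callable = socket.getaddrinfo
) -> List[str]:
    """Return the member host names that fail name resolution on this machine."""
    failed = []
    for host, port in parse_shard_host(shard_host):
        try:
            resolve(host, port)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Running `unresolved_hosts("cwset/cw-mongodb1-test:27017,cw-mongodb2-test:27017,cw-mongodb3-test:27017")` on every mongod, mongos, and config server should return an empty list; any name it returns is a candidate for a missing host entry on that machine.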
| Comment by Spencer Brody (Inactive) [ 19/Oct/12 ] |
|
Can you include a larger section of the log, with at least 10 minutes to either side of the error? This looks a lot like a problem with the connection between the shard mongods and the config servers. Can you double-check that it is possible to connect to the config servers via the mongo shell from the shard mongod servers? |
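As a rough first pass before opening a mongo shell, a plain TCP connection attempt from each shard mongod host to each config server can show whether the name resolves and the port is reachable. This sketch is an assumption-laden stand-in, not a replacement for the shell check requested above: the function name is hypothetical, and a successful TCP connect does not prove that the server answers MongoDB commands.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and connects; both steps can fail.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers resolution errors, refusals, and timeouts
        return False
```

For example, `can_connect("cfg1-test", 27019)` would test one config server, where `cfg1-test` is a hypothetical name and the port should be whatever your config servers actually listen on. Use the exact host names given to mongos and the shards, since an IP address may succeed where the configured name fails.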
| Comment by Stan McQueen [ 19/Oct/12 ] |
|
The problem still exists if I enable sharding on a collection. Once I do that, no further queries on that collection succeed. The only remedy is to delete the config server databases and the local databases on the replica set members, restart everything, re-establish the replica set, and create the shard set again. As long as I don't actually enable sharding for a collection, the mongoses, mongods, and mongodbs respond normally. All servers are reachable from all other servers. Currently I am running with a single shard of one replica set consisting of three servers, three config dbs, and two mongoses. Since restarting everything, I have created the shard, adding the replica set to it, but have not enabled sharding for any of the databases or collections. In that mode, everything works fine. However, this is a load-testing cluster, and I would like to be able to test the effects of sharding on the ability to handle load (before our production servers actually get to the point of having high loads). Our production servers are running MongoDB version 2.0.2 with a single shard of a replica set of three servers. Sharding is enabled on several databases/collections and these servers do not exhibit these symptoms. My first assumption was that I had somehow screwed up the configuration, since I couldn't believe a major sharding bug could have gotten out into the wild, but my load-test config is virtually identical to the production config. |
| Comment by Spencer Brody (Inactive) [ 19/Oct/12 ] |
|
Hi Stan, |
| Comment by Stan McQueen [ 09/Oct/12 ] |
|
By the way, the same symptoms occur with a sharded environment consisting of a single replication set. |
| Comment by Stan McQueen [ 09/Oct/12 ] |
|
Here is the query and response: ",
shards:
{ "_id" : "cwset", "host" : "cwset/cw-mongodb1-test:27017,cw-mongodb2-test:27017,cw-mongodb3-test:27017" }
{ "_id" : "cwset-2", "host" : "cwset-2/cw-mongodb1-2-test:27017,cw-mongodb2-2-test:27017,cw-mongodb3-2-test:27017" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "licensedb", "partitioned" : true, "primary" : "cwset" }
licensedb.licenses chunks:
} -->> { "u" : { $maxKey : 1 } } on : cwset Timestamp(1000, 0)
} -->> { "_machineHash" : { $maxKey : 1 }} on : cwset Timestamp(1000, 0)
{ "_id" : "oauthdb", "partitioned" : false, "primary" : "cwset" }
{ "_id" : "imagedb", "partitioned" : false, "primary" : "cwset" }
{ "_id" : "certdb", "partitioned" : false, "primary" : "cwset" }
{ "_id" : "policydb", "partitioned" : true, "primary" : "cwset" }
policydb.policies chunks:
} -->> { "u" : { $maxKey : 1 }} on : cwset Timestamp(1000, 0)
{ "_id" : "settingsdb", "partitioned" : false, "primary" : "cwset" }
{ "_id" : "statistics", "partitioned" : false, "primary" : "cwset" }
{ "_id" : "licenseb", "partitioned" : false, "primary" : "cwset-2" }
=======================================================================
, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 10429 setShardVersion failed host: cw-mongodb1-test:27017 { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), errmsg: "exception: all servers down!", code: 8002, ok: 0.0 }
Tue Oct 9 21:46:00 [conn1879] AssertionException while processing op type : 2004 to : licensedb.licenses :: caused by :: 10429 setShardVersion failed host: cw-mongodb1-test:27017 { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), errmsg: "exception: all servers down!", code: 8002, ok: 0.0 }
Tue Oct 9 21:46:00 [conn1879] Request::process begin ns: admin.$cmd msg id: 59 op: 2004 attempt: 0 ntoreturn: -1 options : 0 |
| Comment by Spencer Brody (Inactive) [ 09/Oct/12 ] |
|
Can you attach the output of running sh.status() in the shell? Can you also attach logs from the mongos from when you try to run a query on a sharded collection and see this error message instead? |