[SERVER-4568] One shard down at run time then insertion on both shard stop by mongos Created: 28/Dec/11 Updated: 11/Jul/16 Resolved: 10/Feb/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | jitendra | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | sharding | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux debian |
||
| Operating System: | Linux |
| Participants: |
| Description |
|
The MongoDB setup is as follows. Shard servers: there are two shard servers, and we generate shard keys 1 to 10. Shard keys 1 to 5 go to shard1 and shard keys 6 to 10 go to shard2.
shard1: ./mongod --bind_ip 192.168.50.75 --port 20000 --dbpath /sata1/master --shardsvr --quiet --logpath /usr/local/ct/depend/mongo/logs/mongod_20000.log --logappend --journalCommitInterval 2
mongos 1
Now connect to mongos:
mongos> db.Database.insert({_sk:2})
mongos> db.Database.insert({_sk:7})
I got an exception for _sk:7, which is OK, but I also got an exception for _sk:2, which is not OK because shard key 2 goes to shard1, which is running. Regards |
| Comments |
| Comment by Scott Hernandez (Inactive) [ 10/Feb/12 ] |
|
Please reopen if this happens again with the new builds. |
| Comment by Scott Hernandez (Inactive) [ 30/Dec/11 ] |
|
The unit tests run against all versions of mongodb built; I'd have to look at when the test was created, but it was probably before 1.8.0. It is possible the getLastError message is something that is fixed in 2.1.X, so you may want to test a nightly dev build to check. |
| Comment by jitendra [ 30/Dec/11 ] |
|
Can you tell me which version you tested on? |
| Comment by jitendra [ 30/Dec/11 ] |
|
MongoDB with 2 shards. When both the shards are up and running, our mongo driver using MongoS |
| Comment by jitendra [ 30/Dec/11 ] |
|
Can you reply to my previous comment? |
| Comment by Scott Hernandez (Inactive) [ 30/Dec/11 ] |
|
So far everything you have posted seems to be consistent with being able to write data to the active shards but getting errors on the ones which are down. This is the expected behavior. We have tests which verify this and we cannot reproduce any problems. Please provide more information if you feel something is wrong... |
| Comment by jitendra [ 30/Dec/11 ] |
|
I want to ask one thing: if one shard is down and I insert into the other shard and then call getLastError(), it gives the error "socket exception". |
| Comment by jitendra [ 29/Dec/11 ] |
|
hi, pls reply. |
| Comment by jitendra [ 29/Dec/11 ] |
|
Here are the mongos logs when inserting with _sk:2. I hope this makes it clearer.
Thu Dec 29 14:37:42 [conn2] Request::process ns: 00291211.Database msg id:131 attempt: 0
Thu Dec 29 14:37:42 [conn2] Request::process ns: 00291211.$cmd msg id:132 attempt: 0
ntoreturn: -1 options : 0 ntoreturn: 1 options : 0 |
| Comment by jitendra [ 29/Dec/11 ] |
|
One shard is down and the other shard is up. An insert on the other shard gives this error: DBClientBase::findN: transport error: 192.168.50.171:25000 query: { setShardVersion: "00291211.Database", configdb: "192.168.50.171:30000", version: Timestamp 6000|0, serverID: ObjectId('4ef24915416fb0a9b89716f8'), shard: "shard0001", shardHost: "192.168.50.171:25000" } |
| Comment by Eliot Horowitz (Inactive) [ 29/Dec/11 ] |
|
I don't understand what you mean. All shards are healthy? |
| Comment by jitendra [ 29/Dec/11 ] |
|
The shard is up. Can you verify this on your side? |
| Comment by Eliot Horowitz (Inactive) [ 29/Dec/11 ] |
|
It means it tried to do a read or a write but failed because of a socket error. |
| Comment by jitendra [ 29/Dec/11 ] |
|
"uncaught exception: error { "$err" : "socket exception", "code" : 11002 }" What does this mean? |
| Comment by Eliot Horowitz (Inactive) [ 29/Dec/11 ] |
|
If one shard is down - the writes to that shard will fail. |
| Comment by jitendra [ 29/Dec/11 ] |
|
hi |
| Comment by Eliot Horowitz (Inactive) [ 28/Dec/11 ] |
|
If one shard is down - then any query that needs that shard will fail unless you have the partial flag set. |
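As a rough illustration of this rule, here is a toy model in plain JavaScript. It is not mongos code: the `shards` object, the `scatterQuery` helper and its `partial` option are hypothetical names made up for this sketch, and the two documents are the ones inserted in the description. A query that only targets a live shard succeeds, a query that needs the down shard fails, and with the partial option the down shard is skipped instead.

```javascript
// Toy model of a two-shard cluster; shard0001 is down, as in this report.
// Everything here is illustrative and hypothetical, not the mongos implementation.
const shards = {
  shard0000: { up: true, docs: [{ _sk: 2 }] },
  shard0001: { up: false, docs: [{ _sk: 7 }] },
};

// Hypothetical helper: collect documents from every shard the query needs.
function scatterQuery(targets, { partial = false } = {}) {
  const results = [];
  for (const name of targets) {
    const shard = shards[name];
    if (!shard.up) {
      if (partial) continue; // partial: skip unreachable shards
      throw new Error("socket exception"); // otherwise the whole query fails
    }
    results.push(...shard.docs);
  }
  return results;
}

// A query that only needs the live shard works.
console.log(scatterQuery(["shard0000"]));

// A query that needs both shards fails unless partial is set.
try {
  scatterQuery(["shard0000", "shard0001"]);
} catch (e) {
  console.log("failed:", e.message);
}
console.log(scatterQuery(["shard0000", "shard0001"], { partial: true }));
```

In the real server the partial behavior is requested through a query flag; the model only shows the difference in outcome.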
| Comment by jitendra [ 28/Dec/11 ] |
|
Can you tell me, if at run time one shard goes down and the other is running |
| Comment by jitendra [ 28/Dec/11 ] |
|
One more thing I want to add: if shard0001 is already down when I connect with the mongos client, then insertion of _sk:2 works. |
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
Can you increase the logging level to see if you get more information about the issue on the client connection? |
| Comment by jitendra [ 28/Dec/11 ] |
|
The log looks like this: [WriteBackListener-192.168.50.75:25000] WriteBackListener exception : socket exception |
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
Can you please post the mongos logs from when you try to insert with _sk:2? That explain doesn't look like it is being run against a sharded collection; there is no shards node and it is using a BasicCursor, which does not exist on mongos. |
| Comment by jitendra [ 28/Dec/11 ] |
|
Hi, please reply. I am waiting for your reply. |
| Comment by jitendra [ 28/Dec/11 ] |
|
hi , |
| Comment by jitendra [ 28/Dec/11 ] |
|
shard0000 is running, but when I try to insert _sk:2 it gives the error: uncaught exception: error { "$err" : "socket exception", "code" : 11002 }. Is this correct? db.Database.find({_sk:2}).explain() } |
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
Okay, maybe I'm misunderstanding but the explain and chunks shows that shard0001 (port 25000) is being used for _sk:7; the error also indicates this when you try to insert that is the shard used, and you state that you are taking down shard2 (which is on port 25000) so it is completely correct to get that error. Am I missing something here? Maybe we should take a look at _sk:2 to see if it is different. |
| Comment by jitendra [ 28/Dec/11 ] |
|
db.Database.stats() , , ,
db.shards.find()
{ "_id" : "shard0000", "host" : "192.168.50.75:20000" }
{ "_id" : "shard0001", "host" : "192.168.50.75:25000" } |
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
Which is shard2? Is that shard0000? Can you also post the stats for that collection? db.Database.stats() |
| Comment by jitendra [ 28/Dec/11 ] |
|
After taking down shard2: db.Database.find({_sk:7}).explain() |
| Comment by jitendra [ 28/Dec/11 ] |
|
db.Database.find({_sk:7}).explain() } |
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
Please attach the explain output. |
| Comment by jitendra [ 28/Dec/11 ] |
|
Dec 28 2011 03:45:58 PM is the local system time when the error occurred. The database was correct. |
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
It looks like you may be testing on the wrong database. Please return the explain I mentioned in the previous comment. |
| Comment by jitendra [ 28/Dec/11 ] |
|
Here is the sharding status, where _sk is the shard key. The chunks containing _sk 1 to 5 were moved to shard0000 and those containing _sk 6 to 10 to shard0001; automatic balancing of chunks is off.
{ "_id" : "00281211", "partitioned" : true, "primary" : "shard0000" }
00281211.Database chunks:
} -->> { "_sk" : 1 } on : shard0000 { "t" : 6000, "i" : 1 }
{ "_sk" : 1 } -->> { "_sk" : 2 } on : shard0000 { "t" : 1000, "i" : 3 }
{ "_sk" : 2 } -->> { "_sk" : 3 } on : shard0000 { "t" : 1000, "i" : 5 }
{ "_sk" : 3 } -->> { "_sk" : 4 } on : shard0000 { "t" : 1000, "i" : 7 }
{ "_sk" : 4 } -->> { "_sk" : 5 } on : shard0000 { "t" : 1000, "i" : 9 }
{ "_sk" : 5 } -->> { "_sk" : 6 } on : shard0000 { "t" : 1000, "i" : 11 }
{ "_sk" : 6 } -->> { "_sk" : 7 } on : shard0001 { "t" : 2000, "i" : 0 }
{ "_sk" : 7 } -->> { "_sk" : 8 } on : shard0001 { "t" : 3000, "i" : 0 }
{ "_sk" : 8 } -->> { "_sk" : 9 } on : shard0001 { "t" : 4000, "i" : 0 }
{ "_sk" : 9 } -->> { "_sk" : 10 } on : shard0001 { "t" : 5000, "i" : 0 }
{ "_sk" : 10 } -->> { "_sk" : { $maxKey : 1 }} on : shard0001 { "t" : 6000, "i" : 0 }
{ "_id" : "0027711211", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "00301211", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" } |
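To make the chunk table easier to read, the following plain JavaScript sketch looks up which shard owns a given _sk value. It assumes the usual chunk range semantics (lower bound inclusive, upper bound exclusive). The `chunks` array and the `shardFor` helper are hypothetical names for this illustration, `Infinity` stands in for `{ $maxKey : 1 }`, and the first chunk is left out because its lower bound is cut off in the output above.

```javascript
// Chunk ranges copied from the sharding status above.
// The first chunk is omitted because its lower bound is truncated in the post.
// Infinity is used here as a stand-in for { $maxKey : 1 }.
const chunks = [
  { min: 1, max: 2, shard: "shard0000" },
  { min: 2, max: 3, shard: "shard0000" },
  { min: 3, max: 4, shard: "shard0000" },
  { min: 4, max: 5, shard: "shard0000" },
  { min: 5, max: 6, shard: "shard0000" },
  { min: 6, max: 7, shard: "shard0001" },
  { min: 7, max: 8, shard: "shard0001" },
  { min: 8, max: 9, shard: "shard0001" },
  { min: 9, max: 10, shard: "shard0001" },
  { min: 10, max: Infinity, shard: "shard0001" },
];

// Hypothetical helper: find the chunk whose range [min, max) contains sk.
function shardFor(sk) {
  const chunk = chunks.find((c) => sk >= c.min && sk < c.max);
  return chunk ? chunk.shard : null; // null: not covered by the listed chunks
}

console.log(shardFor(2)); // shard0000
console.log(shardFor(7)); // shard0001
```

With this table, _sk:2 belongs to shard0000 and _sk:7 to shard0001, which matches the layout described in the comment.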
| Comment by Scott Hernandez (Inactive) [ 28/Dec/11 ] |
|
Please attach the chunk information (best to take a dump of the config db). You could also provide an explain [find({_sk:7}).explain()] for those _sk values. |