[SERVER-3296] mongos still attempting to setShardVersion on slave MongoDB Created: 20/Jun/11 Updated: 12/Jul/16 Resolved: 06/Jul/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.8.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Joachim | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu |
||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Despite upgrading to MongoDB 1.8.2, we're still seeing mongos attempt setShardVersion on slave MongoDB instances (like those described in
Here's what we see in the mongos logs:
Sun Jun 19 17:38:33 [conn627] Assertion: 10429:setShardVersion failed host[mongo-c01r03s03:27018] { errmsg: "not master", ok: 0.0 }0x5204fa 0x6a15ed 0x6a1152
Here's the output of --version for that mongos instance:
All mongod instances in the cluster (including config server instances) are running 1.8.2. |
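For anyone triaging similar logs, here is a minimal sketch of pulling the failing host and error message out of an assertion line like the one quoted in the description. The regular expression and the function name `parse_set_shard_version_failure` are assumptions inferred from that single sample line, not a documented mongos log format.

```python
import re

# Pattern inferred from the one sample line in this ticket (an assumption,
# not an official mongos log grammar).
ASSERTION_RE = re.compile(
    r'Assertion: (?P<code>\d+):setShardVersion failed '
    r'host\[(?P<host>[^\]]+)\] '
    r'\{ errmsg: "(?P<errmsg>[^"]*)", ok: [\d.]+ \}'
)


def parse_set_shard_version_failure(line):
    """Return the assertion code, host and errmsg, or None if the line does not match."""
    match = ASSERTION_RE.search(line)
    if match is None:
        return None
    return {
        "code": int(match.group("code")),
        "host": match.group("host"),
        "errmsg": match.group("errmsg"),
    }


# The log line reported in this ticket.
sample = (
    'Sun Jun 19 17:38:33 [conn627] Assertion: 10429:setShardVersion failed '
    'host[mongo-c01r03s03:27018] { errmsg: "not master", ok: 0.0 }'
    '0x5204fa 0x6a15ed 0x6a1152'
)
print(parse_set_shard_version_failure(sample))
```

Grouping the parsed results by host would show whether the failures all point at the same non-master member.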
| Comments |
| Comment by Greg Studer [ 06/Jul/11 ] |
|
Thanks for the update. The warning is a double-check added to the newer version since this was a backport, and it will only trigger once in non-verbose mode. As it says, it's safe, but we want to know about it. It basically means that you're performing a sharded operation on a non-sharded connection, which is done for getLastError(). In newer versions we'll want to migrate the operations that we can away from this, but the underlying issue you were having with checking the version of these non-shard connections should now be fixed. |
| Comment by Eliot Horowitz (Inactive) [ 04/Jul/11 ] |
|
You only need to do mongos. |
| Comment by Eliot Horowitz (Inactive) [ 04/Jul/11 ] |
|
The patch is now in the 1.8 nightly. |
| Comment by Luc Suryo [ 04/Jul/11 ] |
|
Any update or any patch? The issue is affecting our site pretty badly... |
| Comment by Greg Studer [ 23/Jun/11 ] |
|
I don't think there's a manual workaround aside from stepping down again to the original host or bouncing mongos - each mongos has a collection of hosts which sticks around for the life of the instance. Reconfiguring your shard to remove the host from the RSet URL may work temporarily, but on a second failover the same could happen to the remaining hosts. |
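If you do step down back to the original host, it helps to confirm first which member currently reports itself as primary. A rough sketch, assuming you already have the output of replSetGetStatus (what rs.status() prints in the shell) as a Python dict; the function name `split_members_by_role` is hypothetical, and only the `members`, `name` and `stateStr` fields of that output are used.

```python
def split_members_by_role(status):
    """Split a replSetGetStatus-style document into (primary, other_members).

    `status` is assumed to be a dict with a "members" list whose entries
    carry "name" (host:port) and "stateStr" (e.g. "PRIMARY", "SECONDARY").
    """
    primary = None
    others = []
    for member in status.get("members", []):
        if member.get("stateStr") == "PRIMARY":
            primary = member.get("name")
        else:
            others.append(member.get("name"))
    return primary, others
```

Any host in the second list that also shows up in the "not master" assertions is one mongos is still treating as the primary.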
| Comment by Joachim [ 23/Jun/11 ] |
|
Are there any steps we can take to fix this manually? |
| Comment by Greg Studer [ 22/Jun/11 ] |
|
patch is in 1.9 now, reviewing for potential backport |
| Comment by Greg Studer [ 21/Jun/11 ] |
|
Thanks for the verbose logs, we see what we believe to be the problem, and are working on a patch. |
| Comment by Greg Studer [ 20/Jun/11 ] |
|
*verbose = true (not sure if verbose = yes works) |
| Comment by Greg Studer [ 20/Jun/11 ] |
|
Yes, it will be fine to make the ticket private. That configuration is good, it will show any ReplicaSetMonitor messages. |
| Comment by Greg Studer [ 20/Jun/11 ] |
|
Something strange seems to occur with replica set monitoring... nothing ever gets updated. Do you have gdb installed on any of these machines? If so, is it possible to get a stack trace of the running threads a few minutes after failure starts happening? If not, can you up the log verbosity for a mongos run, and wait again for the errors? |
| Comment by Luc Suryo [ 20/Jun/11 ] |
|
10gen team, I will take over from Joachim, so please ask me for anything you need. |
| Comment by Joachim [ 20/Jun/11 ] |
|
I've attached a gzipped copy of the mongos log. |
| Comment by Joachim [ 20/Jun/11 ] |
|
No, I don't think this was right after a primary/secondary switch. This has been happening continuously, with errors every few seconds on each server running mongos, both previously with 1.8.1 and currently with 1.8.2. Example of error count samples taken every 2 seconds: Output of connPoolStats: http://pastebin.com/Q0mGFgRw |
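One way to produce error counts per 2-second window from a mongos log is sketched below. It is not the script used for the samples above; the function name `count_failures_per_window` is hypothetical, and it assumes each log line carries an HH:MM:SS timestamp as in the line quoted in the description. It does not handle logs that cross midnight.

```python
import re
from collections import Counter

# First HH:MM:SS occurrence on the line is taken as the log timestamp.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def count_failures_per_window(lines, window_seconds=2):
    """Count 'setShardVersion failed' lines per fixed-size time window.

    Returns a dict mapping the window start time ("HH:MM:SS") to the number
    of matching lines in that window, ordered by time.
    """
    counts = Counter()
    for line in lines:
        if "setShardVersion failed" not in line:
            continue
        match = TIME_RE.search(line)
        if match is None:
            continue
        hours, minutes, seconds = (int(part) for part in match.groups())
        second_of_day = hours * 3600 + minutes * 60 + seconds
        window_start = second_of_day // window_seconds * window_seconds
        counts[window_start] += 1
    return {
        f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}": count
        for start, count in sorted(counts.items())
    }
```

It can be fed an open log file directly, for example `count_failures_per_window(open(path))`, where `path` is wherever your mongos log lives.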
| Comment by Eliot Horowitz (Inactive) [ 20/Jun/11 ] |
|
Also, was this right after a primary/secondary switch? |
| Comment by Eliot Horowitz (Inactive) [ 20/Jun/11 ] |
|
can you:
|