[SERVER-3683] Possible for setShardVersion to never be set on mongod after multiple StaleConfigExceptions due to stale/missing mongod metadata Created: 24/Aug/11 Updated: 11/Jul/16 Resolved: 16/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 1.8.3 |
| Fix Version/s: | 1.8.4, 2.0.0-rc2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
...possibly also affects 1.9/2.0. The core issue is that on a StaleConfigException from a query (handled in the mongos at s/request.cpp), the steps that update the cached shard information from the config server in 1.8.3 no longer always reload the ChunkManager for collections that have not changed. The assumption seems to be that the mongod is at least as up-to-date as the mongos, so setShardVersion does not need to be called on the mongod unless the mongos config information (ChunkManager) changes (in 1.8.2 it always changes on reload). If the mongod sharding metadata somehow ends up less up-to-date than the mongos, the query is retried repeatedly until it fails. A possible fix is to reload the chunk manager after the second retry in order to handle this case. It is not clear at the moment how this state could come about. |
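The retry behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the mongos code: the `Mongos` and `Mongod` classes, `StaleConfigError`, the integer versions, and the `force_reload_after` parameter are all hypothetical stand-ins. It models a mongos that sends setShardVersion only when its cached version changes, and an optional forced reload after the second retry.

```python
class StaleConfigError(Exception):
    """Hypothetical stand-in for StaleConfigException."""


class Mongod:
    """Toy shard: rejects a query unless its version matches the caller's."""

    def __init__(self, version=None):
        self.version = version  # None models missing sharding metadata

    def set_shard_version(self, version):
        self.version = version

    def query(self, expected_version):
        if self.version != expected_version:
            raise StaleConfigError("shard version mismatch")
        return "ok"


class Mongos:
    """Toy router: force_reload_after is an assumed knob, not a real option."""

    def __init__(self, config_version, force_reload_after=None):
        self.config_version = config_version   # what the config server holds
        self.cached_version = config_version   # what this router has cached
        self.force_reload_after = force_reload_after
        self.set_shard_version_calls = 0

    def refresh(self, mongod, force):
        # 1.8.3-like behavior: only push a version when the cache changed,
        # unless the caller forces a full reload.
        changed = self.cached_version != self.config_version
        if changed or force:
            self.cached_version = self.config_version
            mongod.set_shard_version(self.cached_version)
            self.set_shard_version_calls += 1

    def run_query(self, mongod, max_retries=5):
        # attempt 0 is the initial try; attempts 1..max_retries are retries.
        for attempt in range(max_retries + 1):
            try:
                return mongod.query(self.cached_version)
            except StaleConfigError:
                force = (self.force_reload_after is not None
                         and attempt >= self.force_reload_after)
                self.refresh(mongod, force)
        raise StaleConfigError("gave up after %d retries" % max_retries)
```

With `Mongos(config_version=3)` and a `Mongod()` that has no metadata, every attempt fails and setShardVersion is never sent, which mirrors the stuck state. With `force_reload_after=2`, the version is pushed once after the second retry and the next attempt succeeds.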
| Comments |
| Comment by Greg Studer [ 12/Dec/11 ] |
|
see |
| Comment by Kiril Savino [ 23/Nov/11 ] |
|
Also seeing a ton of this in the logs, in 2.0.1, after a relatively recent failover, on the primary node. |
| Comment by Zeph Wang [ 01/Nov/11 ] |
|
I'm seeing a lot of these messages in 2.0.1 mongod/mongos logs. Are they related to this issue? |
| Comment by auto [ 06/Sep/11 ] |
|
Author: Eliot Horowitz (erh, eliot@10gen.com)
Message: when running checkShardVersion, need to make sure we do on actual connection, not replica set connection
Conflicts: s/shard_version.cpp |
| Comment by auto [ 04/Sep/11 ] |
|
Author: Eliot Horowitz (erh, eliot@10gen.com)
Message: test for |
| Comment by auto [ 04/Sep/11 ] |
|
Author: Eliot Horowitz (erh, eliot@10gen.com)
Message: when running checkShardVersion, need to make sure we do on actual connection, not replica set connection |