Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 2.2.4
Component/s: Sharding
Environment: Linux 2.6
ALL
Mongos crashed while a moveChunk was in progress. The crash was caused by the assertion in the following log line:
Tue Nov 12 20:01:12 [conn611979] Assertion: 10429:setShardVersion failed host: 10.38.171.25:7111
When I debugged the core dump file with gdb, the state looked inconsistent. The details are:
1. Chunk X moves from shardA to shardB. Chunk X's version is updated to 29363|0, and shardB's local version of ChunkY is updated to 29363|1. These updates are sent to the config server when the moveChunk finishes.
2. In gdb, the ChunkManager object for this ns is displayed as:
_version = 29363|1
_shardVersion[shardA] = 29363|1
_shardVersion[shardB] = 29362|0
3. So, when CheckShardVersion runs for shardB, _version is already the newest, and the retry of 'conf->getChunkManager( ns , true )' skips the reload process.
The key reason is:
_version was updated to the newest version, but _shardVersion[shardB] still holds the old version.
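The state above can be reproduced in a small Python model. This is only a sketch under the assumption that the reload is gated on the top-level version alone; `CachedChunkManager`, `reload_if_newer`, and the post-reload value for shardB (29363|0, taken from step 1) are illustrative names and assumptions, not mongos code.

```python
from dataclasses import dataclass


@dataclass
class CachedChunkManager:
    version: tuple        # collection-wide version as (major, minor)
    shard_versions: dict  # shard name -> (major, minor)


def reload_if_newer(cached, config_max_version, load_from_config):
    """Hypothetical version-gated reload: only reload when the config
    server reports a version newer than the cached top-level version."""
    if config_max_version <= cached.version:
        return cached, False          # reload skipped
    return load_from_config(), True   # reload performed


# State seen in gdb: _version is current, _shardVersion[shardB] is stale.
cached = CachedChunkManager(
    version=(29363, 1),
    shard_versions={"shardA": (29363, 1), "shardB": (29362, 0)},
)


def load_from_config():
    # What a full reload is assumed to produce after step 1 is committed.
    return CachedChunkManager(
        version=(29363, 1),
        shard_versions={"shardA": (29363, 1), "shardB": (29363, 0)},
    )


manager, reloaded = reload_if_newer(cached, (29363, 1), load_from_config)
print(reloaded, manager.shard_versions["shardB"])  # prints: False (29362, 0)
```

In this model, no number of retries repairs shardB's entry, because the only value the gate compares is already up to date.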
My suspicion:
When the ChunkManager was updated to version 29363|1, calculateConfigDiff read the chunk info from the config server. It first read the old version of ChunkX (29362|0), then the updates from step 1 happened, and then it read the new version of ChunkY (29363|1).
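The suspected interleaving can be illustrated with a toy scan in Python. This is a sketch only: `scan_chunks` and `commit_move` are hypothetical helpers, the chunks are read one at a time with no snapshot isolation, and ChunkY's pre-move version (29361|0) is a placeholder that does not appear in the report.

```python
def scan_chunks(store, order, after_first_read=None):
    """Read chunk versions one by one; the optional callback runs after
    the first read to mimic a concurrent update landing mid-scan."""
    snapshot = {}
    for i, name in enumerate(order):
        snapshot[name] = store[name]
        if i == 0 and after_first_read is not None:
            after_first_read(store)
    return snapshot


def commit_move(store):
    # The updates from step 1, applied while the scan is in progress.
    store["ChunkX"] = (29363, 0)
    store["ChunkY"] = (29363, 1)


# ChunkX's old version is from the report; ChunkY's old version is a placeholder.
store = {"ChunkX": (29362, 0), "ChunkY": (29361, 0)}

snapshot = scan_chunks(store, ["ChunkX", "ChunkY"], after_first_read=commit_move)
newest = max(snapshot.values())

print(snapshot)  # prints: {'ChunkX': (29362, 0), 'ChunkY': (29363, 1)}
print(newest)    # prints: (29363, 1)
```

The resulting snapshot mixes a pre-move ChunkX with a post-move ChunkY, so the highest version seen is current while ChunkX's entry is not.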
Proposed fix:
When 'conf->getChunkManager( ns , true )' is retried (3 times) in CheckShardVersion, use a forced reload for getChunkManager.
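A rough Python sketch of this proposal follows. It is one possible reading of the suggestion (force the reload on every retry after the first attempt). `ToyConfig`, the `force` parameter, and `force_on_retry` are hypothetical names for illustration and are not taken from the mongos source.

```python
class ToyConfig:
    """Toy stand-in for the config object; versions are (major, minor)."""

    def __init__(self):
        # Cached state as seen in gdb, and the state assumed to be on the config server.
        self.cached = {"version": (29363, 1), "shardB": (29362, 0)}
        self.authoritative = {"version": (29363, 1), "shardB": (29363, 0)}

    def get_chunk_manager(self, reload=False, force=False):
        newer = self.authoritative["version"] > self.cached["version"]
        if force or (reload and newer):
            self.cached = dict(self.authoritative)
        return self.cached


def check_shard_version(conf, expected_shard_b, max_tries=3, force_on_retry=True):
    """Retry loop: the first attempt uses a normal reload, later attempts
    bypass the version gate when force_on_retry is set."""
    for attempt in range(1, max_tries + 1):
        force = force_on_retry and attempt > 1
        manager = conf.get_chunk_manager(reload=True, force=force)
        if manager["shardB"] == expected_shard_b:
            return attempt
    raise RuntimeError("setShardVersion failed")


print(check_shard_version(ToyConfig(), (29363, 0)))  # prints: 2

try:
    check_shard_version(ToyConfig(), (29363, 0), force_on_retry=False)
    failed_without_force = False
except RuntimeError:
    failed_without_force = True
print(failed_without_force)  # prints: True
```

With the forced reload, the second attempt picks up shardB's current version; without it, all three attempts see the same stale entry and the call fails.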
I look forward to your replies. Thank you!
is duplicated by: SERVER-13089 setShardVersion failed host (Closed)
related to: SERVER-13089 setShardVersion failed host (Closed)