  1. Core Server
  2. SERVER-11697

Mongos crashes during moveChunk

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.2.4
    • Component/s: Sharding
    • Environment:
      Linux 2.6
    • ALL

      Mongos crashes while a moveChunk is in progress. The crash is triggered by the following assertion in the log:
      Tue Nov 12 20:01:12 [conn611979] Assertion: 10429:setShardVersion failed host: 10.38.171.25:7111

      { oldVersion: Timestamp 29317000|0, oldVersionEpoch: ObjectId('522ad499e1814e603d11be30'), ns: "appid250528.meta_infos0", version: Timestamp 29320000|0, versionEpoch: ObjectId('522ad499e1814e603d11be30'), globalVersion: Timestamp 29321000|0, globalVersionEpoch: ObjectId('522ad499e1814e603d11be30'), reloadConfig: true, errmsg: "shard global version for collection is higher than trying to set to 'appid250528.meta_infos0'", ok: 0.0 }

      Debugging the core dump file with gdb shows that something is wrong. The details are:
      1. Chunk X moves from shardA to shardB. Chunk X's version is updated to 29363|0, and shardB's local version (ChunkY) is updated to 29363|1. These updates are sent to the config server when the moveChunk finishes.
      2. In gdb, the ChunkManager object for this ns is displayed as:
      _version = 29363|1
      _shardVersion[shardA] = 29363|1
      _shardVersion[shardB] = 29362|0
      3. So when CheckShardVersion runs for shardB, _version is already the newest, and the retry of 'conf->getChunkManager( ns , true )' skips the reload.

      The root cause is:
      _version has been updated to the newest version, but _shardVersion[shardB] still holds the old version.

      My suspicion:
      When the ChunkManager was updated to version 29363|1, calculateConfigDiff read the chunk info from the config server. It first read the old version of ChunkX (29362|0), then the updates from step 1 happened, and then it read the new version of ChunkY (29363|1).
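
The suspected interleaving can be sketched with a toy Python model. This is not mongos code: the function name, the extra chunk W, and all pre-migration versions are invented for illustration, chosen only so that the end state matches the gdb output above. It assumes chunk X ends up on shardB and chunk Y stays on shardA, as the gdb output suggests.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor): 29363|1 is written (29363, 1)
Doc = Tuple[str, Version]  # (owning shard, chunk version)


def torn_scan() -> Tuple[Version, Dict[str, Version]]:
    """Scan chunk documents one by one while a migration commits mid-scan."""
    # Hypothetical contents of the chunk metadata before the commit.
    config: Dict[str, Doc] = {
        "W": ("shardB", (29362, 0)),
        "X": ("shardA", (29362, 1)),
        "Y": ("shardA", (29360, 3)),
    }
    seen: Dict[str, Doc] = {}
    for name in ("W", "X", "Y"):
        if name == "Y":
            # The migration commits after X was read but before Y is read.
            config["X"] = ("shardB", (29363, 0))
            config["Y"] = ("shardA", (29363, 1))
        seen[name] = config[name]

    # Per-shard version = highest chunk version seen for that shard.
    shard_versions: Dict[str, Version] = {}
    for shard, ver in seen.values():
        shard_versions[shard] = max(shard_versions.get(shard, (0, 0)), ver)
    # Collection version = highest version seen overall.
    version = max(shard_versions.values())
    return version, shard_versions
```

In this model the scan ends with a collection version of 29363|1 and shardA at 29363|1, while shardB stays at 29362|0 because the post-commit document for chunk X was never read.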

      Proposed fix:
      When CheckShardVersion retries 'conf->getChunkManager( ns , true )' (3 times), use forceload for getChunkManager.
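
A minimal Python sketch of why a forced reload would help, under the assumption that the normal reload is skipped whenever the cached collection version already equals the newest one. `ToyChunkCache` and its `reload` method are hypothetical stand-ins, not the real ChunkManager or getChunkManager API.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor): 29363|1 is written (29363, 1)


class ToyChunkCache:
    """Hypothetical routing cache; models only the version bookkeeping."""

    def __init__(self, version: Version, shard_versions: Dict[str, Version]):
        self.version = version
        self.shard_versions = dict(shard_versions)

    def reload(self, config_shard_versions: Dict[str, Version],
               force: bool = False) -> bool:
        """Return True if the cache was rebuilt, False if the reload was skipped."""
        newest = max(config_shard_versions.values())
        if not force and self.version >= newest:
            # The cache looks current, so stale per-shard entries survive.
            return False
        self.shard_versions = dict(config_shard_versions)
        self.version = newest
        return True


# State taken from the gdb output: the collection version is current,
# but the entry for shardB is stale.
cache = ToyChunkCache((29363, 1), {"shardA": (29363, 1), "shardB": (29362, 0)})
# Assumed true state on the config server after the migration.
config = {"shardA": (29363, 1), "shardB": (29363, 0)}

skipped = not cache.reload(config)            # normal retry: nothing changes
stale_after_retry = cache.shard_versions["shardB"]
rebuilt = cache.reload(config, force=True)    # forced reload repairs shardB
```

With the non-forced retry the entry for shardB remains 29362|0 no matter how often it is repeated; only the forced call replaces it with 29363|0.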

      I look forward to your reply. Thank you!

        1. core.log
          18 kB
          Jack Chan

            Assignee:
            Unassigned
            Reporter:
            hustchensi Jack Chan
            Votes:
            3
            Watchers:
            8
