errorMsg: moveChunk cannot enter critical section before all data is cloned


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.8.0-rc3, 2.8.0-rc4
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      related to SERVER-16763

      Found the following entry in the server log during a longevity test; it eventually led to a server crash:

      2015-01-05T17:47:22.602+0000 E SHARDING [conn60] moveChunk cannot enter critical section before all data is cloned, 81584 locs were not transferred but to-shard reported { active: true, ns: "sbtest.sbtest1", from: "rs2/172.31.32.214:27017,ip-172-31-35-229:27017", min: { _id: -7816322693657637576 }, max: { _id: -7672769179660119751 }, shardKeyPattern: { _id: "hashed" }, state: "clone", counts: { cloned: 1480, clonedBytes: 321160, catchup: 0, steady: 0 }, ok: 1.0 }
      

      SERVER-16763 addressed an issue where system clock drift could cause a lock timeout.

      The moveChunk message above may be a separate issue that needs to be fixed.

      I looked up the message "moveChunk cannot enter critical section before all data is cloned, 81584 locs were not transferred but to-shard reported", which is the last message before the thread's long wait and the eventual crash. It points to https://github.com/mongodb/mongo/blob/master/src/mongo/s/d_migrate.cpp#L1372-L1380, where the comment says:

      // Should never happen, but safe to abort before critical section
      

      mongod then crashes a while later while waiting at https://github.com/mongodb/mongo/blob/master/src/mongo/s/d_migrate.cpp#L307 (which should be fixed by SERVER-16763).

      It is unclear what condition could cause migrateFromStatus.cloneLocsRemaining() to be non-zero here, since this condition should never happen.
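
      To make the failing check concrete, here is a minimal, hypothetical C++ sketch of the donor-side guard the log message and the linked comment describe: before entering the critical section, the donor verifies that no cloned record locations remain untransferred, and aborts the migration otherwise. The names `MigrateFromStatus`, `cloneLocsRemaining`, and `canEnterCriticalSection` mirror the report's wording but this is an illustration of the check, not MongoDB's actual implementation in d_migrate.cpp.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // Simplified stand-in for the donor's migration state. In the real
      // server this tracks which document locations have been cloned to the
      // to-shard; here we only keep the count still pending transfer.
      struct MigrateFromStatus {
          uint64_t locsRemaining = 0;
          uint64_t cloneLocsRemaining() const { return locsRemaining; }
      };

      // Returns true when it is safe to enter the critical section. If any
      // locs remain, fills errMsg with a message like the one in the log and
      // returns false -- the "should never happen, but safe to abort before
      // critical section" path.
      bool canEnterCriticalSection(const MigrateFromStatus& status,
                                   std::string* errMsg) {
          const uint64_t remaining = status.cloneLocsRemaining();
          if (remaining != 0) {
              *errMsg = "moveChunk cannot enter critical section before all "
                        "data is cloned, " + std::to_string(remaining) +
                        " locs were not transferred";
              return false;  // abort the migration instead of blocking
          }
          return true;
      }
      ```

      Under this sketch, the crash scenario in this report corresponds to `cloneLocsRemaining()` returning 81584 even though the to-shard still reported `state: "clone"` with only 1480 documents cloned, so the guard fired and the migration was aborted before the critical section.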

            Assignee:
            Randolph Tan
            Reporter:
            Rui Zhang (Inactive)
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: