  Core Server / SERVER-12178

cleanupOrphan can fail if shard is moving chunks

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.5.5
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
    • Operating System: ALL

      Note: the commit message for the fix is wrong; I forgot to change it before pushing.

      The fix for SERVER-11277 (99dff054c8b8) seems to cause moveChunk commands to fail with a transport error under certain circumstances. The attached JS file reproduces the problem.

       m30999| 2013-12-20T14:12:56.315-0500 [WriteBackListener-localhost:30000] DBClientCursor::init call() failed
       m30000| 2013-12-20T14:12:56.315-0500 [conn1] end connection 127.0.0.1:50886 (4 connections now open)
       m30000| 2013-12-20T14:12:56.315-0500 [conn3] end connection 127.0.0.1:50899 (4 connections now open)
       m30000| 2013-12-20T14:12:56.315-0500 [conn5] end connection 127.0.0.1:50910 (3 connections now open)
       m30001| 2013-12-20T14:12:56.315-0500 [conn5] end connection 127.0.0.1:50908 (4 connections now open)
       m30999| 2013-12-20T14:12:56.315-0500 [WriteBackListener-localhost:30000] Detected bad connection created at 1387566773596630 microSec, clearing pool for localhost:30000 of 0 connections
       m30999| 2013-12-20T14:12:56.315-0500 [conn2] DBClientCursor::init call() failed
       m30999| 2013-12-20T14:12:56.315-0500 [WriteBackListener-localhost:30000] WriteBackListener exception : DBClientBase::findN: transport error: localhost:30000 ns: admin.$cmd query: { writebacklisten: ObjectId('52b496b5ebc1242f136c7597') }
       m30999| 2013-12-20T14:12:56.315-0500 [conn2] Detected bad connection created at 1387566773604578 microSec, clearing pool for localhost:30000 of 0 connections
      sh81742| {
      sh81742| 	"code" : 10276,
      sh81742| 	"ok" : 0,
      sh81742| 	"errmsg" : "exception: DBClientBase::findN: transport error: localhost:30000 ns: admin.$cmd query: { moveChunk: \"foo.bar\", from: \"localhost:30000\", to: \"localhost:30001\", fromShard: \"shard0000\", toShard: \"shard0001\", min: { _id: 0.0 }, max: { _id: 20.0 }, maxChunkSizeBytes: 52428800, shardId: \"foo.bar-_id_0.0\", configdb: \"localhost:29000\", secondaryThrottle: false, waitForDelete: true, maxTimeMS: 0 }"
      sh81742| }
      sh81742| assert failed
       m30001| 2013-12-20T14:12:56.317-0500 [migrateThread] DBClientCursor::init call() failed
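
The attached reproduction script is not included on this page. As a rough sketch only, the two commands named in this ticket can be written as plain command documents. The helper names `buildMoveChunkCmd` and `buildCleanupOrphanedCmd` are hypothetical, and the namespace, key and shard name are taken from the error message above. How the attached script actually interleaves the commands is not known, so that part is left out.

```javascript
// Hypothetical helpers: they only build command documents and do not
// contact a server. The field names (moveChunk, find, to, _waitForDelete,
// cleanupOrphaned, startingFromKey) are MongoDB command fields.

// moveChunk as it would be sent to a mongos: move the chunk containing
// `findDoc` to `toShard` and wait for the range delete to finish.
function buildMoveChunkCmd(ns, findDoc, toShard) {
  return { moveChunk: ns, find: findDoc, to: toShard, _waitForDelete: true };
}

// cleanupOrphaned as it would be sent directly to a shard's primary.
// startingFromKey is optional, so it is only set when provided.
function buildCleanupOrphanedCmd(ns, startingFromKey) {
  const cmd = { cleanupOrphaned: ns };
  if (startingFromKey !== undefined) {
    cmd.startingFromKey = startingFromKey;
  }
  return cmd;
}

// Values below mirror the failing moveChunk in the log: foo.bar,
// chunk starting at { _id: 0 }, destination shard0001.
const moveCmd = buildMoveChunkCmd("foo.bar", { _id: 0 }, "shard0001");
const cleanupCmd = buildCleanupOrphanedCmd("foo.bar");

console.log(JSON.stringify(moveCmd));
console.log(JSON.stringify(cleanupCmd));
```

In a mongo shell, each document would typically be passed to `db.adminCommand(...)`: the first on a connection to the mongos, the second on a connection to the shard itself.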
      

      Versions tested (chronological order):

      6902c6b643f64 (not reproducible)
      99dff054c8b8 (when the behavior change was introduced)
      77384d0a36a2 (recent commit from 12-20-2013)

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Kamran K. (kamran.khan)