SERVER-12178 (Core Server)

cleanupOrphan can fail if shard is moving chunks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.5
    • Component/s: Sharding
    • Labels:
    • Operating System: ALL

      Description

      Note: the commit message is wrong; I forgot to change it before pushing.

      The fix for SERVER-11277 (99dff054c8b8) seems to cause moveChunk commands to fail with a transport error under certain circumstances. The attached JS file reproduces the problem.

       m30999| 2013-12-20T14:12:56.315-0500 [WriteBackListener-localhost:30000] DBClientCursor::init call() failed
       m30000| 2013-12-20T14:12:56.315-0500 [conn1] end connection 127.0.0.1:50886 (4 connections now open)
       m30000| 2013-12-20T14:12:56.315-0500 [conn3] end connection 127.0.0.1:50899 (4 connections now open)
       m30000| 2013-12-20T14:12:56.315-0500 [conn5] end connection 127.0.0.1:50910 (3 connections now open)
       m30001| 2013-12-20T14:12:56.315-0500 [conn5] end connection 127.0.0.1:50908 (4 connections now open)
       m30999| 2013-12-20T14:12:56.315-0500 [WriteBackListener-localhost:30000] Detected bad connection created at 1387566773596630 microSec, clearing pool for localhost:30000 of 0 connections
       m30999| 2013-12-20T14:12:56.315-0500 [conn2] DBClientCursor::init call() failed
       m30999| 2013-12-20T14:12:56.315-0500 [WriteBackListener-localhost:30000] WriteBackListener exception : DBClientBase::findN: transport error: localhost:30000 ns: admin.$cmd query: { writebacklisten: ObjectId('52b496b5ebc1242f136c7597') }
       m30999| 2013-12-20T14:12:56.315-0500 [conn2] Detected bad connection created at 1387566773604578 microSec, clearing pool for localhost:30000 of 0 connections
      sh81742| {
      sh81742| 	"code" : 10276,
      sh81742| 	"ok" : 0,
      sh81742| 	"errmsg" : "exception: DBClientBase::findN: transport error: localhost:30000 ns: admin.$cmd query: { moveChunk: \"foo.bar\", from: \"localhost:30000\", to: \"localhost:30001\", fromShard: \"shard0000\", toShard: \"shard0001\", min: { _id: 0.0 }, max: { _id: 20.0 }, maxChunkSizeBytes: 52428800, shardId: \"foo.bar-_id_0.0\", configdb: \"localhost:29000\", secondaryThrottle: false, waitForDelete: true, maxTimeMS: 0 }"
      sh81742| }
      sh81742| assert failed
       m30001| 2013-12-20T14:12:56.317-0500 [migrateThread] DBClientCursor::init call() failed
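
The output above interleaves lines from several processes, each tagged with a prefix such as `m30999|` or `sh81742|`. Judging from the thread names, `m30999` is probably the mongos, `m30000` and `m30001` the two shards, and `sh81742` the shell that issued the moveChunk; that mapping is an inference from this excerpt, not something stated in the report. A minimal sketch for splitting such output by prefix when triaging is shown here. The helper name `groupByProcess` and the prefix pattern are hypothetical, chosen to match the lines in this excerpt only.

```javascript
// Group interleaved test output by its process prefix (e.g. "m30999|").
// The prefix pattern (letters, then digits, then "|") is inferred from the
// excerpt in this report; it is an assumption, not a documented log format.
function groupByProcess(logText) {
  const groups = new Map();
  for (const rawLine of logText.split("\n")) {
    const match = /^\s*([A-Za-z]+\d+)\|\s?(.*)$/.exec(rawLine);
    if (!match) continue; // skip blank or unprefixed lines
    const [, prefix, message] = match;
    if (!groups.has(prefix)) groups.set(prefix, []);
    groups.get(prefix).push(message);
  }
  return groups;
}

// A few lines copied from the excerpt above.
const sample = [
  "       m30999| 2013-12-20T14:12:56.315-0500 [conn2] DBClientCursor::init call() failed",
  "       m30000| 2013-12-20T14:12:56.315-0500 [conn1] end connection 127.0.0.1:50886 (4 connections now open)",
  "      sh81742| assert failed",
  "       m30001| 2013-12-20T14:12:56.317-0500 [migrateThread] DBClientCursor::init call() failed",
].join("\n");

const grouped = groupByProcess(sample);
for (const [prefix, lines] of grouped) {
  console.log(prefix, lines.length);
}
```

Reading each process's lines separately makes it easier to see that the connections on `m30000` close at the same timestamp at which the mongos and the migrate thread on `m30001` report failed calls.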


      Versions tested (chronological order):

      6902c6b643f64 (not reproducible)
      99dff054c8b8 (when the behavior change was introduced)
      77384d0a36a2 (recent commit from 12-20-2013)
