[SERVER-11122] moveChunk fails in sharding/hash_basic.js on slow hosts
Created: 10/Oct/13  Updated: 11/Jul/16  Resolved: 10/Oct/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 2.5.3

Type: Bug
Priority: Major - P3
Reporter: Benety Goh
Assignee: Benety Goh
Resolution: Done
Votes: 0
Labels: Windows, sharding
Environment: Windows 32-bit

Issue Links:
Related
related to SERVER-10232 better migration stats to detect TO-s... Closed
related to SERVER-11301 moveChunk command failing in some sha... Closed
Operating System: Windows
Steps To Reproduce:

Recurring failure in Windows 32-bit build since 10/9

 Description   

http://buildlogs.mongodb.org/mci_0.9_windows_32/builds/426/test/sharding_0/hash_basic.js

m30000| 2013-10-10T12:10:45.720+0000 [conn7] moveChunk request accepted at version 4|1||52569943902bd79d28635866
m30000| 2013-10-10T12:10:45.721+0000 [conn7] moveChunk number of documents: 250
m30000| 2013-10-10T12:10:45.724+0000 [conn7] warning: moveChunk failed to engage TO-shard in the data transfer: migrate already in progress
m30000| 2013-10-10T12:10:45.724+0000 [conn7] MigrateFromStatus::done About to acquire global write lock to exit critical section
m30000| 2013-10-10T12:10:45.724+0000 [conn7] MigrateFromStatus::done Global lock acquired
m30001| 2013-10-10T12:10:45.722+0000 [conn5] run command admin.$cmd { _recvChunkStart: "test.user", from: "localhost:30000", min: { x: -4611686018427387902 }, max: { x: 0 }, shardKeyPattern: { x: "hashed" }, configServer: "localhost:30000", secondaryThrottle: false }
m30001| 2013-10-10T12:10:45.722+0000 [conn5] command admin.$cmd command: { _recvChunkStart: "test.user", from: "localhost:30000", min: { x: -4611686018427387902 }, max: { x: 0 }, shardKeyPattern: { x: "hashed" }, configServer: "localhost:30000", secondaryThrottle: false } ntoreturn:1 keyUpdates:0 reslen:77 0ms
m30000| 2013-10-10T12:10:45.724+0000 [conn13] running multiple plans
m30000| 2013-10-10T12:10:45.725+0000 [conn13] update config.locks query: { _id: "test.user", ts: ObjectId('52569945cd80328228514786') } update: { $set: { state: 0 } } nscanned:1 nupdated:1 fastmod:1 keyUpdates:0 locks(micros) w:238 0ms
m30000| 2013-10-10T12:10:45.725+0000 [conn13] run command admin.$cmd { getlasterror: 1 }
m30000| 2013-10-10T12:10:45.725+0000 [conn13] command admin.$cmd command: { getlasterror: 1 } ntoreturn:1 keyUpdates:0 reslen:85 0ms
m30000| 2013-10-10T12:10:45.725+0000 [conn7] distributed lock 'test.user/EC2AMAZ-ZVRDQB5:30000:1381407044:41' unlocked.
m30000| 2013-10-10T12:10:45.725+0000 [conn7] about to log metadata event: { _id: "EC2AMAZ-ZVRDQB5-2013-10-10T12:10:45-52569945cd80328228514788", server: "EC2AMAZ-ZVRDQB5", clientAddr: "127.0.0.1:63707", time: new Date(1381407045725), what: "moveChunk.from", ns: "test.user", details: { min: { x: -4611686018427387902 }, max: { x: 0 }, step 1 of 6: 0, step 2 of 6: 7, note: "aborted" } }
m30000| 2013-10-10T12:10:45.725+0000 [conn7] command admin.$cmd command: { moveChunk: "test.user", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { x: -4611686018427387902 }, max: { x: 0 }, maxChunkSizeBytes: 52428800, shardId: "test.user-x_-4611686018427387902", configdb: "localhost:30000", secondaryThrottle: false, waitForDelete: true } ntoreturn:1 keyUpdates:0 locks(micros) W:39 r:853 w:6 reslen:199 12ms
m30000| 2013-10-10T12:10:45.726+0000 [conn10] insert config.changelog ninserted:1 keyUpdates:0 locks(micros) w:85 0ms
m30999| 2013-10-10T12:10:45.726+0000 [conn1] moveChunk result: { cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" }
m30000| 2013-10-10T12:10:45.726+0000 [conn3] query config.chunks query: { query: { ns: "test.user" }, orderby: { lastmod: -1 } } ntoreturn:1 ntoskip:0 nscanned:1 keyUpdates:0 locks(micros) r:145 nreturned:1 reslen:191 0ms
assert failed : Cmd failed: {
	"cause" : {
		"cause" : { "ok" : 0, "errmsg" : "migrate already in progress" },
		"ok" : 0,
		"errmsg" : "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress"
	},
	"ok" : 0,
	"errmsg" : "move failed"
}
Error: Printing Stack Trace
at printStackTrace (src/mongo/shell/utils.js:38:15)
at doassert (src/mongo/shell/assert.js:6:5)
at assert (src/mongo/shell/assert.js:14:5)
at D:\data\mci\git@github.commongodb\mongo.git\master\jstests\sharding\hash_basic.js:80:5
at Array.forEach (native)
at D:\data\mci\git@github.commongodb\mongo.git\master\jstests\sharding\hash_basic.js:74:11



 Comments   
Comment by Githook User [ 02/Dec/13 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-10232 better migration stats to detect TO-side readiness

Undo temporary band-aids added by SERVER-11122 and SERVER-11301
Branch: master
https://github.com/mongodb/mongo/commit/ec176dd577988fc1aa2366de8152c5e97bb95466

Comment by auto [ 10/Oct/13 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-11122 wrap move chunk admin command in a retry loop to handle case when destination shard is not ready
Branch: master
https://github.com/mongodb/mongo/commit/5bdf45f7e8d5c8b5645d8a20b4a94a331801d8df
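The commit above wraps the moveChunk admin command in a retry loop so the test tolerates a destination shard that is still finishing a previous migration. The following is a minimal sketch of that idea, not the code from the actual commit. It is written in plain JavaScript so it runs outside the mongo shell: `moveChunkWithRetry` and `isMigrateInProgress` are hypothetical helper names, and the injected `runCommand` and `sleep` functions are assumptions standing in for something like `db.adminCommand` and the shell's `sleep`. It retries only while the nested `cause` chain reports "migrate already in progress", the error seen in the log above.

```javascript
// Hypothetical helper (not from the SERVER-11122 patch): returns true if any
// errmsg in the possibly nested `cause` chain of a failed command result
// reports that the destination shard is still busy with another migration.
function isMigrateInProgress(res) {
  for (let r = res; r; r = r.cause) {
    if (typeof r.errmsg === "string" &&
        r.errmsg.includes("migrate already in progress")) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: run `cmd` through the injected `runCommand`, retrying
// up to `maxAttempts` times, but only for the "migrate already in progress"
// failure. `sleep(delayMs)` is injected so the sketch runs outside the shell.
function moveChunkWithRetry(runCommand, cmd, maxAttempts, sleep, delayMs) {
  let res;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    res = runCommand(cmd);
    if (res.ok) {
      return res; // migration succeeded
    }
    if (!isMigrateInProgress(res)) {
      break; // a different failure: surface it instead of retrying
    }
    if (attempt < maxAttempts) {
      sleep(delayMs); // give the TO-shard time to finish the previous migration
    }
  }
  throw new Error("moveChunk failed: " + JSON.stringify(res));
}

// Usage with a fake runCommand that fails twice with the nested error shape
// from the log, then succeeds. The command document is a placeholder; a real
// call would also identify the chunk to move.
let calls = 0;
function fakeRunCommand(cmd) {
  calls++;
  if (calls < 3) {
    return {
      cause: {
        cause: { ok: 0, errmsg: "migrate already in progress" },
        ok: 0,
        errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress"
      },
      ok: 0,
      errmsg: "move failed"
    };
  }
  return { ok: 1 };
}

const result = moveChunkWithRetry(
  fakeRunCommand, { moveChunk: "test.user", to: "shard0001" }, 5, () => {}, 100);
console.log(result.ok, calls); // prints "1 3"
```

Retrying only on this specific error keeps unrelated moveChunk failures visible to the test. As the later SERVER-10232 comment notes, such band-aids were eventually undone in favor of better migration stats for detecting TO-side readiness.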

Generated at Thu Feb 08 03:24:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.