[SERVER-20805] Sharding tests that move the same chunk multiple times should use '_waitForDelete: true' Created: 05/Oct/15  Updated: 15/Oct/15  Resolved: 07/Oct/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding A (10/09/15)
Participants:

 Description   

A shard cannot act as the donor or recipient of a chunk while range deletes from a previous migration are still outstanding. As a result, tests that move chunks back and forth multiple times may occasionally fail. Such tests should pass _waitForDelete: true, so that the moveChunk operation waits for the range to be deleted before it returns.
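As a rough illustration of the change being asked for, the sketch below builds moveChunk command documents with _waitForDelete: true set. The helper name, the "test.user" namespace, the {x: 0} find document and the shard names are assumptions made for the example and are not taken from the failing test.

```javascript
// Hypothetical helper: returns a moveChunk command document that asks the
// operation to wait for the migrated range to be deleted before returning.
function moveChunkAndWaitCmd(ns, findDoc, toShard) {
    return {
        moveChunk: ns,        // full namespace (illustrative: "test.user")
        find: findDoc,        // any document that falls inside the chunk to move
        to: toShard,          // name of the recipient shard
        _waitForDelete: true  // wait for the range delete to finish
    };
}

// A test that moves the same chunk there and back would issue two such commands.
var there = moveChunkAndWaitCmd("test.user", {x: 0}, "shard0001");
var back = moveChunkAndWaitCmd("test.user", {x: 0}, "shard0000");

// In the mongo shell, given a mongos connection `mongos` (assumed), each one
// would be run and checked roughly like this:
//   assert.commandWorked(mongos.adminCommand(there));
//   assert.commandWorked(mongos.adminCommand(back));
```

Because the first move only returns once its range delete has finished, the second move should not run into the "still 1 deletes from previous migration" error.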

Task Failure
Test Log

Only seen this once so far.

[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.088+0000 s20022| 2015-10-03T15:06:07.088+0000 I SHARDING [conn1] moveChunk result: { cause: { ok: 0.0, errmsg: "can't accept new chunks because  there are still 1 deletes from previous migration", code: 125 }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer:  :: caused by :: can't accept new chunks because  there are still 1 deletes from previous migration", code: 125 }
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 assert: [null] != [{ "_id" : ObjectId("560feedf3a395ff60c181637"), "x" : 200, "y" : 1 }] are not equal : undefined
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 doassert@src/mongo/shell/assert.js:15:14
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 assert.eq@src/mongo/shell/assert.js:43:5
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 runTest@jstests\sharding\find_and_modify_after_multi_write.js:48:1
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 @jstests\sharding\find_and_modify_after_multi_write.js:76:1
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 @jstests\sharding\find_and_modify_after_multi_write.js:1:2
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.151+0000 2015-10-03T15:06:07.152+0000 E QUERY    [thread1] Error: [null] != [{ "_id" : ObjectId("560feedf3a395ff60c181637"), "x" : 200, "y" : 1 }] are not equal : undefined :
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 doassert@src/mongo/shell/assert.js:15:14
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 assert.eq@src/mongo/shell/assert.js:43:5
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 runTest@jstests\sharding\find_and_modify_after_multi_write.js:48:1
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 @jstests\sharding\find_and_modify_after_multi_write.js:76:1
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 @jstests\sharding\find_and_modify_after_multi_write.js:1:2
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 
[js_test:find_and_modify_after_multi_write] 2015-10-03T15:06:07.153+0000 failed to load: jstests\sharding\find_and_modify_after_multi_write.js



 Comments   
Comment by Githook User [ 07/Oct/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-20805 Add '_waitForDelete: true' to some sharding tests

These are tests, which make multiple moveChunk calls. Adding
_waitForDelete:true ensures that the previous moveChunk would have
completed.
Branch: master
https://github.com/mongodb/mongo/commit/3b2bd2cb97a39d02e241d88842ad0fe37e5a8661

Comment by Andy Schwerin [ 05/Oct/15 ]

The proximate cause of this failure is that a moveChunk command failed because a prior one hadn't finished cleaning up.

In particular, this moveChunk failed, but we didn't assert.commandWorked its result, so we end up seeing the problem when we confirm that the document has changed/moved appropriately, later on. All of the commands in tests should really have their results checked.

That said, this would only have exposed the error more clearly, not fixed it. We need to wait for cleanup to complete on the prior moveChunk before starting the subsequent one, presumably by setting _waitForDelete: 1 on all the moveChunk operations in this test.
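To make the point about checking command results concrete, here is a minimal sketch. checkCommandWorked is a hypothetical stand-in for the shell's assert.commandWorked, written so the example runs outside the mongo shell. The failedMove object copies only the cause portion of the moveChunk result shown in the test log.

```javascript
// Hypothetical stand-in for assert.commandWorked: throws if the command
// result does not report ok, otherwise returns the result unchanged.
function checkCommandWorked(res) {
    if (!res || !res.ok) {
        throw new Error("command failed: " + JSON.stringify(res));
    }
    return res;
}

// The "cause" portion of the failed moveChunk result from the test log.
var failedMove = {
    ok: 0,
    errmsg: "can't accept new chunks because  there are still 1 deletes from previous migration",
    code: 125
};

// Checking the result surfaces the failure at the moveChunk call itself,
// instead of at a later assert.eq on the document.
var caught = null;
try {
    checkCommandWorked(failedMove);
} catch (e) {
    caught = e;
}
```

With a check like this the test would have failed on the moveChunk itself and reported code 125, rather than on the unrelated-looking [null] != [...] comparison.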

Comment by Daniel Pasette (Inactive) [ 05/Oct/15 ]

Assigning to Andy for triage and distribution.
