[SERVER-6385] chunk too big to move in authCommands2.js Created: 10/Jul/12  Updated: 11/Jul/16  Resolved: 30/Jul/12

Status: Closed
Project: Core Server
Component/s: Security
Affects Version/s: None
Fix Version/s: 2.2.0-rc1

Type: Bug Priority: Major - P3
Reporter: Ian Whalen (Inactive) Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

http://buildlogs.mongodb.org/build/4ffc79ced2a60f2137000453/test/4ffc9c85d2a60f667a000784/



 Comments   
Comment by auto [ 30/Jul/12 ]

Author: Greg Studer <greg@10gen.com> (2012-07-30T08:28:50-07:00)

Message: SERVER-6385 make sure balancing doesn't interfere with splitting being tested
Branch: master
https://github.com/mongodb/mongo/commit/e2172bee7f83d73243f715d6250ebcdd151983d7
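
The commit message suggests the fix keeps the balancer from racing the test's explicit splits. A minimal sketch of that pattern in a jstest (assuming a ShardingTest `st`; this is not the actual diff):

	// Sketch: keep the balancer from competing for the collection's
	// distributed lock while the test exercises splits directly.
	var st = new ShardingTest({ shards: 2, mongos: 1 });
	st.stopBalancer();   // balancer off for the duration of the split checks

	// ... run split / moveChunk assertions here ...

	st.startBalancer();  // restore balancing before teardown
	st.stop();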

Comment by Ian Whalen (Inactive) [ 19/Jul/12 ]

And it's back:

http://buildlogs.mongodb.org/build/5007a40cd2a60f43ab000a40/test/5007ee5bd2a60f42640007e5/

	
 m31100| Thu Jul 19 07:25:03 [conn24] command admin.$cmd command: { moveChunk: "test.foo", from: "test-rs0/localhost:31100,localhost:31101,localhost:31102", to: "test-rs1/localhost:31200,localhost:31201,localhost:31202", fromShard: "test-rs0", toShard: "test-rs1", min: { i: 0.0, j: 0.0 }, max: { i: 99.0, j: 9.0 }, maxChunkSizeBytes: 1048576, shardId: "test.foo-i_0.0j_0.0", configdb: "localhost:29000,localhost:29001,localhost:29002", secondaryThrottle: false, $auth: { local: { __system: 2 } } } ntoreturn:1 keyUpdates:0 locks(micros) r:959 reslen:109 385ms
	"cause" : {
		"chunkTooBig" : true,
		"estimatedChunkSize" : 8239752,
		"errmsg" : "chunk too big to move",
		"ok" : 0
	},
{
	"errmsg" : "move failed"
	"ok" : 0,
assert failed
Error("Printing Stack Trace")@:0
()@src/mongo/shell/utils.js:37
("assert failed")@src/mongo/shell/utils.js:58
(0)@src/mongo/shell/utils.js:66
([object DB],[object Object])@/mnt/data/slaves/Linux_64bit_Legacy_Nightly/mongo/jstests/sharding/authCommands2.js:78
(true)@/mnt/data/slaves/Linux_64bit_Legacy_Nightly/mongo/jstests/sharding/authCommands2.js:224
}
@/mnt/data/slaves/Linux_64bit_Legacy_Nightly/mongo/jstests/sharding/authCommands2.js:260
Thu Jul 19 07:25:03 uncaught exception: assert failed
 m29000| Thu Jul 19 07:25:03 got signal 15 (Terminated), will terminate after current cmd ends
failed to load: /mnt/data/slaves/Linux_64bit_Legacy_Nightly/mongo/jstests/sharding/authCommands2.js
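
For reference, the assertion tripping at authCommands2.js:78 is presumably a pattern like the following (a sketch; variable names are assumptions, not the test's actual code):

	// Sketch: issue moveChunk through mongos and assert success. An oversized
	// chunk makes the command return
	// { ok: 0, errmsg: "move failed", cause: { chunkTooBig: true, ... } }.
	var res = adminDB.runCommand({
	    moveChunk: "test.foo",
	    find: { i: 0, j: 0 },   // any key inside the chunk to move
	    to: "test-rs1"
	});
	assert(res.ok, "move failed: " + tojson(res));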

Comment by Greg Studer [ 11/Jul/12 ]

Strange; I think this may have been a thread starvation issue:

 m31100| Tue Jul 10 17:20:49 [conn24] created new distributed lock for test.foo on localhost:29000,localhost:29001,localhost:29002 ( lock timeout : 900000, ping interval : 30000, process : 0 )
 m31100| Tue Jul 10 17:20:49 [conn24] about to acquire distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049:
 m31100| Tue Jul 10 17:20:49 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' acquired, ts : 4ffc9cb156258dc1f2a783bc
 m31100| Tue Jul 10 17:20:49 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' unlocked. 
 m31100| Tue Jul 10 17:20:49 [conn24] created new distributed lock for test.foo on localhost:29000,localhost:29001,localhost:29002 ( lock timeout : 900000, ping interval : 30000, process : 0 )
 m31100| Tue Jul 10 17:20:49 [conn24] about to acquire distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049:
 m31100| Tue Jul 10 17:20:50 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' acquired, ts : 4ffc9cb156258dc1f2a783bd
 m31100| Tue Jul 10 17:20:54 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' unlocked. 
 m31100| Tue Jul 10 17:20:55 [conn24] runQuery called admin.$cmd { splitVector: "test.foo", keyPattern: { i: 1.0, j: 1.0 }, min: { i: 0.0, j: 0.0 }, max: { i: MaxKey, j: MaxKey }, force: true, $auth: { local: { __system: 2 } } }
 m31100| Tue Jul 10 17:20:55 [conn24] run command admin.$cmd { splitVector: "test.foo", keyPattern: { i: 1.0, j: 1.0 }, min: { i: 0.0, j: 0.0 }, max: { i: MaxKey, j: MaxKey }, force: true, $auth: { local: { __system: 2 } } }
 m31100| Tue Jul 10 17:20:55 [conn24] command admin.$cmd command: { splitVector: "test.foo", keyPattern: { i: 1.0, j: 1.0 }, min: { i: 0.0, j: 0.0 }, max: { i: MaxKey, j: MaxKey }, force: true, $auth: { local: { __system: 2 } } } ntoreturn:1 keyUpdates:0 locks(micros) r:1154 reslen:83 1ms
 m31100| Tue Jul 10 17:20:55 [conn24] about to acquire distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049:
 m31100| Tue Jul 10 17:20:55 [conn24] created new distributed lock for test.foo on localhost:29000,localhost:29001,localhost:29002 ( lock timeout : 900000, ping interval : 30000, process : 0 )
 m31100| Tue Jul 10 17:20:55 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' acquired, ts : 4ffc9cb756258dc1f2a783be
 m31100| Tue Jul 10 17:20:55 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' unlocked. 
 m31100| Tue Jul 10 17:20:55 [conn24] created new distributed lock for test.foo on localhost:29000,localhost:29001,localhost:29002 ( lock timeout : 900000, ping interval : 30000, process : 0 )
 m31100| Tue Jul 10 17:20:55 [conn24] about to acquire distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049:
 m31100| Tue Jul 10 17:20:55 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' acquired, ts : 4ffc9cb756258dc1f2a783bf
 m31100| Tue Jul 10 17:20:55 [conn24] distributed lock 'test.foo/domU-12-31-39-16-30-A2:31100:1341955249:638795049' unlocked. 

The locking seems to be working as designed here, though one thread holds the lock for a comparatively long time. Will look more closely at the test.
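
For context, the forced splits in the log above come from splitVector against the shard primary; a sketch of the same call, with field values copied from the log (`shardPrimary` is an assumed connection handle):

	// Sketch: splitVector as seen in the log. With force: true the command
	// proposes a split point near the middle of the range rather than
	// computing size-based split keys.
	var res = shardPrimary.getDB("admin").runCommand({
	    splitVector: "test.foo",
	    keyPattern: { i: 1, j: 1 },
	    min: { i: 0, j: 0 },
	    max: { i: MaxKey, j: MaxKey },
	    force: true
	});
	printjson(res); // { splitKeys: [ ... ], ok: 1 } on success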

Comment by Spencer Brody (Inactive) [ 11/Jul/12 ]

I can't reproduce this and it seems like the build has already passed this test in its next run...
Not sure exactly what happened here: somehow the collection's metadata lock on the config servers never got released after a previous split or migration terminated, which stopped all subsequent splits, but I have no idea why. Greg, any ideas?
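
One way to check the stuck-lock theory is to inspect the lock document on the config servers directly (a sketch against a mongos connection; state semantics per the 2.x distributed lock format: 0 = unlocked, 2 = locked):

	// Sketch: look for a lingering distributed lock on the collection. A
	// document stuck in state 2 with a stale ts/when would block subsequent
	// splits and migrations.
	var locks = db.getSiblingDB("config").locks;
	locks.find({ _id: "test.foo" }).forEach(printjson);
	locks.find({ _id: "balancer" }).forEach(printjson); // balancer's own lock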
