[SERVER-25602] splitChunk command with out of bound splitKeys fails, but still updates the chunks
Created: 15/Aug/16  Updated: 05/Mar/18  Resolved: 16/Aug/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.12, 3.2.8, 3.3.11
Fix Version/s: 3.2.10, 3.3.12

Type: Bug
Priority: Critical - P2
Reporter: Linda Qin
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: bkp, code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: split_chunk_out_of_bound.js
Issue Links:
Backports
Related
related to SERVER-24569 Maintain the 'rangesToClean' and 'met... Closed
is related to SERVER-25630 Add validation of splitVector output Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Backport Requested: v3.0
Participants:

 Description   

When the splitChunk command is run with out-of-bound splitKeys, the command fails but still updates the chunks. The result is a corrupted config database containing overlapping chunks and a chunk whose bounds are in reverse order (min greater than max).

We've tested 3.0.12, 3.2.8, 3.2.9-rc1, 3.3.10 and 3.3.11. All have the same issue.

The chunks before running the splitChunk command:

BEFORE: [
	{
		"_id" : "test.user-x_MinKey",
		"lastmod" : Timestamp(1, 1),
		"lastmodEpoch" : ObjectId("57b1229178e732ea6489534f"),
		"ns" : "test.user",
		"min" : {
			"x" : { "$minKey" : 1 }
		},
		"max" : {
			"x" : 0
		},
		"shard" : "shard0000"
	},
	{
		"_id" : "test.user-x_0.0",
		"lastmod" : Timestamp(1, 2),
		"lastmodEpoch" : ObjectId("57b1229178e732ea6489534f"),
		"ns" : "test.user",
		"min" : {
			"x" : 0
		},
		"max" : {
			"x" : { "$maxKey" : 1 }
		},
		"shard" : "shard0000"
	}
]

The chunks after running the splitChunk command:

AFTER: [
	{
		"_id" : "test.user-x_MinKey",
		"lastmod" : Timestamp(1, 3),
		"lastmodEpoch" : ObjectId("57b11ede5ed71a434707b87e"),
		"ns" : "test.user",
		"min" : {
			"x" : { "$minKey" : 1 }
		},
		"max" : {
			"x" : 2
		},
		"shard" : "shard0000"
	},
	{
		"_id" : "test.user-x_0.0",
		"lastmod" : Timestamp(1, 2),
		"lastmodEpoch" : ObjectId("57b11ede5ed71a434707b87e"),
		"ns" : "test.user",
		"min" : {
			"x" : 0
		},
		"max" : {
			"x" : { "$maxKey" : 1 }
		},
		"shard" : "shard0000"
	},
	{
		"_id" : "test.user-x_2.0",
		"lastmod" : Timestamp(1, 4),
		"lastmodEpoch" : ObjectId("57b11ede5ed71a434707b87e"),
		"ns" : "test.user",
		"min" : {
			"x" : 2
		},
		"max" : {
			"x" : 0
		},
		"shard" : "shard0000"
	}
]
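
The corruption in the AFTER listing can be detected mechanically. The following is a minimal sketch, not part of the attached jstest: the function name findChunkProblems and the simplified chunk shape (id, min, max) are hypothetical, and it assumes a single numeric shard key with MinKey/MaxKey modelled as -Infinity/Infinity.

```javascript
// Hypothetical helper for illustration only; not server or jstest code.
// Assumption: one numeric shard key, MinKey/MaxKey modelled as -Infinity/Infinity.
function findChunkProblems(chunks) {
  const problems = [];

  // A valid chunk covers [min, max) with min strictly below max.
  for (const c of chunks) {
    if (!(c.min < c.max)) {
      problems.push({ type: "reversed", ids: [c.id] });
    }
  }

  // Sort a copy by min, then flag neighbours whose ranges intersect.
  const sorted = [...chunks].sort((a, b) =>
    a.min < b.min ? -1 : a.min > b.min ? 1 : 0
  );
  for (let i = 0; i + 1 < sorted.length; i++) {
    if (sorted[i + 1].min < sorted[i].max) {
      problems.push({ type: "overlap", ids: [sorted[i].id, sorted[i + 1].id] });
    }
  }
  return problems;
}

// Bounds copied from the BEFORE and AFTER listings in this ticket.
const beforeChunks = [
  { id: "test.user-x_MinKey", min: -Infinity, max: 0 },
  { id: "test.user-x_0.0", min: 0, max: Infinity },
];
const afterChunks = [
  { id: "test.user-x_MinKey", min: -Infinity, max: 2 },
  { id: "test.user-x_0.0", min: 0, max: Infinity },
  { id: "test.user-x_2.0", min: 2, max: 0 },
];

console.log(findChunkProblems(beforeChunks));
console.log(findChunkProblems(afterChunks));
```

The sketch only compares neighbouring chunks after sorting, so it is a quick sanity check rather than an exhaustive overlap search.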

jstest attached. It fails with:

assert: [2] != [3] are not equal : Split chunks failed, but the chunks were updated in the config database
doassert@src/mongo/shell/assert.js:15:14
assert.eq@src/mongo/shell/assert.js:51:5
@split_chunk_out_of_bound.js:36:1
2016-08-15T02:11:18.422+0000 E QUERY    [thread1] Error: [2] != [3] are not equal : Split chunks failed, but the chunks were updated in the config database :
doassert@src/mongo/shell/assert.js:15:14
assert.eq@src/mongo/shell/assert.js:51:5
@split_chunk_out_of_bound.js:36:1
failed to load: split_chunk_out_of_bound.js

Also, for 3.3.11 the primary shard fasserts (SERVER-24569) after config.chunks has been modified:

d20000| 2016-08-15T12:01:53.514+1000 I -        [conn1] Fatal assertion 40221 IllegalOperation: cannot split chunk [{ x: MinKey }, { x: 0.0 }) at key { x: 2.0 } at src/mongo/db/s/split_chunk_command.cpp 436
d20000| 2016-08-15T12:01:53.514+1000 I -        [conn1] 
d20000| 
d20000| ***aborting after fassert() failure
d20000| 
d20000| 
d20000| 2016-08-15T12:01:53.523+1000 F -        [conn1] Got signal: 6 (Abort trap: 6).

An fassert is not an appropriate response to the server receiving a command with bad parameters. The command should just fail and return an error (with no side effects).



 Comments   
Comment by Githook User [ 17/Aug/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-25602 Make split/mergeChunks commands check the validity of input
Branch: v3.2
https://github.com/mongodb/mongo/commit/08fbb333bc11fba2b9df71a4262da5e18ed00d47

Comment by Githook User [ 16/Aug/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-25602 Make split/mergeChunks commands check the validity of input
Branch: master
https://github.com/mongodb/mongo/commit/51fe71a4169d8ac01dca915b482c94e228d8a746

Comment by Kaloian Manassiev [ 15/Aug/16 ]

Presently, the shards (mongod) do not validate the input of the splitChunk command, and in particular do not check that the split points fall within the range of the chunk. This is because splitChunk, an internal administrative command, assumes that its only caller is mongos.

The issue described in this ticket can only happen in two cases:

  • The output of the splitVector command, which is the input to the splitChunk, is somehow incorrect. After code inspection we haven't been able to find a bug in splitVector, but we'll keep looking.
  • The splitChunk command was called manually with incorrect input.

We are going to use this ticket to tighten the checks which the shards perform in order to ensure that incorrect metadata is never written in a case like this, and produce an error message instead.
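
As an illustration of the kind of check described above, here is a minimal sketch. It is not the code from the linked commits: the function name validateSplitKeys, the return shape, and the error strings are hypothetical, and it assumes a single numeric shard key with MinKey/MaxKey modelled as -Infinity/Infinity. The point is that validation runs before any metadata is touched, so a bad request returns an error with no side effects.

```javascript
// Hypothetical sketch; NOT the server's actual validation code.
// Assumption: a single numeric shard key, with MinKey/MaxKey modelled
// as -Infinity/Infinity so plain number comparison works.
function validateSplitKeys(chunk, splitKeys) {
  if (splitKeys.length === 0) {
    return { ok: false, errmsg: "no split keys were given" };
  }
  let lowerBound = chunk.min; // every key must be strictly above this
  for (const key of splitKeys) {
    if (!(key > lowerBound)) {
      return { ok: false, errmsg: `split key ${key} is not above ${lowerBound}` };
    }
    if (!(key < chunk.max)) {
      return { ok: false, errmsg: `split key ${key} is not below chunk max ${chunk.max}` };
    }
    lowerBound = key; // enforces strictly ascending split keys
  }
  return { ok: true };
}

// The request from this ticket: split [MinKey, 0) at x = 2.
const result = validateSplitKeys({ min: -Infinity, max: 0 }, [2]);
console.log(result.ok, result.errmsg);
```

With the inputs from this ticket the check rejects the request because 2 is not below the chunk's max of 0.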

Please continue to watch the ticket for updates.

Best regards,
-Kal.

Comment by Adam Flynn [ 15/Aug/16 ]

Thanks for the quick turnaround on this. It caused serious production issues for us (essentially lost read availability on a very large collection for 2 days).

I know timing's tight, but we'd really appreciate it if this fix shipped with 3.2.9.
