Core Server / SERVER-22571

splitVector should return a cursor so that it can return more than 16MB worth of split points

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.0.9
    • Component/s: Sharding
    • Labels: None
    • Team: Sharding NYC
    • Sprint: Sharding 17 (07/15/16), Sharding 18 (08/05/16)

      Currently the 16MB response size limit prevents sharding an existing large collection if doing so would generate too many initial split points. Changing splitVector to return a cursor should allow us to shard arbitrarily large existing collections without needing to increase the chunk size.
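
      For illustration, a cursor-returning response could look roughly like the sketch below, following the cursor envelope that find and aggregate already use. This is a hypothetical sketch only; the change was never implemented (resolution: Won't Do), so the field contents are assumptions.

        {
            cursor: {
                id: NumberLong("1234567890"),  // nonzero id: more batches available via getMore
                ns: "test.mgendata",
                firstBatch: [
                    { _id: ObjectId("56a088e086c70418eb3e75fa") },
                    // ... further split points, each batch kept well under the 16MB limit
                ]
            },
            ok: 1
        }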

      Original Summary: splitVector fails if the document containing the split points exceeds the 16MB limit

      Original Description: Generally this happens when the initial split is performed on a large collection and/or the chunk size is configured to be small.

      The symptoms when it happens during autosplit are:

      2016-01-25T19:40:37.720-0500 W SHARDING [conn64] could not autosplit collection database1.col2 :: caused by :: 13345 splitVector command failed: { timeMillis: 2314, errmsg: "exception: BSONObj size: 20242689 (0x134E101) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { region: "Europe", randomP...", code: 10334, ok: 0.0, $gleStats: { lastOpTime: Timestamp 0|0, electionId: ObjectId('569684c288621bdbc786b963') } }

      Can be reproduced manually as follows:

      1. Insert a million documents into a collection with default (implicitly generated) _id values (see the bulk-insert sketch after the log output below).
      2. Run splitVector:
        db.runCommand({splitVector: "test.mgendata", keyPattern: {_id:1}, maxChunkSizeBytes: 1*1024})
        {
        	"timeMillis" : 1127,
        	"errmsg" : "exception: BSONObj size: 19888875 (0x12F7AEB) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { _id: ObjectId('56a088e086c70418eb3e75fa') }",
        	"code" : 10334,
        	"ok" : 0
        }
        > show log
        <...>
        2016-01-21T19:17:28.232+1100 I SHARDING [conn3] request split points lookup for chunk test.mgendata { : MinKey } -->> { : MaxKey }
        2016-01-21T19:17:29.360+1100 W SHARDING [conn3] Finding the split vector for test.mgendata over { _id: 1.0 } keyCount: 2 numSplits: 666666 lookedAt: 2 took 1127ms
        2016-01-21T19:17:29.470+1100 I -        [conn3] Assertion: 10334:BSONObj size: 19888875 (0x12F7AEB) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { _id: ObjectId('56a088e086c70418eb3e75fa') }
        2016-01-21T19:17:29.477+1100 I CONTROL  [conn3]
         0xf840d2 0xf2b679 0xf0ef7e 0xf0f02c 0x8122eb 0x946664 0xe21f79 0x9c6e61 0x9c7e6c 0x9c8b3b 0xba5b9b 0xab2a1a 0x7ea855 0xf3ffa9 0x7f14e578e182 0x7f14e488f47d
        ----- BEGIN BACKTRACE -----
        {"backtrace":[{"b":"400000","o":"B840D2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"B2B679","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"B0EF7E","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"B0F02C"},{"b":"400000","o":"4122EB","s":"_ZNK5mongo7BSONO
        2016-01-21T19:17:29.491+1100 I COMMAND  [conn3] command test.$cmd command: splitVector { splitVector: "test.mgendata", keyPattern: { _id: 1.0 }, maxChunkSizeBytes: 1024.0 } keyUpdates:0 writeConflicts:0 numYields:15625 reslen:239 locks:{ Global: { acquireCount: { r: 31252 } }, Database: { acquireCount: { r: 15626 } }, Collection: { acquireCount: { r: 15626 } } } 1258ms
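
      A minimal sketch for step (1), assuming the same test.mgendata collection used above; the batch size and document shape are arbitrary:

        // Bulk-insert one million documents without an explicit _id,
        // so each document gets an implicitly generated ObjectId.
        var coll = db.getSiblingDB("test").mgendata;
        for (var i = 0; i < 1000; i++) {
            var batch = [];
            for (var j = 0; j < 1000; j++) {
                batch.push({ x: i * 1000 + j });
            }
            coll.insert(batch);  // insert accepts an array of documents
        }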

      There are two problems with the existing behaviour:
      1) The error message is misleading, as it may be interpreted as corruption-related
      2) splitVector fails outright, although this should not be necessary: sharding should be able to carry on even if the returned list of split points is incomplete (truncated)

      If we only address the first problem, we could make the error message clearer:
      “Could not split this chunk because it results in too many small chunks. Increase the max chunk size. For more info see …."

      For problem (2), we could make splitVector truncate the list of split points so that the response fits within the 16MB limit.
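
      For illustration, the truncation could work roughly as below. This is shell-style pseudocode only (the real change would live in the server's C++ splitVector command), and splitKeys and the headroom value are assumptions; Object.bsonsize() is the shell helper for measuring a document's BSON size.

        // Keep appending split points until the next one would push the
        // reply past the BSON size limit, then drop the remainder.
        var maxResponseBytes = 16 * 1024 * 1024 - 4096;  // headroom for the rest of the reply
        var kept = [], size = 0;
        for (var k = 0; k < splitKeys.length; k++) {
            var keySize = Object.bsonsize(splitKeys[k]);
            if (size + keySize > maxResponseBytes) {
                break;  // remaining points are dropped; later autosplits can re-split
            }
            kept.push(splitKeys[k]);
            size += keySize;
        }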

            Assignee:
            backlog-server-sharding-nyc [DO NOT USE] Backlog - Sharding NYC
            Reporter:
            dmitry.ryabtsev@mongodb.com Dmitry Ryabtsev
            Votes:
            1
            Watchers:
            21
