Core Server / SERVER-19919

Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.6.11, 3.0.5, 3.2.10, 3.5.13
    • Fix Version/s: 3.4.11, 3.6.0-rc1
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Steps To Reproduce:
      1. Create a sharded cluster with at least several shards and more than one MongoS. I used 3 shards and 2 MongoS. The shard configuration does not matter, and whether there are 1 or 3 config servers does not matter.
      2. Setup a collection:

        use test
        db.dropDatabase();
        sh.enableSharding("test");
        db.s.ensureIndex({_id:"hashed"});
        db.adminCommand({shardCollection: "test.s", key: {_id: "hashed"}, numInitialChunks: 1});
        

        (numInitialChunks does not need to be 1 but it does make the issue occur much sooner)

      3. Insert small documents into the shard collection concurrently via at least two MongoS (the more MongoS involved the faster the issue occurs). I used this:

        function insert_until(count_target_num, id_start) {
            assert(count_target_num > 0);
            if (typeof id_start === 'undefined') {
                // Derive a per-mongos starting _id from the port number so that
                // concurrent shells insert disjoint _id ranges.
                id_start = (db.serverCmdLineOpts().parsed.net.port % 100 + 100) * 1000000;
            }
            assert(id_start > 0);
            var targ_col = db.s;
            var target = count_target_num - targ_col.count();
            var new_doc = function() { return { _id: id_start++ }; };
            while (target > 100) {
                // go fast...
                print(targ_col.count());
                target = target / 10;
                if (target > 1000)
                    target = 1000;
                for (var j = 0; j < target; j++) {
                    targ_col.insert(new_doc());
                }
                target = count_target_num - targ_col.count();
            }
            while (targ_col.count() < count_target_num) {
                targ_col.insert(new_doc());
            }
            print(targ_col.count());
        }
        

        Then on each of two shell sessions connected to different MongoS:

        insert_until(10000000)
        

      4. Watch for a jumbo chunk to turn up. With this setup it usually happened for me around the 3 million document mark, though it is sensitive to the timing of how the inserts through each MongoS are kicked off. If left to run, more and more chunks are marked as jumbo.
    • Sprint:
      Sharding 2017-10-23

      Description

      If a chunk containing more than 250000 documents, but totalling less than half the maximum chunk size, is selected for a moveChunk operation, the balancer marks the chunk as jumbo regardless of the shard key range available or the chunk's actual total data size (so long as that size is less than half the chunk size).

      The issue can occur repeatedly in the same collection and could end up marking all chunks as jumbo. The resulting chunk distribution is highly uneven.
      EDIT: I don't think it can mark every chunk as jumbo.

      The issue is most likely to occur wherever large numbers of small documents are in use. The requirements are:

      • Documents that average less than 268 bytes in size (explained below), or less than 134 for certain earlier versions.
      • A shard key with a lot of range.
      • More than one MongoS, more is better (or perhaps worse?)
      • Inserts occurring across multiple MongoS (other ops don't matter) to the same sharded collection.
      • Balancer enabled.

      The average document size threshold can be calculated for 3.2 or later as:

      chunkSizeMB * 1024 * 1024 / 250000
      

      For 3.0 and earlier the average document size threshold formula is:

      chunkSizeMB * 1024 * 1024 / 2 / 250000
      

      Note that raising chunkSizeMB also raises the average document size threshold under which the issue can manifest.
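      The thresholds above can be sanity-checked with plain JavaScript (runnable outside the mongo shell). The function below is illustrative, not part of the server code; the 268/134 byte figures in the requirements list follow from the default 64 MB chunk size:

```javascript
// Average document size below which a 250000-document chunk can still fall
// under the relevant size limit. version32OrLater selects the formula:
// 3.2+ compares against the full chunk size, 3.0 and earlier against half of it.
function jumboDocSizeThreshold(chunkSizeMB, version32OrLater) {
    var limitBytes = chunkSizeMB * 1024 * 1024;
    if (!version32OrLater) {
        limitBytes = limitBytes / 2;
    }
    return limitBytes / 250000;
}

var threshold32 = jumboDocSizeThreshold(64, true);   // 268.435456 bytes
var threshold30 = jumboDocSizeThreshold(64, false);  // 134.217728 bytes
```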

      In these conditions, whenever the balancer issues a moveChunk to bring the cluster into balance (for example, after some other split, or after a new shard is added for capacity), there is a chance of selecting a chunk whose document count already exceeds 250000 (because each MongoS tracks chunk growth independently, none of them triggered an auto-split before the limit was reached). 250000 is the hard-coded document count limit above which a move is rejected as "chunkTooBig". The balancer responds to such a failure by issuing a split request. The split returns a single split point (because the chunk's data size is still under half the chunk size), and then SERVER-14052 kicks in and declares the split failed too. The chunk is marked as jumbo despite being tiny in data size and having a potentially huge shard key range.
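      While the repro runs, the chunks being flagged can be watched from a mongos. This is a sketch against the standard config database, using the repro's test.s namespace; run it from a mongos shell connected to the cluster:

```javascript
// List chunks flagged jumbo for the repro collection, with their
// shard key ranges and owning shards.
var configDB = db.getSiblingDB("config");
configDB.chunks.find({ns: "test.s", jumbo: true},
                     {min: 1, max: 1, shard: 1}).forEach(printjson);
```

      The min/max ranges make it easy to see that flagged chunks often span a wide shard key range despite holding little data.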

      Note that in the repro steps, setting numInitialChunks to 1 is not required, although doing so makes the issue occur much sooner thanks to the initial imbalance.

      Workaround

      Users can lower the chunksize setting and then clear the jumbo flag on all affected chunks so that they are re-evaluated for splitting.
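      A sketch of that workaround from a mongos shell, assuming the repro's test.s namespace and these 3.x versions (where the jumbo flag lives directly in config.chunks). Stop the balancer before editing config metadata:

```javascript
// Illustrative only: stop the balancer before touching config metadata.
sh.stopBalancer();

var configDB = db.getSiblingDB("config");

// Lower the cluster-wide chunk size (value is in MB; 32 is an example).
configDB.settings.update({_id: "chunksize"}, {$set: {value: 32}}, {upsert: true});

// Clear the jumbo flag on affected chunks so the balancer re-evaluates them.
configDB.chunks.update({ns: "test.s", jumbo: true},
                       {$unset: {jumbo: ""}},
                       {multi: true});

sh.startBalancer();
```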

    • Votes: 8
    • Watchers: 29