[SERVER-19919] Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo Created: 13/Aug/15  Updated: 02/Aug/18  Resolved: 19/Oct/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.11, 3.0.5, 3.2.10, 3.5.13
Fix Version/s: 3.4.11, 3.6.0-rc1

Type: Bug Priority: Major - P3
Reporter: Andrew Ryder (Inactive) Assignee: Randolph Tan
Resolution: Done Votes: 8
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-20140 shard balancer fails to split chunks ... Closed
is duplicated by SERVER-24270 "chunk too big to move" and "chunk no... Closed
Related
related to DOCS-11934 Docs still reference the 250k limit t... Closed
related to SERVER-30572 Support configurable 'jumbo' chunk th... Closed
is related to SERVER-11701 Allow user to force moveChunk of chun... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Steps To Reproduce:
  1. Create a sharded cluster with several shards and at least two MongoS. I used 3 shards and 2 MongoS. The shard configuration does not matter, and neither does using 1 or 3 config servers.
  2. Setup a collection:

    use test
    db.dropDatabase();
    sh.enableSharding("test");
    db.s.ensureIndex({_id:"hashed"});
    db.adminCommand({shardCollection: "test.s", key: {_id: "hashed"}, numInitialChunks: 1});
    

    (numInitialChunks does not need to be 1 but it does make the issue occur much sooner)
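    To confirm that the collection starts out as a single chunk before inserting, a quick check against the config database can be run from a MongoS (a minimal sketch, assuming the test.s namespace above):

    use config
    db.chunks.count({ ns: "test.s" });   // should print 1 when numInitialChunks is 1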

  3. Insert small documents into the sharded collection concurrently via at least two MongoS (the more MongoS involved, the faster the issue occurs). I used this:

    // Inserts documents with sequential integer _ids until the collection holds
    // count_target_num documents. The default id_start is derived from the MongoS
    // port so that shells connected to different MongoS don't collide on _id values.
    function insert_until( count_target_num, id_start ) {
        assert( count_target_num > 0 );
        if ( typeof id_start === 'undefined' ) {
            id_start = ( db.serverCmdLineOpts().parsed.net.port % 100 + 100 ) * 1000000;
        }
        assert( id_start > 0 );
        var targ_col = db.s;
        var target = count_target_num - targ_col.count();
        var new_doc = function() { return { _id: id_start++ }; };
        while ( target > 100 ) {
            // Insert in batches of up to 1000 documents, printing progress as we go.
            print( targ_col.count() );
            target = target / 10;
            if ( target > 1000 )
                target = 1000;
            for ( var j = 0; j < target; j++ ) {
                targ_col.insert( new_doc() );
            }
            target = count_target_num - targ_col.count();
        }
        // Top up the remainder one document at a time.
        while ( targ_col.count() < count_target_num ) {
            targ_col.insert( new_doc() );
        }
        print( targ_col.count() );
    }
    

    Then on each of two shell sessions connected to different MongoS:

    insert_until(10000000)
    

  4. Watch for a jumbo chunk to turn up (one way to check is shown below). For me it usually happened around the 3 million document mark with this setup, though it is sensitive to the timing of how the inserts are kicked off on each MongoS. If left to run, more and more chunks will be marked as jumbo.
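    A quick way to spot flagged chunks is to query the config database from a MongoS; this is a minimal check, assuming the test.s namespace from step 2:

    use config
    db.chunks.find({ ns: "test.s", jumbo: true }).count();                       // number of chunks flagged jumbo
    db.chunks.find({ ns: "test.s", jumbo: true }, { min: 1, max: 1, shard: 1 }); // which ranges, on which shards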
Sprint: Sharding 2017-10-23
Participants:
Case:

Description

If a chunk that contains more than 250000 documents, but is smaller than half the configured chunk size, is selected for a moveChunk operation, the balancer will mark it as jumbo regardless of the shard key range it spans or its actual data size (as long as that size stays under half the chunk size).

The issue can occur repeatedly in the same collection and could end up marking all chunks as jumbo. The resulting data distribution across shards is highly uneven.
EDIT: I don't think it can mark every chunk as jumbo.

The issue is most likely to occur wherever large numbers of small documents are in use. The requirements are:

  • Documents that average less than 268 bytes in size (explained below), or less than 134 bytes for certain earlier versions.
  • A shard key with a lot of range.
  • More than one MongoS; the more MongoS there are, the more readily the issue reproduces.
  • Inserts to the same sharded collection arriving through multiple MongoS (other operations don't matter).
  • Balancer enabled.

The average document size threshold can be calculated for 3.2 or later as:

chunkSizeMB * 1024 * 1024 / 250000

For 3.0 and earlier the average document size threshold formula is:

chunkSizeMB * 1024 * 1024 / 2 / 250000

Note that raising chunkSizeMB also raises the average document size below which the issue can manifest.
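As a worked example with the default 64 MB chunk size (a sketch; substitute the cluster's actual chunksize value from config.settings):

    // Average document size below which a chunk can reach 250000 documents
    // before reaching the size threshold that would allow a split or move.
    var chunkSizeMB = 64;                                      // default chunk size
    var threshold32 = chunkSizeMB * 1024 * 1024 / 250000;      // 3.2 or later: ~268 bytes
    var threshold30 = chunkSizeMB * 1024 * 1024 / 2 / 250000;  // 3.0 and earlier: ~134 bytes
    print(threshold32, threshold30);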

Under these conditions, whenever the balancer issues a moveChunk to bring the cluster into balance (for example, after some other split, or after a new shard is added for capacity), there is a chance of selecting a chunk whose document count already exceeds 250000 (the separate MongoS, each tracking chunk growth independently, did not auto-split it before the limit was reached). 250000 is the hard-coded document count limit above which a migration is rejected as "chunkTooBig". The balancer responds to that failure by issuing a split request, but the split returns only a single split point (because the chunk is still under half the chunk size), so the logic from SERVER-14052 kicks in and declares the split failed as well. The chunk is then marked as jumbo despite being tiny in size and having a potentially huge shard key range. A diagnostic to find such chunks is sketched below.
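To see which chunks have crossed the 250000 document limit while remaining small, the per-chunk document count can be checked with the dataSize command. This is an illustrative sketch assuming the test.s namespace and hashed _id shard key from the repro steps; if the MongoS rejects dataSize, run it directly against the primary of the shard that owns the chunk:

    var confDB = db.getSiblingDB("config");
    confDB.chunks.find({ ns: "test.s" }).forEach(function(c) {
        // Count the documents that fall inside this chunk's shard key range.
        var res = db.getSiblingDB("test").runCommand({
            dataSize: "test.s",
            keyPattern: { _id: "hashed" },
            min: c.min,
            max: c.max
        });
        if (res.ok && res.numObjects > 250000) {
            print(tojson(c.min) + " -> " + tojson(c.max) + " on " + c.shard +
                  ": " + res.numObjects + " docs, " + res.size + " bytes" +
                  (c.jumbo ? " (jumbo)" : ""));
        }
    });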

Note that in the repro steps, setting numInitialChunks to 1 is not required, although it makes the issue occur much sooner thanks to the initial imbalance.

Workaround

Users can lower the chunksize setting and then clear the jumbo flags on all affected chunks so they are re-evaluated for splitting; a sketch of both steps is shown below.
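A minimal sketch of the workaround, run from a MongoS against the config database and assuming the test.s namespace from the repro. It edits cluster metadata by hand, the 32 MB chunk size is only an example value, and the balancer should be stopped while the jumbo flags are cleared:

    // Stop the balancer before touching chunk metadata.
    sh.stopBalancer();

    var confDB = db.getSiblingDB("config");

    // 1. Lower the chunk size (value is in MB; 32 is just an example).
    confDB.settings.update({ _id: "chunksize" },
                           { $set: { value: 32 } },
                           { upsert: true });

    // 2. Clear the jumbo flag on the affected chunks so the balancer
    //    re-evaluates them for splitting.
    confDB.chunks.update({ ns: "test.s", jumbo: true },
                         { $unset: { jumbo: "" } },
                         { multi: true });

    sh.startBalancer();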



Comments
Comment by Githook User [ 09/Nov/17 ]

Author:

{'name': 'Randolph Tan', 'username': 'renctan', 'email': 'randolph@10gen.com'}

Message: SERVER-19919 Remove the 250000 document limit for migration

(cherry picked from commit 830fe9d2093b365d51aaafadfecefcdd09fd0ede)
Branch: v3.4
https://github.com/mongodb/mongo/commit/e6a8039356154e7d5fae606ec74bcbe6468c07f3

Comment by Githook User [ 19/Oct/17 ]

Author:

{'email': 'randolph@10gen.com', 'name': 'Randolph Tan', 'username': 'renctan'}

Message: SERVER-19919 Remove the 250000 document limit for migration
Branch: master
https://github.com/mongodb/mongo/commit/830fe9d2093b365d51aaafadfecefcdd09fd0ede

Comment by alplatonov [ 13/Jun/17 ]

I have collections with both large and small documents, and this bug affects us too. I wrote an automatic chunk-split script for the collections with small documents, but it does not resolve the issue well enough and the imbalance is still present.
Please pick up this task.

Comment by Andrew Ryder (Inactive) [ 29/Aug/15 ]

Hi Derek,

Lowering the chunksize is a persistent workaround. However, if you don't want to do that, you can alternatively just clear the jumbo flags periodically (the second part of the stated workaround, once a week perhaps) on the affected collection to have the balancer re-assess those chunks. If those chunks have received more documents in the interim, they may have reached the size threshold to be split; if not, they'll merely be marked as jumbo again. Either way, doing this periodically allows the balancer to make progress without modifying the chunksize setting.

I realize this is not a solution, but until the issue can be addressed properly this may suffice as a workaround for you. I hope this helps.

Kind regards,
Andrew

Comment by Derek Wilson [ 28/Aug/15 ]

I can't use the workaround as it will end up splitting chunks that are appropriately sized in other collections. Average document sizes in my collections range from really tiny (low tens of bytes) to large (around ten KB), with hundreds of millions of docs each. There really isn't a single chunksize that works well in my case.
