Core Server / SERVER-19919

Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.6.11, 3.0.5, 3.2.10, 3.5.13
    • Fix Version/s: 3.4.11, 3.6.0-rc1
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Steps To Reproduce:
      1. Create a sharded cluster with at least several shards and more than one MongoS. I used 3 shards and 2 MongoS. The shard configuration does not matter, and whether there are 1 or 3 config servers does not matter.
      2. Setup a collection:

        use test
        db.dropDatabase();
        sh.enableSharding("test");
        db.s.ensureIndex({_id:"hashed"});
        db.adminCommand({shardCollection: "test.s", key: {_id: "hashed"}, numInitialChunks: 1});
        

        (numInitialChunks does not need to be 1 but it does make the issue occur much sooner)

      3. Insert small documents into the shard collection concurrently via at least two MongoS (the more MongoS involved the faster the issue occurs). I used this:

        function insert_until(count_target_num, id_start) {
            assert(count_target_num > 0);
            if (typeof id_start === 'undefined') {
                // Derive a per-mongos starting _id from the port number so that
                // concurrent shells insert disjoint _id ranges.
                id_start = (db.serverCmdLineOpts().parsed.net.port % 100 + 100) * 1000000;
            }
            assert(id_start > 0);
            var targ_col = db.s;
            var target = count_target_num - targ_col.count();
            var new_doc = function() { return { _id: id_start++ }; };
            while (target > 100) {
                // go fast...
                print(targ_col.count());
                target = target / 10;
                if (target > 1000)
                    target = 1000;
                for (var j = 0; j < target; j++) {
                    targ_col.insert(new_doc());
                }
                target = count_target_num - targ_col.count();
            }
            while (targ_col.count() < count_target_num) {
                targ_col.insert(new_doc());
            }
            print(targ_col.count());
        }
        

        Then on each of two shell sessions connected to different MongoS:

        insert_until(10000000)
        

      4. Watch for a jumbo chunk to turn up. With this setup it usually happened for me around the 3 million document mark, though it is sensitive to the timing of how the inserts through each MongoS are kicked off. If left to run, more and more chunks are marked as jumbo.
    • Sprint:
      Sharding 2017-10-23

      Description

      If a chunk containing more than 250000 documents, but totalling less than half the maximum chunk size, is selected for a moveChunk operation, the balancer marks the chunk as jumbo regardless of the shard key range available or the chunk's actual total data size (so long as that size is less than half the chunk size).

      The issue can occur repeatedly in the same collection and could end up marking all chunks as jumbo. The resulting chunk distribution is highly uneven.
      EDIT: I don't think it can mark every chunk as jumbo.

      The issue is most likely to occur wherever large numbers of small documents are in use. The requirements are:

      • Documents that average less than 268 bytes in size (explained below), or less than 134 for certain earlier versions.
      • A shard key with a lot of range.
      • More than one MongoS, more is better (or perhaps worse?)
      • Inserts occurring across multiple MongoS (other ops don't matter) to the same sharded collection.
      • Balancer enabled.

      The average document size threshold can be calculated for 3.2 or later as:

      chunkSizeMB * 1024 * 1024 / 250000
      

      For 3.0 and earlier the average document size threshold formula is:

      chunkSizeMB * 1024 * 1024 / 2 / 250000
      

      Note that raising chunkSizeMB also raises the average document size threshold under which the issue can manifest.
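      The thresholds above can be sanity-checked with plain JavaScript (runnable outside the mongo shell). The function below is illustrative, not part of the server code; the 268/134 byte figures in the requirements list follow from the default 64 MB chunk size:

```javascript
// Average document size below which a 250000-document chunk can still fall
// under the relevant size limit. version32OrLater selects the formula:
// 3.2+ compares against the full chunk size, 3.0 and earlier against half of it.
function jumboDocSizeThreshold(chunkSizeMB, version32OrLater) {
    var limitBytes = chunkSizeMB * 1024 * 1024;
    if (!version32OrLater) {
        limitBytes = limitBytes / 2;
    }
    return limitBytes / 250000;
}

var threshold32 = jumboDocSizeThreshold(64, true);   // 268.435456 bytes
var threshold30 = jumboDocSizeThreshold(64, false);  // 134.217728 bytes
```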

      In these conditions, whenever the balancer issues a moveChunk to bring the cluster into balance (for example, after some other split, or after a new shard is added for capacity), there is a chance of selecting a chunk whose document count already exceeds 250000 (because each MongoS tracks chunk growth independently, none of them triggered an auto-split before the limit was reached). 250000 is the hard-coded document count limit above which a move is rejected as "chunkTooBig". The balancer responds to such a failure by issuing a split request. The split returns a single split point (because the chunk's data size is still under half the chunk size), and then SERVER-14052 kicks in and declares the split failed too. The chunk is marked as jumbo despite being tiny in data size and having a potentially huge shard key range.
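      While the repro runs, the chunks being flagged can be watched from a mongos. This is a sketch against the standard config database, using the repro's test.s namespace; run it from a mongos shell connected to the cluster:

```javascript
// List chunks flagged jumbo for the repro collection, with their
// shard key ranges and owning shards.
var configDB = db.getSiblingDB("config");
configDB.chunks.find({ns: "test.s", jumbo: true},
                     {min: 1, max: 1, shard: 1}).forEach(printjson);
```

      The min/max ranges make it easy to see that flagged chunks often span a wide shard key range despite holding little data.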

      Note that in the repro steps, setting numInitialChunks to 1 is not required, although doing so makes the issue occur much sooner thanks to the initial imbalance.

      Workaround

      Users can lower the chunksize setting and then clear the jumbo flag on all affected chunks so that they are re-evaluated for splitting.
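      A sketch of that workaround from a mongos shell, assuming the repro's test.s namespace and these 3.x versions (where the jumbo flag lives directly in config.chunks). Stop the balancer before editing config metadata:

```javascript
// Illustrative only: stop the balancer before touching config metadata.
sh.stopBalancer();

var configDB = db.getSiblingDB("config");

// Lower the cluster-wide chunk size (value is in MB; 32 is an example).
configDB.settings.update({_id: "chunksize"}, {$set: {value: 32}}, {upsert: true});

// Clear the jumbo flag on affected chunks so the balancer re-evaluates them.
configDB.chunks.update({ns: "test.s", jumbo: true},
                       {$unset: {jumbo: ""}},
                       {multi: true});

sh.startBalancer();
```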

    • Votes: 8
    • Watchers: 29