[SERVER-13806] Need better detection and reporting of the existence of jumbo chunks Created: 01/May/14  Updated: 06/Dec/22  Resolved: 03/Jan/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jon Rangel (Inactive) Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-44088 Autosplitter seems to ignore some fat... Closed
related to SERVER-13024 Clear the 'jumbo' flag from the chunk... Blocked
is related to SERVER-21931 Only mark chunks as jumbo if split fa... Closed
Assigned Teams:
Sharding
Operating System: ALL
Steps To Reproduce:

sh.enableSharding("test");
sh.shardCollection("test.foo", {a: 1});
sh.splitAt("test.foo", {a: 1000});
// Every document has a: 1, so all inserts land in the chunk MinKey -> 1000.
for (var i = 0; i < 10000000; i++) {
    db.getSiblingDB("test").foo.insert({a: 1, padding: "xxxxxxxx...xxxxx"});
}

Run the above commands and observe that the chunk MinKey -> 1000 is not marked as jumbo in the chunk metadata, even after it has grown past the maximum chunk size for the cluster.
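
One way to make this observation concrete is to compare each chunk's measured size against the maximum chunk size and list the chunks that exceed it without carrying the jumbo flag. The sketch below is only an illustration and works on plain objects so that it runs outside a cluster. The fields `min`, `max`, and `jumbo` mirror the documents in config.chunks; the field `sizeBytes`, the helper name `findUnflaggedOversizedChunks`, the sample sizes, and the 64 MB limit are assumptions made for the example.

```javascript
// Hypothetical helper: returns chunks whose measured size exceeds the
// maximum chunk size but which are not flagged as jumbo in the metadata.
// `sizeBytes` is an assumed field; it is not stored in config.chunks and
// would have to be measured separately.
function findUnflaggedOversizedChunks(chunks, maxChunkBytes) {
  return chunks.filter(function (chunk) {
    return chunk.sizeBytes > maxChunkBytes && chunk.jumbo !== true;
  });
}

// Illustrative data only, shaped like the two chunks from the repro steps.
// The strings "MinKey" and "MaxKey" are placeholders for the real bounds.
var MAX_CHUNK_BYTES = 64 * 1024 * 1024; // assumed maximum chunk size of 64 MB
var sampleChunks = [
  { min: { a: "MinKey" }, max: { a: 1000 }, sizeBytes: 900 * 1024 * 1024 },
  { min: { a: 1000 }, max: { a: "MaxKey" }, sizeBytes: 0 }
];

var unflagged = findUnflaggedOversizedChunks(sampleChunks, MAX_CHUNK_BYTES);
console.log(unflagged.length + " oversized chunk(s) without the jumbo flag");
```

With the sample data, only the first chunk is reported, which corresponds to the situation described in this ticket: the chunk is too large but nothing in the metadata says so.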

 Description   

As far as I can see, a jumbo chunk is only marked as such when the balancer has attempted to migrate it to another shard (Chunk::markAsJumbo is only called from Balancer::_moveChunks).

This means that there's no indication of jumbo chunks unless:

  • chunk imbalance has grown past the threshold at which the balancer tries to move chunks, and
  • the balancer has attempted to move a jumbo chunk to redress the imbalance

It would be good to have earlier and more comprehensive warning of jumbo chunks. This checking could perhaps be added to the splitVector handling in mongod.
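
Until something like that exists, an operator-side approximation is possible. The sketch below is not the splitVector-based check suggested above; it only builds one dataSize command document per chunk range, so that each range's size could be measured and compared against the maximum chunk size without waiting for the balancer. The helper name `buildDataSizeCommands` and the sample bounds are hypothetical, and the chunk objects are assumed to carry `min` and `max` as in config.chunks.

```javascript
// Hypothetical helper: builds one dataSize command per chunk range.
// Nothing here talks to a server; it only prepares the command documents.
function buildDataSizeCommands(ns, keyPattern, chunks) {
  return chunks.map(function (chunk) {
    return {
      dataSize: ns,
      keyPattern: keyPattern,
      min: chunk.min,
      max: chunk.max,
      estimate: true // size is estimated from the collection's average object size
    };
  });
}

// Illustrative bounds only; a real lower bound would be MinKey rather than
// a finite value, which plain JavaScript outside the shell cannot express.
var exampleChunks = [
  { min: { a: 0 }, max: { a: 1000 } },
  { min: { a: 1000 }, max: { a: 2000 } }
];

var commands = buildDataSizeCommands("test.foo", { a: 1 }, exampleChunks);
console.log(JSON.stringify(commands[0]));
```

In the mongo shell, each command could then be passed to db.runCommand(...) and the `size` field of the reply compared against the maximum chunk size. Setting `estimate` trades accuracy for a cheaper measurement, which matters because checking every chunk of a large collection this way can be expensive.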



 Comments   
Comment by Kaloian Manassiev [ 03/Jan/19 ]

Discovering jumbo chunks proactively would require periodically scanning all the data in the collection, which is very resource-intensive.

Now that the auto-splitter runs on the shards in 4.2, any jumbo chunk taking writes will also be proactively marked as jumbo, so the balancer is no longer the only thing that sets the flag.

Comment by Greg Studer [ 09/Jul/14 ]

Not sure the "jumbo" field should be used for this kind of debugging; it's really a balancer state and not a statistic. Too-large chunks produce pretty verbose log output when they can't be split, but I agree we need to make this info more visible (and probably improve the messages).

This might also be further addressed by some planned changes to send more metadata operation messages to the config servers.
