[SERVER-2724] splitVector counting off-by-one when calculating split points Created: 09/Mar/11  Updated: 10/Mar/11  Resolved: 10/Mar/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Greg Studer Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

The splitVector function, in order to split a chunk, estimates the number of elements N whose combined size in bytes is roughly 50% of the max chunk size. The function then counts off every *N + 1* keys to determine split points. This includes one additional element in each split chunk, which, at worst, increases the chunk size to 75% of the maximum chunk size (since the maximum element size is 16MB and the maximum chunk size is 64MB).
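A minimal sketch of the off-by-one described above (the names and structure here are illustrative, not the actual server code): N documents are sized to fill about half a chunk, but the buggy stepping of N + 1 keys puts one extra document in each resulting chunk.

```python
MB = 1024 * 1024
MAX_CHUNK_SIZE = 64 * MB  # default max chunk size

def split_chunk_sizes(num_docs, doc_size, off_by_one=True):
    """Return the byte sizes of the chunks produced by splitting
    num_docs uniform documents of doc_size bytes."""
    # N documents ~ 50% of the max chunk size
    n = (MAX_CHUNK_SIZE // 2) // doc_size
    # the bug: step every N + 1 keys instead of every N keys
    step = n + 1 if off_by_one else n
    sizes = []
    for start in range(0, num_docs, step):
        count = min(step, num_docs - start)
        sizes.append(count * doc_size)
    return sizes

# Worst case: 16MB documents give N = 2, so the buggy step of 3
# yields 48MB chunks -- 75% of the 64MB max chunk size.
buggy = split_chunk_sizes(num_docs=9, doc_size=16 * MB)
correct = split_chunk_sizes(num_docs=9, doc_size=16 * MB, off_by_one=False)
```

With maximum-size documents, `buggy` contains three 48MB chunks while `correct` would have produced 32MB chunks, matching the 75% worst case in the description.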

Though the chunk boundaries are slightly skewed by this now, all chunk splits that normally would work should still work, again because of the 64MB vs 16MB ratio. If our maximum document size ever increases past 21MB, however, this will no longer be the case, and, for example, chunks larger than the max chunk size can be created. Cats and dogs living together, mass hysteria, etc.

The biggest effect of this so far is that the maximum size of a collection that can be sharded depends on the size of the elements inside it. With maximum-size elements, collections up to 384GB can be sharded.



 Comments   
Comment by Greg Studer [ 10/Mar/11 ]

The effect of this is to go "high" on chunk size; assuming max doc size < chunk size, this should not be a problem.

Generated at Thu Feb 08 03:01:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.