[SERVER-14052] With only 2 distinct key values SplitVector Returns numSplits : 1 but no split is done
Created: 26/May/14 Updated: 08/Apr/20 Resolved: 02/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | 4.3.3 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Adam Comerford | Assignee: | Tommaso Tocci |
| Resolution: | Done | Votes: | 0 |
| Labels: | ShardingRoughEdges, sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Start a test cluster with a small chunk size:
Connect, create a test database, insert data, manually split to create the "problem" chunk:
Now a chunk should exist with just two distinct shard key values (0 and 1). No matter how many documents I insert (I inserted millions), no further splits happen on that chunk. The mongos log entries for splitVector look something like this (repeated multiple times):
Here's sh.status():
A manual split succeeds and creates the final 2 possible chunks:
|
| Sprint: | Sharding 2019-12-30, Sharding 2020-01-13 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
This is something of an edge case, and fixing it doesn't gain very much, but I figured it is technically still a bug. While checking some related logic, I realized that once I created a chunk with just 2 distinct shard key values (0 and 1 in the test case), no further splits occurred, even though one more split should be possible. A manual splitAt() succeeds, so the split is possible, but the autosplitter never seems to attempt it. |
| Comments |
| Comment by Githook User [ 02/Jan/20 ] |
|
Author: Tommaso Tocci (tommaso.tocci@10gen.com, username: toto-dev)
Message: The internal splitVector function was accepting both
The first one has been removed because it is obviously redundant. The SplitVector command still accepts both of them and for compatibility |
| Comment by Githook User [ 02/Jan/20 ] |
|
Author: Tommaso Tocci (tommaso.tocci@10gen.com, username: toto-dev)
Message: SplitVector is always removing the first splitpoint from the splitvector |
| Comment by Tommaso Tocci [ 18/Dec/19 ] |
|
I figured out that currently the AutoSplitter will split a chunk only if its documents have three or more distinct shard key values. |
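The three-distinct-values threshold can be reproduced with a small Python model. This is a sketch under stated assumptions, not the server's C++ code: `split_vector`, `should_split`, `key_count` and `require_two` are hypothetical names. `key_count` stands for roughly half a chunk's worth of documents, which is my understanding of how older splitVector versions spaced split points. `require_two=True` models an autosplitter that ignores a single split point, treating it as "the chunk is only between half and full size". With two distinct values the only possible split point is the second value. The first value is the chunk's min key, pushed as a sentinel and dropped, so the old rule never fires.

```python
from typing import List, Sequence


def split_vector(keys: Sequence[int], key_count: int) -> List[int]:
    """Hypothetical, simplified model of splitVector (not the server code).

    `keys` are one chunk's shard key values in index order; a split point is
    proposed roughly every `key_count` documents.
    """
    if not keys:
        return []
    # The chunk's first key is pushed as a sentinel and dropped at the end:
    # splitting at the chunk's own min key would create an empty chunk.
    split_keys = [keys[0]]
    count = 0
    for key in keys:
        count += 1
        if count > key_count:
            if key == split_keys[-1]:
                # Cannot split inside a run of identical shard key values;
                # keep scanning until the key changes.
                continue
            split_keys.append(key)
            count = 0
    return split_keys[1:]  # remove the sentinel


def should_split(split_points: List[int], require_two: bool = True) -> bool:
    """Hypothetical model of the autosplitter's decision.

    require_two=True: a single split point is ignored (the behaviour
    described in this ticket). require_two=False: any split point is used.
    """
    if require_two:
        return len(split_points) > 1
    return len(split_points) > 0


if __name__ == "__main__":
    two_keys = [0] * 1000 + [1] * 1000
    three_keys = two_keys + [2] * 1000
    print(split_vector(two_keys, 100), should_split(split_vector(two_keys, 100)))
    print(split_vector(three_keys, 100), should_split(split_vector(three_keys, 100)))
```

In this model, adding more documents with keys 0 and 1 never changes the result: `split_vector` keeps returning `[1]`. Accepting a single split point (`require_two=False`) is one way to close the gap. It is not confirmed here that the 4.3.3 fix works exactly this way.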
| Comment by Tommaso Tocci [ 16/Dec/19 ] |
|
I can confirm that the bug is still present on `r4.3.2`.
At this point you can see that no splits have been performed and we still have 2 chunks,
even though there is 862.97MiB of data in total, almost all of it in the first chunk.
|