[SERVER-18361] Sharding "top" splits should respect tag boundaries Created: 07/May/15 Updated: 26/Sep/17 Resolved: 07/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.6.9, 3.0.2, 3.2.10, 3.4.0-rc0 |
| Fix Version/s: | 3.4.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ronan Bohan | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Run the attached script runme.sh.
Note: You need mtools installed and configured to run this script. I also find it useful to monitor the shard distribution while the script is running:
|
| Sprint: | Sharding 2016-09-19, Sharding 2016-10-10, Sharding 2016-10-31 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
When the special "top"/"bottom" chunk splits are performed, they can produce a chunk that spans a tag-range boundary, so that chunk (and with it part of the tag range) can end up on a shard that is not assigned to that tag. When the balancer next runs, it splits the chunk at the tag-range boundary and moves the tagged portion to a shard assigned to the tag. Until then, however, the chunk containing the partial tag range temporarily resides on a shard not allocated to that tag range.

Old description:

The issue described here occurs while documents are being inserted, all of which target a single tag range: a split and move can occur during the insertion process in which the top chunk is moved to a shard not aligned with that tag. New documents in this upper range can therefore end up on the wrong shard (although the balancer does seem to move the chunk to a valid shard shortly afterwards).

The issue may occur because the command to create a tag range seems to cause a split at the lower end of the range but not at the top. Auto-splits that occur later can generate a split point below the max key of that tag range, and the resulting top chunk (which contains a portion of the tag range) is then moved to an invalid shard.

A workaround seems to be to manually create a split point at the top end of the tag range and then move the resulting chunk (spanning the whole tag range) to an appropriate shard. The same thing can be accomplished by creating 'dummy' tag ranges for all sections outside the real tag ranges, so that the tags together span the entire key space, from min to max. This automatically creates a split at the top of each real tag range (because it is also the bottom of a dummy tag range). In this case the move occurs automatically, although it does take some time (30+ seconds).

A repro script has been provided. It uses mtools to create a 4-shard cluster with 2 tag ranges (each tag is assigned to 2 shards). It then inserts data that targets a single tag range, but there are brief periods (after a split) where the top chunk is moved to a shard that is not associated with that tag. The balancer quickly notices this and moves the chunk back, but this intermediate state can cause many issues; for example, it can mean expensive data transfers between different sites (in both directions). |
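The split behaviour described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the server's code: shard keys are plain numbers, ranges are half-open [min, max), `math.inf` / `-math.inf` stand in for MaxKey / MinKey, and `TagRange`, `Chunk`, `split_respecting_tags` and `dummy_tag_ranges` are hypothetical names. It shows how a top split below the tag range's upper bound leaves a chunk straddling that bound, one possible rule for making a split respect tag boundaries (not necessarily what the actual fix does), and how the dummy-range workaround fills the gaps between real tag ranges.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical model: numeric shard keys, half-open ranges [min, max),
# math.inf / -math.inf standing in for MaxKey / MinKey.


@dataclass(frozen=True)
class TagRange:
    min: float
    max: float
    tag: str


@dataclass(frozen=True)
class Chunk:
    min: float
    max: float


def boundaries_inside(chunk: Chunk, tags: List[TagRange]) -> List[float]:
    """Tag-range boundaries that fall strictly inside the chunk."""
    return sorted({b for t in tags for b in (t.min, t.max) if chunk.min < b < chunk.max})


def spans_tag_boundary(chunk: Chunk, tags: List[TagRange]) -> bool:
    """True if the chunk has keys on both sides of some tag boundary."""
    return bool(boundaries_inside(chunk, tags))


def split_respecting_tags(chunk: Chunk, split_point: float,
                          tags: List[TagRange]) -> List[Chunk]:
    """Hypothetical rule: split at split_point AND at every tag boundary inside
    the chunk, so no resulting chunk straddles a boundary."""
    candidates = {split_point, *boundaries_inside(chunk, tags)}
    points = sorted(p for p in candidates if chunk.min < p < chunk.max)
    edges = [chunk.min, *points, chunk.max]
    return [Chunk(lo, hi) for lo, hi in zip(edges, edges[1:])]


def dummy_tag_ranges(tags: List[TagRange], name: str = "dummy") -> List[TagRange]:
    """Ranges covering every gap between the real tag ranges, MinKey to MaxKey
    (the 'dummy tag range' workaround). The dummy tag name is hypothetical."""
    gaps, cursor = [], -math.inf
    for t in sorted(tags, key=lambda t: t.min):
        if t.min > cursor:
            gaps.append(TagRange(cursor, t.min, name))
        cursor = max(cursor, t.max)
    if cursor < math.inf:
        gaps.append(TagRange(cursor, math.inf, name))
    return gaps


tags = [TagRange(100, 200, "us-east-1")]
# After the tag range is created, only its lower end (100) is a chunk boundary.
top = Chunk(100, math.inf)
# A naive top split at 150 leaves [150, MaxKey), which straddles 200.
print(spans_tag_boundary(Chunk(150, math.inf), tags))  # True
# Respecting tags yields [100,150), [150,200), [200,MaxKey).
print(split_respecting_tags(top, 150, tags))
# Dummy ranges: [MinKey,100) and [200,MaxKey); 200 becomes a range minimum.
print(dummy_tag_ranges(tags))
```

Splitting additionally at every interior tag boundary is the simplest rule that guarantees each resulting chunk lies wholly inside or wholly outside any tag range, so whichever shard it is moved to can be checked against a single tag.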
| Comments |
| Comment by Githook User [ 07/Oct/16 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: |
| Comment by Daniel Pasette (Inactive) [ 24/Aug/16 ] |
|
Added a new test which uses regular ShardingTest instead of mlaunch. |
| Comment by Ronan Bohan [ 08/May/15 ] |
|
Thanks Scott - I see your point. I have updated the title and description accordingly. Please let me know if it is clear and distinct enough from |
| Comment by Scott Hernandez (Inactive) [ 07/May/15 ] |
|
Okay, can you change the title and description to call out this specific case, so it is clear what is affected and what to do? |
| Comment by Ronan Bohan [ 07/May/15 ] |
|
Thanks Scott, Based on my reading of In the case referenced in this ticket ( If indeed |
| Comment by Scott Hernandez (Inactive) [ 07/May/15 ] |
|
Is there any part of this which is not a dup of |
| Comment by Ronan Bohan [ 07/May/15 ] |
|
For the record, the script runme.sh The end of the script searches the 'changelog' to find any chunks moved to the 4th shard (tagged with "eu-west-1"). In my tests I typically see 2 chunks being moved, each containing a subrange which includes "region": "us-east-1". See changelog.out |
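The changelog check described above can be sketched in Python. This is a minimal sketch under stated assumptions: the entry shape (`what == "moveChunk.commit"` with `details.min`, `details.max`, `details.to`) is modeled on config.changelog entries and should be checked against the actual changelog.out; the shard name `shard04`, the compound key `{region, n}` represented as a `(str, float)` tuple, and the function names are all hypothetical. Python tuple ordering only approximates BSON key ordering, and only while each key component keeps the same type.

```python
import math
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical compound shard key {region, n} modeled as a (str, float) tuple.
Key = Tuple[str, float]


def overlaps(a_min: Key, a_max: Key, b_min: Key, b_max: Key) -> bool:
    """Half-open ranges [a_min, a_max) and [b_min, b_max) share at least one key."""
    return a_min < b_max and b_min < a_max


def misplaced_moves(changelog: Iterable[Dict[str, Any]], to_shard: str,
                    rng_min: Key, rng_max: Key) -> List[Dict[str, Any]]:
    """Migrations committed to `to_shard` whose chunk overlaps [rng_min, rng_max).

    Assumes entries shaped like {"what": "moveChunk.commit",
    "details": {"min": ..., "max": ..., "to": ...}} with min/max already
    converted to Key tuples.
    """
    hits = []
    for entry in changelog:
        if entry.get("what") != "moveChunk.commit":
            continue
        details = entry.get("details", {})
        if details.get("to") != to_shard:
            continue
        if overlaps(details["min"], details["max"], rng_min, rng_max):
            hits.append(entry)
    return hits


# Hypothetical tag range covering every key whose region is "us-east-1".
US_EAST_MIN: Key = ("us-east-1", -math.inf)
US_EAST_MAX: Key = ("us-east-1", math.inf)

sample_log = [
    # Top chunk straddling the end of the us-east-1 range, moved to the eu-west-1 shard.
    {"what": "moveChunk.commit",
     "details": {"min": ("us-east-1", 500.0), "max": ("us-east-2", 0.0), "to": "shard04"}},
    # Legitimate eu-west-1 chunk moved to the eu-west-1 shard.
    {"what": "moveChunk.commit",
     "details": {"min": ("eu-west-1", 0.0), "max": ("eu-west-1", 100.0), "to": "shard04"}},
    # us-east-1 chunk moved to a shard (hypothetically) tagged us-east-1.
    {"what": "moveChunk.commit",
     "details": {"min": ("us-east-1", 0.0), "max": ("us-east-1", 500.0), "to": "shard02"}},
    {"what": "split", "details": {}},
]

# Prints only the straddling chunk that was moved to shard04.
for e in misplaced_moves(sample_log, "shard04", US_EAST_MIN, US_EAST_MAX):
    print(e["details"]["min"], "->", e["details"]["max"])
```

Against a live cluster, the entries would come from the config.changelog collection, with each `min`/`max` document converted to a key tuple before the comparison.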