[SERVER-18361] Sharding "top" splits should respect tag boundaries Created: 07/May/15  Updated: 26/Sep/17  Resolved: 07/Oct/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.9, 3.0.2, 3.2.10, 3.4.0-rc0
Fix Version/s: 3.4.0-rc1

Type: Bug Priority: Major - P3
Reporter: Ronan Bohan Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File changelog.out     File runme.sh     File top_chunk_zone.js    
Issue Links:
Depends
is depended on by SERVER-26400 balance_tags1.js expects incorrect ch... Closed
Duplicate
duplicates SERVER-6640 Strict balancing guarantees with shar... Closed
Related
is related to SERVER-7668 Split chunks on tag ranges Closed
is related to SERVER-14506 special top chunk logic can move max ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Run the attached script runme.sh:

bash runme.sh

Note: You need 'mtools' installed and configured in order to run this script.

I also find it useful to monitor the shard distribution as the script is running:

watch -n 0.2 'mongo mydb --quiet --eval "db.mycoll.getShardDistribution()"'

Sprint: Sharding 2016-09-19, Sharding 2016-10-10, Sharding 2016-10-31
Participants:
Case:

 Description   

When the special top/bottom chunk splits are performed, the resulting chunk can span a tag range boundary, leaving that chunk on the wrong shard for (part of) the tag range. When the balancer next runs it will split the chunk at the tag range boundary and move it to the tagged shard, but until then the chunk, which includes part of a tag range, temporarily resides on a shard not assigned to that tag range.
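The fix that landed for this ticket is summarized in the commit message "Consider the zone's max when enforcing zone boundaries". A simplified sketch of that idea, in Python (illustrative only; function and variable names are invented, not the actual server code): when selecting split points for a chunk, force a split at any zone (tag range) boundary that falls inside the chunk, so no resulting chunk straddles a zone.

```python
# Illustrative sketch, NOT actual mongod code: augment the
# auto-splitter's candidate split points with any zone (tag range)
# boundaries that fall strictly inside the chunk, so that no
# resulting chunk spans a zone boundary.

def pick_split_points(chunk_min, chunk_max, proposed, zones):
    """proposed: candidate split points from the auto-splitter.
    zones: list of (zone_min, zone_max) tag ranges.
    Returns sorted split points including interior zone boundaries."""
    points = set(proposed)
    for zmin, zmax in zones:
        for boundary in (zmin, zmax):
            if chunk_min < boundary < chunk_max:
                points.add(boundary)
    return sorted(points)
```

For example, with a chunk covering [0, 100), proposed splits at 40 and 80, and a zone ending at 50, the zone's max is added as a split point so the top chunk cannot contain both zoned and unzoned ranges.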

Old description:
When using tag-aware sharding there are circumstances where a chunk will be moved to the wrong shard, i.e. chunks containing a (sub) range of a given tag can be moved to shards which are not associated with that tag.

The issue described here is one in which documents are being inserted (all of which target a single tag range), yet a split and move can occur during the insertion process in which the top chunk is moved to a non-aligned shard. New documents in this upper range can therefore end up on the wrong shard (though the balancer does seem to move the chunk to a valid shard shortly afterwards).

The issue may occur because the command to create a tag range seems to cause a split at the lower end of the range but not at the top. Auto-splits which later occur can result in a split point being generated below the max key for that tag range, with the resulting top chunk (which contains a portion of the tag-range) being moved to an invalid shard.

A workaround seems to be to manually create a split point at the top end of the tag range and then move the resulting chunk (spanning the whole tag range) to an appropriate shard.

The same thing could be accomplished by creating 'dummy' tag ranges for all sections outside of the real tag ranges, effectively creating tags spanning the entire keyspace, from min to max. This automatically creates a split at the top of each real tag range (because it is also the bottom of the adjacent dummy tag range). In this case the move occurs automatically, though it can take some time (30+ seconds) to happen.
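The 'dummy tag range' workaround above amounts to covering every gap in the keyspace with a filler range. A minimal Python sketch of the gap-filling logic (illustrative; assumes sorted, non-overlapping ranges over a totally ordered key space):

```python
# Illustrative sketch of the "dummy tag range" workaround: compute
# filler ranges so that real + dummy ranges together cover the whole
# keyspace, forcing splits at every real tag range boundary.

def fill_with_dummy_zones(real_zones, global_min, global_max):
    """real_zones: sorted, non-overlapping (min, max) tag ranges.
    Returns dummy (min, max) ranges covering every gap between
    global_min and global_max."""
    dummies = []
    cursor = global_min
    for zmin, zmax in real_zones:
        if cursor < zmin:
            dummies.append((cursor, zmin))
        cursor = zmax
    if cursor < global_max:
        dummies.append((cursor, global_max))
    return dummies
```

Each dummy range would then be tagged (e.g. to an arbitrary "catch-all" set of shards) via the usual tag range commands; the exact commands depend on the server version.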


A repro script has been provided. It uses mtools to create a 4-shard cluster with 2 tag ranges (each tag is assigned to 2 shards). It then inserts data targeting a single tag range, but there are brief periods (after a split) where the top chunk is moved to a shard which is not associated with that tag. The balancer will quickly realise this and move the chunk back, but this intermediate state can cause many issues; for example, it can mean expensive data transfers between different sites (in both directions).



 Comments   
Comment by Githook User [ 07/Oct/16 ]

Author:

Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-18361 Consider the zone's max when enforcing zone boundaries
Branch: master
https://github.com/mongodb/mongo/commit/6e8a3d683182f0ce68b3a19547a2fb00d3909b19

Comment by Daniel Pasette (Inactive) [ 24/Aug/16 ]

Added a new test which uses regular ShardingTest instead of mlaunch.

Comment by Ronan Bohan [ 08/May/15 ]

Thanks Scott - I see your point. I have updated the title and description accordingly. Please let me know if it is clear and distinct enough from SERVER-6640 at this point.

Comment by Scott Hernandez (Inactive) [ 07/May/15 ]

Okay, can you change the title and description to call out this specific case, so it is clear what the case is and what is affected?

Comment by Ronan Bohan [ 07/May/15 ]

Thanks Scott,

Based on my reading of SERVER-6640 it looks like it is talking about two documents which map to two separate tags. In certain circumstances both documents go to the same chunk which means at least one of the documents will live on the wrong shard (until a suitable split occurs).

In the case referenced in this ticket (SERVER-18361) we are dealing with a set of documents which all map to the same tag. What's interesting is that they are initially inserted into a chunk on a valid shard, but when a split happens the resultant 'top chunk' can move to an invalid shard. So this results in a cluster where data is going to the right shards for a period of time but later on similar documents go to the wrong shard, only to be moved back later by the balancer.
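The behavior described above can be modeled very simply: the pre-fix top-chunk optimization picks a destination shard without consulting tag assignments, whereas a tag-aware version restricts candidates first. The following Python sketch is an illustrative model only (the real balancer logic is far more involved):

```python
# Simplified model, not server code: destination selection for the
# "top chunk" optimization, before and after making it tag-aware.

def choose_dest_buggy(shard_loads):
    # Pre-fix (simplified): pick the least-loaded shard, ignoring tags.
    return min(shard_loads, key=shard_loads.get)

def choose_dest_tag_aware(shard_loads, chunk_tag, shard_tags):
    # Tag-aware (simplified): only consider shards carrying the
    # chunk's tag, then pick the least-loaded among them.
    candidates = {s: load for s, load in shard_loads.items()
                  if chunk_tag in shard_tags.get(s, ())}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

With loads {sh0: 5, sh1: 4, sh2: 0} and sh2 tagged only 'eu-west-1', the buggy version sends a 'us-east-1' top chunk to sh2, while the tag-aware version correctly picks sh1.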

If SERVER-6640 is indeed related, it is perhaps addressing a more general, higher-level problem than this one. On the other hand, this ticket may point to an incomplete fix for SERVER-7668 (now marked as closed / fixed), which deals explicitly with splitting chunks on tag range boundaries.

Comment by Scott Hernandez (Inactive) [ 07/May/15 ]

Is there any part of this which is not a dup of SERVER-6640? If so, please add a note/comment there and/or here.

Comment by Ronan Bohan [ 07/May/15 ]

For the record, the script runme.sh creates 4 shards, the first two of which are tagged with 'us-east-1', the second two with 'eu-west-1'. Tag ranges are defined such that documents containing "region": "us-east-1" go to "us-east-1" tagged shards and documents containing "region": "eu-west-1" go to "eu-west-1" tagged shards. Data is inserted using only "region": "us-east-1" but in my tests after ~40k documents are inserted (and again at ~80k documents) a split/move occurs and the new top chunk goes to a non-us-east-1 tag. (Note: there is a compound shard key - region:1, context:1 - so the splits are effectively defined by the context field)
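The routing described above (compound shard key, tag ranges keyed on region) can be modeled with a simple range lookup. This Python sketch is illustrative only; it uses empty string and "\uffff" as stand-ins for MongoDB's MinKey/MaxKey within a region, and lexicographic tuple comparison as a simplified model of shard key ordering:

```python
# Illustrative model of tag range routing with a compound shard key
# (region, context). "" and "\uffff" stand in for MinKey/MaxKey.

def find_tag(shard_key, tag_ranges):
    """tag_ranges: list of (min_key, max_key, tag), min inclusive,
    max exclusive, keys being (region, context) tuples.
    Returns the tag whose range contains shard_key, or None."""
    for kmin, kmax, tag in tag_ranges:
        if kmin <= shard_key < kmax:
            return tag
    return None

RANGES = [
    (("eu-west-1", ""), ("eu-west-1", "\uffff"), "eu-west-1"),
    (("us-east-1", ""), ("us-east-1", "\uffff"), "us-east-1"),
]
```

Under this model every ("us-east-1", ...) key maps to the 'us-east-1' tag, which is why the chunk moves to 'eu-west-1'-tagged shards observed in the changelog are anomalous.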

The end of the script searches the 'changelog' to find any chunks moved to the 4th shard (tagged with "eu-west-1"). In my tests I typically see 2 chunks being moved, each containing a subrange which includes "region": "us-east-1". See changelog.out for a dump of the complete 'changelog' collection (run against 2.6.9) demonstrating that the chunk is moved to a non-us shard and then almost immediately back again.

Generated at Thu Feb 08 03:47:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.