[SERVER-27724] Explore whether we can further minimize chunk metadata reloads on shards Created: 17/Jan/17 Updated: 31/Jan/18 Resolved: 18/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.3, 3.7.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6 |
| Sprint: | Sharding 2017-01-02, Sharding 2017-02-13, Sharding 2017-04-17, Sharding 2017-05-08, Sharding 2017-12-04, Storage 2018-01-01, Sharding 2017-12-18, Storage 2018-01-15, Storage 2018-01-29 |
| Description |
|
Since we're persisting chunk metadata to a shard collection, it would be good to minimize chunk metadata reloads. The code has been around for a long time, not particularly modified, and unexamined. We may have made higher level changes that make full reloads unnecessary now. |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 18/Jan/18 ] |
|
Marking this as complete with the moveChunk and splitChunk improvements, but without improvements to mergeChunk. The CR requires more work, and it's not worth it to us to spend further time making the improvements for little profit, particularly because we intend to make chunk operation changes that must be considered in this mergeChunk refactor but are weakly defined at this time. |
| Comment by Githook User [ 08/Dec/17 ] |
|
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com> Message: (cherry picked from commit c9a30dfc4dcf383131ae059b154968a302fbe17c) |
| Comment by Githook User [ 28/Nov/17 ] |
|
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com> Message: |
| Comment by Dianna Hohensee (Inactive) [ 14/Nov/17 ] |
|
Ideally, steps (6-9) of the shard split command move to the mongos split command, and (1) goes to the config split commit (already have |
| Comment by Dianna Hohensee (Inactive) [ 14/Nov/17 ] |
|
So splitChunk on the shard will 6) sending the command failed, return error 9) top chunk optimization, goes and grabs a range to return to the user as a moving suggestion if global min/max keys are in the chunk that was split. The post commit refresh (5) can be eliminated for successful commit split commands. The shard's moveChunk command refreshes prior to starting/checking command parameters. It seems like it'd also be fine to move (3) checks to the config commit split command, and skip the shard command start refresh (2), because the refresh is a round trip to the config server, so might as well make it the commit that does the roundtrip. The (mongos) cluster split command forces a refresh at start, and then ensures that the routing table it used for a successful split command to the shard is invalidated. The invalidation at the end seems superfluous, as the mongos won't misroute because of it. |
| Comment by Kaloian Manassiev [ 10/Nov/17 ] |
|
With the recent customer cases that we have seen, it appears that the auto-split behaviour causes stalls due to metadata refresh. Because of this, I also find it prudent to try to minimize these refreshes. What you propose seems reasonable. Perhaps we can just update the metadata on the config server without refreshing the local filtering/routing metadata. |
| Comment by Dianna Hohensee (Inactive) [ 28/Jun/17 ] |
|
Maybe instead of trying to reduce refreshes, we just make the code that invalidates the secondary catalog cache smarter: only invalidate if there's a major version number or epoch change. |
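
A minimal sketch of the invalidation rule proposed in this comment, not the server's actual code. The `ChunkVersion` class and `should_invalidate` function are hypothetical names introduced for illustration, and the epoch is modelled as a plain string. The sketch assumes that a minor-version-only change (for example, from a split) does not require invalidation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkVersion:
    """Hypothetical stand-in for a collection's chunk version."""
    epoch: str  # identifies the collection incarnation (assumed string form)
    major: int
    minor: int


def should_invalidate(cached: ChunkVersion, incoming: ChunkVersion) -> bool:
    """Invalidate only on an epoch change or a major version change.

    A change confined to the minor version is ignored, per the proposal.
    """
    return cached.epoch != incoming.epoch or cached.major != incoming.major


cached = ChunkVersion(epoch="e1", major=5, minor=2)

# Minor bump only: the cached metadata is kept.
minor_only = should_invalidate(cached, ChunkVersion("e1", 5, 3))

# Major bump: the cache is invalidated.
major_bump = should_invalidate(cached, ChunkVersion("e1", 6, 0))

# Epoch change: the cache is invalidated.
epoch_change = should_invalidate(cached, ChunkVersion("e2", 1, 0))
```

Under these assumptions, only the last two comparisons trigger invalidation, so minor-only changes would no longer force the secondary catalog cache to reload.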
| Comment by Dianna Hohensee (Inactive) [ 16/Jun/17 ] |
|
Just realized that the title and description talked about 'full' reloads. Now I get the question. Removed that bit, as I want to minimize the incremental loads. |
| Comment by Dianna Hohensee (Inactive) [ 26/May/17 ] |
|
CatalogCache changes didn't change chunk operation refresh behavior. We still refresh on the shard twice, before and after any split/merge/move operation. I believe the refresh was always incremental, if possible. |
| Comment by Kaloian Manassiev [ 26/May/17 ] |
|
Is this still a problem with the changes to the catalog cache, which always favour incremental refresh? |
| Comment by Dianna Hohensee (Inactive) [ 23/Mar/17 ] |
|
Split and merge cause a refresh, which could be eliminated and replaced with additional moveChunk command logic – such as a refresh at the start of moveChunk, or chunk commit logic. In general, limit when we must refresh. Arguably we only need to refresh when the shard is involved in a migration, either donating or receiving data.