[SERVER-55430] Record metrics about whether a collection is rebalanced after resharding op finishes Created: 22/Mar/21  Updated: 29/Oct/23  Resolved: 23/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc3, 5.1.0-rc0

Type: Task
Priority: Major - P3
Reporter: Janna Golden
Assignee: Randolph Tan
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-14565 Investigate changes in SERVER-55430: ... Closed
Related
related to SERVER-76890 Make resharding's lastOpEndingChunkIm... Backlog
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: Sharding 2021-06-14, Sharding 2021-06-28
Participants:
Story Points: 2

 Description   

Resharding attempts to create an initial chunk distribution such that the balancer will not rebalance the collection immediately afterwards (absent any topology changes, etc., of course). It would be useful to collect metrics on whether the balancer rebalances a collection that has just been resharded, in order to determine whether resharding's initial split policy is in fact creating a good initial distribution.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 16/Jun/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-55430 Record metrics about whether a collection is rebalanced after resharding op finishes

(cherry picked from commit 5c00024e3cf4a27039117e000e475c6ee797c700)
Branch: v5.0
https://github.com/mongodb/mongo/commit/dfdbeaa9f6bc166c69b9a4de9a538c08952b8739

Comment by Githook User [ 16/Jun/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-55430 Record metrics about whether a collection is rebalanced after resharding op finishes
Branch: master
https://github.com/mongodb/mongo/commit/5c00024e3cf4a27039117e000e475c6ee797c700

Comment by Janna Golden [ 03/Jun/21 ]

Hmm, renctan that's a good point about it being difficult to know what collections have been resharded recently. I think max.hirschhorn's idea about recording it in the ReshardingMetrics makes sense; we'd essentially just want to check that the initial split algorithm is putting the "ideal" number of chunks on each shard. I think it might even make sense to do it when we create the chunks because we have access to the shard info (what zones are associated with what shards) there as well, though I don't know if we have access to the ReshardingMetrics at that point.

Comment by Max Hirschhorn [ 02/Jun/21 ]

renctan, I think a per-collection metric could be difficult to get reporting on, because per-collection metrics aren't something we could add to a serverStatus section tracked by FTDC since that would lead to excessive schema changes (e.g. as happened with the range deleter and SERVER-47641). My take on SERVER-55430 would be to have some part of ReshardingCoordinator call the function for 'is this collection balanced?' that the Balancer thread would normally call and record the yes/no answer in the ReshardingMetrics before completing the operation. One step further could be to record how unbalanced the collection is. I would want to aim for a single-number summary (maybe [max nChunks - min nChunks]?) to avoid inducing schema changes from a per-shard metric as well.
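
The single-number summary suggested above could look roughly like the following. This is only an illustrative sketch, not the server's implementation: the function names, the shard-to-chunk-count mapping, and the `threshold` parameter are all hypothetical, and the real balancer's "is this collection balanced?" check also accounts for zones and other state.

```python
def chunk_imbalance(chunks_per_shard):
    """Return max nChunks - min nChunks across shards (0 if there are no shards).

    chunks_per_shard is a hypothetical mapping of shard name -> chunk count.
    """
    if not chunks_per_shard:
        return 0
    counts = list(chunks_per_shard.values())
    return max(counts) - min(counts)


def is_balanced(chunks_per_shard, threshold=1):
    """Yes/no answer derived from the single number.

    threshold is an assumed tolerance for this sketch, not the balancer's
    actual migration threshold.
    """
    return chunk_imbalance(chunks_per_shard) <= threshold
```

Because the result is one integer (or one boolean) per resharding operation rather than one value per shard or per collection, recording it would not change the shape of the reported metrics document.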

Comment by Randolph Tan [ 02/Jun/21 ]

max.hirschhorn, janna.golden. Here is my proposal for this ticket; what do you guys think?

  1. Since it would be hard for the balancer to distinguish which collection has just been resharded, what if we just have stats for all sharded collections?
  2. The most basic stat I can think of is the standard deviation of chunks per tag/zone per collection. I'm currently thinking of putting a new field in config.collections to contain these stats, and collStats would display this info.
  3. The question now becomes, when do we update the stats? The easiest would be at the beginning of the balancer round, because that is when the balancer collects stats about chunk distribution and it doesn't update it after a migration. This would achieve the goal of exposing chunk imbalance caused by resharding, but it can also be a weird stat to show, since it is somewhat "delayed" and does not represent the most up to date distribution as the balancer round progresses.
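
For illustration, the per-zone standard deviation from item 2 could be computed along these lines. This is a sketch under assumptions: the input shape (zone name -> shard name -> chunk count) and the function name are hypothetical, and it uses the population standard deviation since every shard in the zone is counted.

```python
from statistics import pstdev


def chunk_stddev_per_zone(chunks_by_zone):
    """Return {zone: population std dev of per-shard chunk counts}.

    chunks_by_zone is a hypothetical mapping of
    zone name -> {shard name -> chunk count} for one collection.
    A zone with no shards reports 0.0.
    """
    result = {}
    for zone, chunks_per_shard in chunks_by_zone.items():
        counts = list(chunks_per_shard.values())
        result[zone] = pstdev(counts) if counts else 0.0
    return result
```

A value of 0 for a zone means every shard in that zone holds the same number of chunks; larger values indicate a more uneven distribution within the zone.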