[SERVER-62099] POC balancer must take decisions based on collection sizes on shards Created: 16/Dec/21  Updated: 20/Jan/22  Resolved: 20/Jan/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Pierlauro Sciarelli
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24
Participants:

 Description   
  • Make the balancer call dataSize on shards to determine which are the most and least full.
  • Implement a “dummy” donateData command (calling into moveChunk) to move data between shards according to their fill level.


 Comments   
Comment by Pierlauro Sciarelli [ 20/Jan/22 ]

Outcome of the [inefficient & dirty-coded] POC:

  • It was possible to modify the balancer in order to take into account the data size on shards rather than the number of chunks (with an additional broadcast to all shards at the beginning of each round).
  • Implemented a donateData command received from shards with parameters (namespace, toShard, minBound) that, at the moment, simply calls into moveChunk to move the chunk with that minBound.
  • The moveChunk call triggered in balancing rounds has been replaced with a call to donateData.
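
The selection step described above can be sketched roughly as follows. This is only an illustration in Python, not the POC code: plan_donation, DonateDataRequest, the from_shard field, threshold_bytes and the chunk_min_bounds input are all hypothetical names, and the sketch assumes the per-shard sizes have already been collected by the broadcast at the start of the round.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class DonateDataRequest:
    # namespace, to_shard and min_bound mirror the parameters listed above;
    # from_shard is a hypothetical extra showing which shard gets the command.
    namespace: str
    from_shard: str
    to_shard: str
    min_bound: Any


def plan_donation(
    namespace: str,
    size_by_shard: Dict[str, int],
    chunk_min_bounds: Dict[str, List[Any]],
    threshold_bytes: int = 0,
) -> Optional[DonateDataRequest]:
    """Plan at most one donation from the fullest to the emptiest shard."""
    if len(size_by_shard) < 2:
        return None  # nothing to balance with fewer than two shards

    donor = max(size_by_shard, key=size_by_shard.get)
    recipient = min(size_by_shard, key=size_by_shard.get)

    # Skip the round if the imbalance does not exceed the (assumed) threshold.
    if size_by_shard[donor] - size_by_shard[recipient] <= threshold_bytes:
        return None

    # Assumption: the caller supplies candidate chunk lower bounds per shard;
    # the sketch simply donates the first candidate owned by the donor.
    candidates = chunk_min_bounds.get(donor, [])
    if not candidates:
        return None
    return DonateDataRequest(namespace, donor, recipient, candidates[0])


sizes = {"shardA": 900, "shardB": 100, "shardC": 500}
request = plan_donation("db.coll", sizes, {"shardA": [{"x": 0}]})
```

The point of the sketch is that the decision depends only on bytes per shard, so the number of chunks on each shard never enters the comparison.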

Relevant problems discovered while POC-ing:

  • Since dataSize includes orphans, the balancer needs some way to track orphans in order to work properly. For the POC, the balancer relies on countDocuments instead of dataSize, but that is obviously not a viable solution because it means performing a whole index scan for every collection at every round.
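
To illustrate why orphans matter, here is a small Python sketch with made-up numbers. It assumes a per-shard orphan byte count is available from some tracking mechanism, which does not exist in the POC; owned_sizes and fullest_shard are hypothetical helpers, not server functions.

```python
from typing import Dict


def owned_sizes(raw_sizes: Dict[str, int], orphan_bytes: Dict[str, int]) -> Dict[str, int]:
    """Subtract (hypothetically tracked) orphan bytes from the raw size per shard."""
    return {shard: raw - orphan_bytes.get(shard, 0) for shard, raw in raw_sizes.items()}


def fullest_shard(sizes: Dict[str, int]) -> str:
    """Return the shard holding the most bytes."""
    return max(sizes, key=sizes.get)


# Made-up figures: shardA looks fullest only because of its orphans.
raw = {"shardA": 800, "shardB": 700}
orphans = {"shardA": 300}

donor_by_raw_size = fullest_shard(raw)
donor_by_owned_size = fullest_shard(owned_sizes(raw, orphans))
```

With these figures the raw sizes select shardA as the donor, while the owned sizes select shardB, so a balancer relying on the raw numbers would move data in the wrong direction.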