[SERVER-81100] Investigate Slow Splitting When Using sh.shardCollection() Created: 15/Sep/23  Updated: 10/Oct/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: 5.0.21
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Matt Panton Assignee: Matt Panton
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-10-16
Participants:

Description

During recent testing of balancer enhancements, I found that sharding an existing, populated collection with a small chunk size takes significantly longer to complete than sharding an empty collection and then loading the data with the same chunk size.

Test Scenarios - MongoDB 5.0
1:

  • 4TB Collection Size
  • 32KB Doc Size
  • 1MB Chunk Size
  • Shard Existing Unsharded Collection with sh.shardCollection

2:

  • Set 1MB Chunk Size
  • Shard Empty Collection with sh.shardCollection
  • Insert Documents
  • 32KB Doc Size
  • 1MB Chunk Size
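
The two scenarios above could be scripted in mongosh roughly as follows. This is a minimal sketch, not the script used in the original test: the database name, collection name, shard key, batch size, and the helper function names are all hypothetical, since the ticket does not specify them. The order of loading versus setting the chunk size in Scenario 1 is also an assumption. The `sh` and `db` objects are passed in as parameters so the helpers do not depend on globals.

```javascript
// Hypothetical names: the ticket does not specify a database,
// collection, or shard key.
const DB_NAME = "test";
const COLL_NAME = "docs";
const NS = `${DB_NAME}.${COLL_NAME}`;
const SHARD_KEY = { _id: 1 };

// Set the cluster-wide default chunk size (in MB) in config.settings.
function setChunkSizeMB(db, sizeMB) {
  db.getSiblingDB("config").getCollection("settings").updateOne(
    { _id: "chunksize" },
    { $set: { value: sizeMB } },
    { upsert: true }
  );
}

// Insert docCount documents of roughly docBytes each, in batches.
function loadDocuments(db, docCount, docBytes = 32 * 1024, batchSize = 100) {
  const coll = db.getSiblingDB(DB_NAME).getCollection(COLL_NAME);
  const payload = "x".repeat(docBytes);
  for (let done = 0; done < docCount; done += batchSize) {
    const n = Math.min(batchSize, docCount - done);
    coll.insertMany(Array.from({ length: n }, () => ({ payload })));
  }
}

// Scenario 1: the data already exists when the collection is sharded.
function shardExistingCollection(sh, db, docCount) {
  loadDocuments(db, docCount);
  setChunkSizeMB(db, 1);
  sh.enableSharding(DB_NAME);
  sh.shardCollection(NS, SHARD_KEY);
}

// Scenario 2: shard the empty collection first, then insert the data.
function shardThenLoad(sh, db, docCount) {
  setChunkSizeMB(db, 1);
  sh.enableSharding(DB_NAME);
  sh.shardCollection(NS, SHARD_KEY);
  loadDocuments(db, docCount);
}
```

In a mongosh session connected to a mongos, `shardThenLoad(sh, db, 1000)` would run Scenario 2 with a small document count. Reaching the 4TB size from the ticket would need a far larger count and a faster loader than this sketch.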

Scenario 1 takes much longer to complete than Scenario 2.
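
For a sense of scale, the parameters listed for Scenario 1 imply a very large number of chunks. The arithmetic below is a rough estimate only: it assumes binary units (1TB = 1024^4 bytes), ignores storage overhead, and assumes every chunk is filled exactly to the configured size. Whether this chunk count accounts for the slowdown is what the ticket sets out to investigate.

```javascript
// Rough estimate from the Scenario 1 parameters, assuming binary units.
const KiB = 1024;
const MiB = 1024 * KiB;
const TiB = 1024 * 1024 * MiB;

const collectionBytes = 4 * TiB; // 4TB Collection Size
const docBytes = 32 * KiB;       // 32KB Doc Size
const chunkBytes = 1 * MiB;      // 1MB Chunk Size

const docsPerChunk = chunkBytes / docBytes;      // 32
const totalDocs = collectionBytes / docBytes;    // 134217728
const chunkCount = collectionBytes / chunkBytes; // 4194304

console.log({ docsPerChunk, totalDocs, chunkCount });
```

Under these assumptions the existing collection has to end up as roughly 4.2 million chunks of 32 documents each.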


Generated at Thu Feb 08 06:45:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.