[SERVER-74722] Investigate performance for initial bulk loading in sharding Created: 09/Mar/23 Updated: 30/May/23 Resolved: 26/May/23
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | [DO NOT USE] Backlog - Sharding NYC |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding NYC |
| Participants: | |
| Description |
Currently, in migration and resharding we create the indexes first and then copy the documents, so the indexes are maintained on every insert just like regular writes. Replication initial sync instead uses CollectionBulkLoaderImpl, which defers index updates during the copy and then commits them in one pass with MultiIndexBlock at the end. This ticket is to investigate whether using the same deferred strategy in migration and resharding would yield a performance improvement.
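As a rough illustration of why deferring the index build can help, here is a self-contained C++ sketch. It is not MongoDB code: `Doc` is a hypothetical stand-in type, a `std::multimap` stands in for a secondary index, and the "bulk load" is just the sorted-range constructor. The second strategy only mirrors the shape of what CollectionBulkLoaderImpl and MultiIndexBlock do in initial sync: record the index keys during the copy, sort them, and build the index in a single pass at the end.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a document copied during a chunk migration or
// resharding clone; not a MongoDB type.
struct Doc {
    long id;
    long indexedField;
};

int main() {
    // Documents to "clone", analogous to the batch copied by the recipient.
    std::vector<Doc> batch;
    batch.reserve(1000000);
    for (long i = 0; i < 1000000; ++i) {
        batch.push_back({i, (i * 31) % 9973});  // scattered index keys
    }

    using Clock = std::chrono::steady_clock;

    // Strategy 1: maintain the index on every insert, as migration and
    // resharding do today per the ticket description.
    {
        std::vector<Doc> collection;
        std::multimap<long, long> index;  // indexedField -> id
        const auto start = Clock::now();
        for (const Doc& d : batch) {
            collection.push_back(d);
            index.emplace(d.indexedField, d.id);  // random-order index update
        }
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - start).count();
        std::printf("per-insert index maintenance: %lld ms\n",
                    static_cast<long long>(ms));
    }

    // Strategy 2: copy the documents first, accumulate the index keys on the
    // side, sort them, and bulk-load the index in one pass at the end.
    {
        std::vector<Doc> collection;
        std::vector<std::pair<long, long>> entries;  // deferred index keys
        const auto start = Clock::now();
        for (const Doc& d : batch) {
            collection.push_back(d);                     // no index work here
            entries.emplace_back(d.indexedField, d.id);  // just record the key
        }
        std::sort(entries.begin(), entries.end());
        // Constructing from a sorted range builds the tree in linear time.
        std::multimap<long, long> index(entries.begin(), entries.end());
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - start).count();
        std::printf("deferred bulk index build:    %lld ms\n",
                    static_cast<long long>(ms));
    }
    return 0;
}
```

Broadly, this is where the deferred approach's win comes from: building an index from sorted keys avoids the random-order tree updates that per-document index maintenance incurs throughout the copy.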
| Comments |
| Comment by Max Hirschhorn [ 26/May/23 ] |
The PM-2322 project will change resharding to build the indexes after the initial data clone finishes, so the performance improvement will come from that work. We discussed this ticket in triage, and it does not seem worthwhile to change chunk migration: the optimization would only help the first chunk of a sharded collection that migrates to a shard, and the WiredTiger cache benefits for a 128 MB chunk are going to be significantly smaller.