[SERVER-74722] Investigate performance for initial bulk loading in sharding Created: 09/Mar/23  Updated: 30/May/23  Resolved: 26/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Randolph Tan
Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-77570 Do index building after cloning in re... Closed
is related to SERVER-75647 POC: Index build at end for resharding Closed
Assigned Teams:
Sharding NYC
Participants:

 Description   

Currently, in chunk migration and resharding, we create the indexes first and then copy the documents, so the indexes are updated on every insert just like for regular writes. Replication initial sync instead uses CollectionBulkLoaderImpl, which defers index updates and then commits them manually with MultiIndexBlock at the end. This ticket is to investigate whether using the same strategy for migration and resharding would yield performance improvements.
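
To make the two strategies concrete, here is a minimal toy sketch in Python. It is not the server's implementation and does not use CollectionBulkLoaderImpl or MultiIndexBlock: the function names load_index_first and load_then_build are hypothetical, and a sorted Python list stands in for an index. It only illustrates that maintaining the index on every insert and building it once after the copy produce the same index, while the second approach does the index work in a single sorted pass.

```python
import bisect
import random


def load_index_first(docs, key):
    """Toy model of the current approach: the index exists before the copy,
    so every inserted document also triggers an index update."""
    collection, index = [], []
    for doc in docs:
        collection.append(doc)
        # Per-document index maintenance, as with a regular write.
        bisect.insort(index, (doc[key], doc["_id"]))
    return collection, index


def load_then_build(docs, key):
    """Toy model of the deferred approach: copy all documents first,
    then build the index once from the sorted keys."""
    collection = list(docs)
    index = sorted((doc[key], doc["_id"]) for doc in collection)
    return collection, index


# Hypothetical sample data: 1000 documents with a random value in field "x".
rng = random.Random(0)
docs = [{"_id": i, "x": rng.random()} for i in range(1000)]

_, incremental_index = load_index_first(docs, "x")
_, deferred_index = load_then_build(docs, "x")

# Both strategies end with identical index contents.
assert incremental_index == deferred_index
```

Whether the deferred build is actually faster in the server depends on factors this sketch does not model, such as storage engine cache behavior, which is what the ticket proposes to investigate.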



 Comments   
Comment by Max Hirschhorn [ 26/May/23 ]

The PM-2322 project will change resharding to build the indexes after finishing the initial clone of data. We'll see the performance improvements for resharding through that project.

We discussed this ticket in triage, and it doesn't feel worthwhile to make any changes to chunk migration. This optimization would only help the first chunk of a sharded collection that migrates to a shard, and the WT cache benefits for 128MB of data are going to be significantly smaller.
