[SERVER-55338] Propagate bulk inserts from Collection layer down to IndexCatalog Created: 19/Mar/21  Updated: 23/Jan/24

Status: Backlog
Project: Core Server
Component/s: Index Maintenance, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: storex-perf, storex-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-55341 WiredTigerRecordStore should reserve ... Closed
is related to SERVER-55337 Use cursors for index writes at the S... Backlog
Assigned Teams:
Storage Execution
Participants:

 Description   

Right now, the Collection layer gets a batch of inserts from the write ops and propagates the batch down to the RecordStore layer, but then passes the IndexCatalog one record at a time. This has at least three downsides:

  1. It goes through all indexes on one document before going to the next document. It is likely to be more CPU-friendly (cache, branch predictor, etc.) to go through all documents in the first index, then do the same for the second index, especially if they are different kinds of indexes and use different code paths.
  2. It reduces the opportunity to avoid duplicated work with write cursors if we do SERVER-55337. This may be most evident for wildcard indexes, where you really want to insert all of the keys for all documents in a given path before moving to the next path. Also, in any case where multiple documents in the batch generate the same index key, the inserts will land right next to each other, which should be really fast.
  3. It prevents deduping the multikeyMetadataKeys that are common between documents in a batch. There is a pretty high likelihood that many of these keys are shared, and in some batches every document may generate an identical set of keys.
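
To make the difference in iteration order concrete, here is a minimal sketch. It is not the server's code: ToyDoc, ToyIndex, insertDocMajor, insertIndexMajor and dedupeMetadataKeys are hypothetical names invented for illustration, and each document is assumed to produce exactly one key per index. The first function mirrors the current document-at-a-time order, the second the proposed index-at-a-time order, and the third shows the once-per-batch deduping that point 3 asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one document in an insert batch (not a server type).
struct ToyDoc {
    std::vector<std::string> keyForIndex;           // keyForIndex[i]: key for index i
    std::vector<std::string> multikeyMetadataKeys;  // e.g. multikey path markers
};

// Hypothetical stand-in for one index: remembers keys in insertion order.
struct ToyIndex {
    std::vector<std::string> inserted;
};

// Document-major order (the behavior described in this ticket): every index
// is touched for the first document, then every index for the next one.
// Returns the sequence of index ids visited.
std::vector<std::size_t> insertDocMajor(const std::vector<ToyDoc>& batch,
                                        std::vector<ToyIndex>& indexes) {
    std::vector<std::size_t> visits;
    for (const ToyDoc& doc : batch) {
        assert(doc.keyForIndex.size() == indexes.size());
        for (std::size_t i = 0; i < indexes.size(); ++i) {
            indexes[i].inserted.push_back(doc.keyForIndex[i]);
            visits.push_back(i);
        }
    }
    return visits;
}

// Index-major order (the proposal): all documents go into index 0, then all
// documents into index 1, so each index's code path runs in one stretch.
std::vector<std::size_t> insertIndexMajor(const std::vector<ToyDoc>& batch,
                                          std::vector<ToyIndex>& indexes) {
    std::vector<std::size_t> visits;
    for (std::size_t i = 0; i < indexes.size(); ++i) {
        for (const ToyDoc& doc : batch) {
            assert(doc.keyForIndex.size() == indexes.size());
            indexes[i].inserted.push_back(doc.keyForIndex[i]);
            visits.push_back(i);
        }
    }
    return visits;
}

// Collects metadata keys once for the whole batch, so a key shared by many
// documents is written a single time instead of once per document.
std::set<std::string> dedupeMetadataKeys(const std::vector<ToyDoc>& batch) {
    std::set<std::string> unique;
    for (const ToyDoc& doc : batch) {
        unique.insert(doc.multikeyMetadataKeys.begin(),
                      doc.multikeyMetadataKeys.end());
    }
    return unique;
}
```

With two documents and two indexes, the document-major loop visits the indexes in the order 0, 1, 0, 1, while the index-major loop visits them as 0, 0, 1, 1. Whether that translates into a measurable win in the real IndexCatalog would need benchmarking.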

Generated at Thu Feb 08 05:36:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.