[SERVER-50959] Avoid copying data from the Sorter into InMemIterator
Created: 15/Sep/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Gregory Noma
Assignee: Backlog - Storage Execution Team
Resolution: Unresolved
Votes: 0
Labels: newgrad, pm-1344
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-50221 Keep NoLimitSorter aware of its in-me... Closed
related to SERVER-50920 Resuming index builds from the bulk l... Closed
related to SERVER-49829 Implement spilling for Top K sort in SBE Closed
Assigned Teams:
Storage Execution

Description

SERVER-49829 had already introduced this optimization, but it is being reverted as part of SERVER-50920 to fix an issue with resumable index builds. However, it should be possible to re-add the optimization if some additional steps are taken for the resumable index build case. There are a few options:

  1. If the index build did not need to spill to disk, have the index build's BulkBuilder keep track of the keys that it has already retrieved from the InMemIterator. Then, if the build is interrupted by a shutdown during the bulk load phase, it can supply this (sorted) list of keys to the Sorter to supplement the remaining sorted keys still held by the Sorter, which will be written to disk as is already done.
  2. Have index builds always spill to disk at the beginning of the bulk load phase. This has the downside of spilling to disk even when it would otherwise be unnecessary.
  3. Have the InMemIterator advance an iterator over its data instead of popping each element from the front. This has the downside of still requiring the data to be copied when the InMemIterator returns it.
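The trade-off between the reverted optimization and option 3 can be sketched in C++. This is a minimal illustration under stated assumptions: `DrainingInMemIterator` and `CursorInMemIterator` are hypothetical classes (not the real Sorter types), keys are plain `std::string`s rather than key/value pairs, and the reverted optimization is assumed to have moved the Sorter's buffer into the iterator and popped keys from the front.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-ins, not the real mongo::sorter types.
using Key = std::string;

// Assumed shape of the optimization: the Sorter's buffer is moved into the
// iterator and keys are moved out one at a time from the front. No per-key
// copy is made, but returned keys no longer exist anywhere in the buffer.
class DrainingInMemIterator {
public:
    explicit DrainingInMemIterator(std::deque<Key>&& data) : _data(std::move(data)) {}

    bool more() const { return !_data.empty(); }

    Key next() {
        assert(more());
        Key out = std::move(_data.front());
        _data.pop_front();
        return out;
    }

    // Keys not yet returned: the only ones left to spill after an interruption.
    const std::deque<Key>& remaining() const { return _data; }

private:
    std::deque<Key> _data;
};

// Option 3: keep the Sorter's buffer in place and advance a position over it.
// Everything stays available to spill, but next() has to copy each key.
class CursorInMemIterator {
public:
    explicit CursorInMemIterator(const std::deque<Key>& data) : _data(data) {}

    bool more() const { return _pos < _data.size(); }

    Key next() {
        assert(more());
        return _data[_pos++];  // copy: the buffer keeps its own version
    }

private:
    const std::deque<Key>& _data;
    std::size_t _pos = 0;
};
```

With the draining version, only `remaining()` is left to write to disk if the build is interrupted; with the cursor version, the whole buffer survives, at the cost of one copy per `next()` call.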

Resumable index builds will always need to copy the data at some point, but options 1 and 2 let other users of the Sorter avoid these otherwise unnecessary copies.
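To make the copy accounting of option 1 concrete, here is a minimal C++ sketch. `BulkLoadConsumer`, `next()`, `keysToSpill()`, `copiesMade()`, and the `resumable` flag are hypothetical names, not the actual BulkBuilder or Sorter API; keys are simplified to `std::string`, and the in-memory keys are assumed to be already sorted. Only the resumable path copies keys, and on interruption the consumed keys followed by the remaining ones form a complete sorted run that could be written to disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real mongo::sorter or BulkBuilder types.
using Key = std::string;

// Option 1, sketched: the in-memory keys are moved out of the Sorter's buffer
// without copying. Only when `resumable` is true does the consumer keep a copy
// of each key it has handed out, so other Sorter users never pay for it.
class BulkLoadConsumer {
public:
    BulkLoadConsumer(std::deque<Key>&& sorted, bool resumable)
        : _pending(std::move(sorted)), _resumable(resumable) {}

    bool more() const { return !_pending.empty(); }

    Key next() {
        assert(more());
        Key key = std::move(_pending.front());  // move, not copy
        _pending.pop_front();
        if (_resumable) {
            _consumed.push_back(key);  // the one copy a resumable build pays
        }
        return key;
    }

    std::size_t copiesMade() const { return _consumed.size(); }

    // On shutdown during bulk load: every consumed key sorts no later than
    // every pending key, so consumed + pending is already the full sorted run.
    std::vector<Key> keysToSpill() const {
        assert(_resumable);
        std::vector<Key> all(_consumed.begin(), _consumed.end());
        all.insert(all.end(), _pending.begin(), _pending.end());
        assert(std::is_sorted(all.begin(), all.end()));
        return all;
    }

private:
    std::deque<Key> _pending;
    std::vector<Key> _consumed;
    bool _resumable;
};
```

Appending rather than merging is enough here because the keys are handed out in sorted order; a real implementation would also have to carry values and pass the combined run to the Sorter's existing spill logic, which this sketch leaves out.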


Generated at Thu Feb 08 05:24:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.