Core Server / SERVER-74291

Investigate whether new SBE HashAggStage spilling algorithm should be improved to avoid random access into spill table

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Query Execution
    • Labels: None

      SERVER-70395 recently improved the performance of spilling in SBE's HashAggStage by adopting the following algorithm. When the hash table exceeds its memory budget, the entire contents of the hash table are flushed to a TemporaryRecordStore and the hash table itself is cleared. This may happen many times as the input data is consumed. Importantly, the TemporaryRecordStore is kept sorted by group key: the MaterializedRow for the key is encoded into the record store's RecordId, with a monotonically increasing counter appended to keep the RecordIds unique. As a result, once all of the input has been consumed, records with equal keys are adjacent in the record store, and the partial aggregates can be merged to produce the final output using a single forward pass over the spill table.
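      To make the flow concrete, below is a minimal, self-contained sketch of this scheme. It is not MongoDB's actual implementation: std::unordered_map stands in for the hash table, std::map stands in for the key-sorted TemporaryRecordStore, and a (key, counter) pair stands in for the key-encoded RecordId. After all input is consumed, a single forward pass merges adjacent records with equal keys into final totals.

{code:cpp}
// Illustrative sketch only (hypothetical stand-ins, not MongoDB classes):
// spill the hash table into a store kept sorted by (group key, counter),
// then produce output with one forward pass over adjacent equal keys.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    // Hash table of partial aggregates: group key -> partial count.
    std::unordered_map<std::string, long long> hashTable;

    // Spill "table" sorted by (group key, counter); the counter keeps the
    // stand-in RecordId unique when the same key is spilled more than once.
    std::map<std::pair<std::string, uint64_t>, long long> spillTable;
    uint64_t counter = 0;

    auto spillAll = [&]() {
        // Flush every entry of the hash table into its key-ordered slot.
        for (auto& [key, partial] : hashTable) {
            spillTable.emplace(std::make_pair(key, counter++), partial);
        }
        hashTable.clear();
    };

    // Simulated input with a tiny "memory budget" of 2 distinct keys.
    std::vector<std::string> input = {"a", "b", "a", "c", "b", "a", "c"};
    for (const auto& key : input) {
        ++hashTable[key];
        if (hashTable.size() > 2) {  // budget exceeded: spill and clear
            spillAll();
        }
    }
    spillAll();  // spill whatever remains at end-of-input

    // Single forward pass: adjacent records with equal keys are merged.
    std::string currentKey;
    long long total = 0;
    bool first = true;
    for (const auto& [keyAndCounter, partial] : spillTable) {
        if (first || keyAndCounter.first != currentKey) {
            if (!first) std::cout << currentKey << ": " << total << "\n";
            currentKey = keyAndCounter.first;
            total = 0;
            first = false;
        }
        total += partial;
    }
    if (!first) std::cout << currentKey << ": " << total << "\n";
    return 0;
}
{code}

      Note that because each spilled record is inserted into its key-ordered position, every spill scatters writes across the whole sorted structure; this is the access pattern discussed below.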

      While spilling to a table sorted by group-by key leads to some nice simplicity in the implementation, anna.wawrzyniak@mongodb.com pointed out that it could result in bad IO access patterns. In particular, because newly spilled keys interleave with previously spilled keys in sort order, each time we spill new data from the hash table to the TemporaryRecordStore we may need to write data to every page of the spill table.

      As an alternative, we could look into always appending the newly spilled data (sorted by key) to the end of the TemporaryRecordStore. This would be similar to how spilling in DocumentSourceGroup works – it appends a new sorted segment to a spill file every time a spill event occurs. The benefit is that when we spill, we don't have to write new data to the pages that were written during a previous spill. When merging the partial aggregates, we would need to do a merge-sort of the spilled segments much like DocumentSourceGroup does. Another consideration is that if there are too many spilled segments, we could have a merge tree with depth greater than 1 to avoid having to merge too many segments at once.
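      As a sketch of this alternative (again hypothetical code, not DocumentSourceGroup's actual implementation), each spill would append one segment, already sorted by group key, to the end of the store, and the final output would come from a k-way merge-sort of the segments using a min-heap, summing the partial aggregates for equal keys:

{code:cpp}
// Illustrative sketch only: append-only spill segments, each sorted by group
// key, merged at the end with a k-way merge that combines partial aggregates.
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using Record = std::pair<std::string, long long>;   // (group key, partial count)
using Segment = std::vector<Record>;                // one sorted spilled segment

// Merge all sorted segments with a min-heap, summing partials for equal keys.
void mergeSegments(const std::vector<Segment>& segments) {
    // Heap entries: (key, segment index, offset within that segment).
    using HeapEntry = std::tuple<std::string, size_t, size_t>;
    std::priority_queue<HeapEntry, std::vector<HeapEntry>, std::greater<HeapEntry>> heap;
    for (size_t i = 0; i < segments.size(); ++i) {
        if (!segments[i].empty()) heap.emplace(segments[i][0].first, i, 0);
    }

    std::string currentKey;
    long long total = 0;
    bool first = true;
    while (!heap.empty()) {
        auto [key, seg, off] = heap.top();
        heap.pop();
        if (first || key != currentKey) {
            if (!first) std::cout << currentKey << ": " << total << "\n";
            currentKey = key;
            total = 0;
            first = false;
        }
        total += segments[seg][off].second;
        // Advance within the segment this record came from.
        if (off + 1 < segments[seg].size()) {
            heap.emplace(segments[seg][off + 1].first, seg, off + 1);
        }
    }
    if (!first) std::cout << currentKey << ": " << total << "\n";
}

int main() {
    // Two spill events, each appending a segment already sorted by group key
    // (e.g. produced by sorting the hash table's contents before writing).
    std::vector<Segment> segments = {
        {{"a", 2}, {"b", 1}, {"c", 1}},   // segment from the first spill
        {{"a", 1}, {"b", 1}, {"c", 1}},   // segment appended by the second spill
    };
    mergeSegments(segments);              // prints a: 3, b: 2, c: 2
    return 0;
}
{code}

      If the number of segments grew too large for a single merge pass, the same routine could be applied in multiple passes (the merge tree of depth greater than one mentioned above), merging batches of segments into larger sorted segments before the final merge.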

            Assignee: backlog-query-execution ([DO NOT USE] Backlog - Query Execution)
            Reporter: david.storch@mongodb.com (David Storch)
            Votes: 0
            Watchers: 5
