Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Query Execution
UniqueStage tracks the RecordIds of the records it has already seen so that it never returns the same RecordId twice. Currently there is no limit on the amount of memory that can be used to store the seen RecordIds.
To set the maximum allowed memory, a new query knob, internalUniqueStageMaxMemoryBytes, will be added to query_knobs.idl.
The stage will spill to disk when the seen data structure exceeds the maximum memory allowed.
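The behavior described above can be sketched roughly as follows. This is only an illustration, not the actual UniqueStage code: the class name UniqueSketch, the int64_t stand-in for RecordId, and the per-id memory accounting are all assumptions made for the example.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for RecordId; the real type is richer than an integer.
using RecordIdSketch = std::int64_t;

// Illustrative only: deduplicates ids and flags when the memory budget is exceeded.
class UniqueSketch {
public:
    explicit UniqueSketch(std::uint64_t maxMemoryBytes) : _maxMemoryBytes(maxMemoryBytes) {}

    // Returns true if the id has not been seen before and should be returned.
    bool insertIfNew(RecordIdSketch id) {
        if (!_seen.insert(id).second) {
            return false;  // Duplicate: never return the same id twice.
        }
        // Assumption: each tracked id costs sizeof(RecordIdSketch) bytes.
        _memoryBytes += sizeof(RecordIdSketch);
        if (_memoryBytes > _maxMemoryBytes) {
            _needsSpill = true;  // A real stage would call spill(...) here.
        }
        return true;
    }

    bool needsSpill() const { return _needsSpill; }

private:
    std::uint64_t _maxMemoryBytes;
    std::uint64_t _memoryBytes = 0;
    bool _needsSpill = false;
    std::unordered_set<RecordIdSketch> _seen;
};
```

In this sketch the limit would come from the internalUniqueStageMaxMemoryBytes knob; here it is simply passed to the constructor.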
The spilling should be implemented in a method
void spill(uint64_t maximumMemoryUsage)
that spills until the memory used by the stage is at most maximumMemoryUsage. The method should track the following metrics:
- bool usedDisk : Set to true when the stage has spilled.
- uint64_t spills : The number of times the stage spilled.
- uint64_t spilledBytes : The size, in bytes, of the memory released by spilling.
- uint64_t spilledDataStorageSize : The size, in bytes, of disk space used for spilling.
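A minimal sketch of how spill and the four metrics could fit together is shown below. It is not the proposed implementation: UniqueStatsSketch and SpillingSeenSet are hypothetical names, an in-memory vector stands in for the on-disk storage, and the memory and disk sizes are estimated as a fixed number of bytes per id.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical mirror of the metrics listed above.
struct UniqueStatsSketch {
    bool usedDisk = false;
    std::uint64_t spills = 0;
    std::uint64_t spilledBytes = 0;
    std::uint64_t spilledDataStorageSize = 0;
};

class SpillingSeenSet {
public:
    // Assumption: every id costs the same number of bytes in memory and on disk.
    static constexpr std::uint64_t kBytesPerId = sizeof(std::int64_t);

    void add(std::int64_t id) { _seen.insert(id); }

    std::uint64_t memoryBytes() const { return _seen.size() * kBytesPerId; }

    // Moves ids out of memory until usage is at most maximumMemoryUsage.
    void spill(std::uint64_t maximumMemoryUsage) {
        if (memoryBytes() <= maximumMemoryUsage) {
            return;  // Nothing to do, so this call is not counted as a spill.
        }
        const std::uint64_t before = memoryBytes();
        while (memoryBytes() > maximumMemoryUsage) {
            auto it = _seen.begin();
            _disk.push_back(*it);  // Stand-in for writing the id to disk.
            _seen.erase(it);
        }
        _stats.usedDisk = true;
        ++_stats.spills;
        _stats.spilledBytes += before - memoryBytes();
        _stats.spilledDataStorageSize = _disk.size() * kBytesPerId;
    }

    const UniqueStatsSketch& stats() const { return _stats; }

private:
    std::unordered_set<std::int64_t> _seen;
    std::vector<std::int64_t> _disk;  // Assumption: models the spill storage.
    UniqueStatsSketch _stats;
};
```

Note that spilledBytes accumulates across calls, while spilledDataStorageSize reflects the current size of the spilled data; whether the real metrics should behave this way is a design choice the ticket leaves open.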
To track those metrics, we should update the UniqueStats struct. The metrics should be reported in serverStatus and in explain execution stats.
Before spilling, the stage should make sure that there is enough disk space for spilling. This can be done using ensureSufficientDiskSpaceForSpilling and uassertStatusOK.
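The idea of the pre-spill check can be illustrated with standard C++ only. The function below is a hypothetical stand-in, not the server's ensureSufficientDiskSpaceForSpilling (whose signature is not given here), and throwing std::runtime_error merely approximates failing the operation the way uassertStatusOK would.

```cpp
#include <cstdint>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical helper: fails the operation if the volume that holds `spillDir`
// has fewer than `requiredBytes` available.
void ensureDiskSpaceSketch(const std::filesystem::path& spillDir,
                           std::uint64_t requiredBytes) {
    // std::filesystem::space throws std::filesystem::filesystem_error on failure.
    const std::filesystem::space_info info = std::filesystem::space(spillDir);
    if (info.available < requiredBytes) {
        throw std::runtime_error("Not enough disk space for spilling in " +
                                 spillDir.string());
    }
}
```

A caller would run this check immediately before writing spilled data, so that the query fails cleanly instead of filling the disk.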
A second method, which retrieves the spilled data, should be added so that UniqueStage can continue executing by reading data back from disk. This method should keep the memory usage below the threshold at all times.
The stage should release all memory and disk space when it is closed.
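One possible way to read spilled data without exceeding the budget is to process it in bounded batches. The sketch below assumes this batching approach, which the ticket does not prescribe; spilledContains is a hypothetical name, and a vector again stands in for the on-disk data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lookup: checks whether `id` was spilled while holding at most
// one batch of ids in memory. `spilledOnDisk` models the spilled data.
bool spilledContains(const std::vector<std::int64_t>& spilledOnDisk,
                     std::int64_t id,
                     std::uint64_t maxMemoryBytes) {
    // Assumption: the budget fits at least one id; smaller budgets are rounded up.
    const std::size_t idsPerBatch =
        std::max<std::size_t>(1, maxMemoryBytes / sizeof(std::int64_t));

    std::vector<std::int64_t> batch;
    batch.reserve(idsPerBatch);
    for (std::size_t start = 0; start < spilledOnDisk.size(); start += idsPerBatch) {
        const std::size_t end = std::min(start + idsPerBatch, spilledOnDisk.size());
        // Stand-in for reading one bounded chunk from disk into memory.
        batch.assign(spilledOnDisk.begin() + static_cast<std::ptrdiff_t>(start),
                     spilledOnDisk.begin() + static_cast<std::ptrdiff_t>(end));
        if (std::find(batch.begin(), batch.end(), id) != batch.end()) {
            return true;
        }
    }
    return false;
}
```

A linear scan like this is simple but slow; an indexed or sorted on-disk structure would avoid rescanning, at the cost of more implementation work.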
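The cleanup requirement maps naturally onto a destructor. The following sketch assumes the spilled data lives in a single temporary file, which is an assumption for illustration; SpillResourcesSketch and its members are hypothetical names.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <unordered_set>
#include <utility>

// Illustrative only: owns the in-memory set and one spill file, and releases
// both when closed or destroyed.
class SpillResourcesSketch {
public:
    explicit SpillResourcesSketch(std::filesystem::path spillPath)
        : _spillPath(std::move(spillPath)) {
        std::ofstream out(_spillPath);  // Creates the (hypothetical) spill file.
        out << "spilled";
    }

    // Copying is disabled so that two objects never own the same file.
    SpillResourcesSketch(const SpillResourcesSketch&) = delete;
    SpillResourcesSketch& operator=(const SpillResourcesSketch&) = delete;

    ~SpillResourcesSketch() { close(); }

    void close() {
        // Swapping with an empty set actually frees the set's memory.
        std::unordered_set<std::int64_t>().swap(_seen);
        std::error_code ec;
        std::filesystem::remove(_spillPath, ec);  // Errors are ignored on cleanup.
    }

private:
    std::filesystem::path _spillPath;
    std::unordered_set<std::int64_t> _seen;
};
```

Calling close() more than once is harmless here, because removing a file that no longer exists is not treated as an error.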