- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Execution
When the dedup flag is set, IndexScan tracks the ids of the records that have already been seen. Currently there is no limit on the amount of memory that can be used to store the seen recordIds.
To set the maximum allowed memory, a new query knob internalIndexScanMaxMemoryBytes will be added in query_knobs.idl.
The stage will spill to disk when the seen data structure exceeds the maximum memory allowed.
The spilling should be implemented in a method
void spill(uint64_t maximumMemoryUsage)
that spills until the memory used by the stage is at most maximumMemoryUsage. The method should track the following metrics:
- bool usedDisk : Set to true when the stage has spilled.
- uint64_t spills : The number of times the stage spilled.
- uint64_t spilledBytes : The size, in bytes, of the memory released with spilling.
- uint64_t spilledDataStorageSize : The size, in bytes, of disk space used for spilling.
To track those metrics, we should update the IndexScanStats struct. The metrics should be reported in serverStatus and in explain execution stats.
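A minimal sketch of how spill and the four metrics could fit together, not the actual implementation. The names SpillStatsSketch and SeenRecordIdsSketch are hypothetical, record ids are simplified to uint64_t, the memory estimate counts payload bytes only, and a second in-memory set stands in for the on-disk storage. The caller would invoke spill once the memory estimate exceeds internalIndexScanMaxMemoryBytes.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical mirror of the metrics listed above; the real IndexScanStats
// struct has other fields that are not shown here.
struct SpillStatsSketch {
    bool usedDisk = false;
    uint64_t spills = 0;
    uint64_t spilledBytes = 0;
    uint64_t spilledDataStorageSize = 0;
};

// Hypothetical deduplicator; record ids are simplified to uint64_t.
class SeenRecordIdsSketch {
public:
    // Returns true if the id has not been seen before, in memory or spilled.
    bool insert(uint64_t recordId) {
        if (_spilled.count(recordId) > 0) {
            return false;
        }
        return _inMemory.insert(recordId).second;
    }

    // Rough estimate: payload bytes only, ignoring hash table overhead.
    uint64_t memoryUsage() const {
        return _inMemory.size() * sizeof(uint64_t);
    }

    // Moves ids out of memory until memoryUsage() <= maximumMemoryUsage.
    void spill(uint64_t maximumMemoryUsage) {
        const uint64_t before = memoryUsage();
        if (before <= maximumMemoryUsage) {
            return;  // Already within budget, so this call is not a spill.
        }
        while (memoryUsage() > maximumMemoryUsage) {
            auto it = _inMemory.begin();
            _spilled.insert(*it);  // Stand-in for a write to disk.
            _inMemory.erase(it);
        }
        const uint64_t released = before - memoryUsage();
        _stats.usedDisk = true;
        _stats.spills += 1;
        _stats.spilledBytes += released;
        // Assumption: on-disk size equals the released bytes (no compression).
        _stats.spilledDataStorageSize += released;
        assert(memoryUsage() <= maximumMemoryUsage);
    }

    const SpillStatsSketch& stats() const {
        return _stats;
    }

private:
    std::unordered_set<uint64_t> _inMemory;
    std::unordered_set<uint64_t> _spilled;  // Stand-in for on-disk storage.
    SpillStatsSketch _stats;
};
```

In this sketch a call to spill that finds the stage already within budget does not count as a spill, so spills and usedDisk only change when data actually leaves memory.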
Before spilling, the stage should make sure that there is enough disk space for spilling. This can be done using ensureSufficientDiskSpaceForSpilling and uassertStatusOK.
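The ordering described here (check first, fail loudly, only then spill) can be sketched as follows. This is an illustration under assumptions: StatusSketch, ensureDiskSpaceSketch, uassertOkSketch and spillGuardedSketch are hypothetical stand-ins, since the signatures of ensureSufficientDiskSpaceForSpilling and uassertStatusOK are not given in this ticket, and the available disk space is passed in as a plain number.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a status object returned by the disk space check.
struct StatusSketch {
    bool ok;
    std::string reason;
};

// Hypothetical stand-in for the disk space check named in the ticket.
StatusSketch ensureDiskSpaceSketch(uint64_t availableBytes, uint64_t minRequiredBytes) {
    if (availableBytes < minRequiredBytes) {
        return {false, "insufficient disk space for spilling"};
    }
    return {true, ""};
}

// Hypothetical stand-in for an assertion helper: throws when the status is not OK.
void uassertOkSketch(const StatusSketch& status) {
    if (!status.ok) {
        throw std::runtime_error(status.reason);
    }
}

// Runs doSpill only if the disk space check passes; otherwise throws
// before any data has been moved out of memory.
void spillGuardedSketch(uint64_t availableBytes,
                        uint64_t minRequiredBytes,
                        const std::function<void()>& doSpill) {
    assert(static_cast<bool>(doSpill));
    uassertOkSketch(ensureDiskSpaceSketch(availableBytes, minRequiredBytes));
    doSpill();
}
```

Failing before the spill starts means the in-memory state is left untouched when the disk is full.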
A second method, which retrieves the spilled data, should be added so that IndexScan can keep executing while reading data back from disk. The method should keep the memory usage below the threshold at all times.
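One way to read spilled ids back without exceeding a memory budget is to scan the spilled data in fixed-size chunks. The sketch below is a deliberately naive linear scan, not the proposed design. The function names are hypothetical, record ids are simplified to raw uint64_t values, and a std::istream stands in for whatever storage the stage would actually spill to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: appends raw ids to the spill stream.
void appendSpilledIdsSketch(std::ostream& out, const std::vector<uint64_t>& ids) {
    out.write(reinterpret_cast<const char*>(ids.data()),
              static_cast<std::streamsize>(ids.size() * sizeof(uint64_t)));
}

// Hypothetical lookup: scans the spilled ids from the start, holding at most
// maxBufferedIds entries in memory at any time.
bool containsSpilledIdSketch(std::istream& spilled,
                             uint64_t recordId,
                             std::size_t maxBufferedIds) {
    assert(maxBufferedIds > 0);
    std::vector<uint64_t> buffer(maxBufferedIds);
    spilled.clear();   // Reset EOF/fail bits left by a previous scan.
    spilled.seekg(0);  // Always scan from the beginning.
    while (true) {
        spilled.read(reinterpret_cast<char*>(buffer.data()),
                     static_cast<std::streamsize>(buffer.size() * sizeof(uint64_t)));
        const std::size_t got =
            static_cast<std::size_t>(spilled.gcount()) / sizeof(uint64_t);
        if (got == 0) {
            return false;  // Reached the end without a match.
        }
        const auto end = buffer.begin() + static_cast<std::ptrdiff_t>(got);
        if (std::find(buffer.begin(), end, recordId) != end) {
            return true;
        }
    }
}
```

The buffer size is the only memory the lookup holds, so the caller can derive maxBufferedIds from the remaining budget. A real implementation would likely use an indexed or sorted on-disk structure rather than a full scan.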
The stage should release all memory and disk space when it is closed.
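Tying the cleanup to the object's lifetime is one way to guarantee the release. The sketch below assumes a single spill file identified by a path and uses only the standard library. SpillResourcesSketch and its methods are hypothetical and do not reflect how the server manages its temporary storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical owner of the stage's spill resources.
class SpillResourcesSketch {
public:
    explicit SpillResourcesSketch(std::string spillPath)
        : _spillPath(std::move(spillPath)) {}

    SpillResourcesSketch(const SpillResourcesSketch&) = delete;
    SpillResourcesSketch& operator=(const SpillResourcesSketch&) = delete;

    // Closing on destruction covers early exits and exceptions.
    ~SpillResourcesSketch() {
        close();
    }

    void markSeen(uint64_t recordId) {
        _seen.insert(recordId);
    }

    // Appends every in-memory id to the spill file, then drops them from memory.
    void spillAll() {
        std::ofstream out(_spillPath, std::ios::binary | std::ios::app);
        for (uint64_t id : _seen) {
            out.write(reinterpret_cast<const char*>(&id), sizeof(id));
        }
        _seen.clear();
        _hasFile = true;
    }

    // Releases memory and removes the spill file; safe to call more than once.
    void close() {
        std::unordered_set<uint64_t>().swap(_seen);  // Also frees the buckets.
        if (_hasFile) {
            std::remove(_spillPath.c_str());
            _hasFile = false;
        }
        assert(_seen.empty());
    }

    std::size_t inMemoryCount() const {
        return _seen.size();
    }

private:
    std::string _spillPath;
    std::unordered_set<uint64_t> _seen;
    bool _hasFile = false;
};
```

Making close idempotent lets the stage call it explicitly and still rely on the destructor as a safety net.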
- has to be done after SERVER-88337 Keep just one deduplicator method in IndexScan (In Progress)
- is depended on by SERVER-24375 Deduping in OR, SORT_MERGE, and IXSCAN (multikey case) uses unbounded memory (Backlog)