Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.3.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Sprint: QO 2022-12-12, QO 2022-12-26, QO 2023-01-09, QO 2023-01-23
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Currently the analyze command runs a full collection scan and keeps all the fields for the analyzed path in memory. For large collections this will exceed the 100 MB limit. To allow analyze to run on collections of any size, sampling needs to be introduced.

There are several approaches to adding sampling to the analyze pipeline:

1. Use the $sample stage. This approach will work but will need memory for the in-memory sort if the sample size is > 5% of the collection size. Because the current memory limit for the pipeline is 100 MB, it will leave less memory for storing the values to build histograms on.
2. Use { $match: { $expr: { $gt: [<sample ratio>, { $rand: {} }] } } }. This approach will not use extra memory.
3. Implement a custom $sample stage that keeps track of used memory and therefore will not generate an out-of-memory error.
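The first two approaches can be sketched as plain pipeline builders. This is a minimal illustration, not the server's implementation: the function names and the budget arithmetic in `ratioForBudget` are hypothetical, and the comparison operator is written as `$gt`, which is an assumption about the intended operator (it keeps a document when the sample ratio is greater than the `$rand` value).

```javascript
// Hypothetical helpers; names and the budget arithmetic are illustrative assumptions.

// Approach 1: the built-in $sample stage, which returns a fixed number of documents.
function sampleStagePipeline(sampleSize) {
  return [{ $sample: { size: sampleSize } }];
}

// Approach 2: keep each document independently with probability `ratio`.
// $rand yields a float in [0, 1), so `ratio > $rand` holds with probability `ratio`.
function randMatchPipeline(ratio) {
  if (!(ratio > 0 && ratio <= 1)) {
    throw new RangeError("ratio must be in (0, 1]");
  }
  return [{ $match: { $expr: { $gt: [ratio, { $rand: {} }] } } }];
}

// Largest ratio whose *expected* sample fits in budgetBytes, given rough
// estimates of the document count and the average size of the analyzed value.
function ratioForBudget(docCount, avgValueBytes, budgetBytes) {
  if (docCount <= 0 || avgValueBytes <= 0) return 1;
  return Math.min(1, budgetBytes / (docCount * avgValueBytes));
}

// Example: 10M documents, ~64 bytes per value, half of the 100 MB limit.
const ratio = ratioForBudget(10000000, 64, 50 * 1024 * 1024);
const pipeline = randMatchPipeline(ratio);
console.log(JSON.stringify(pipeline));
```

Note that the `$match` variant only bounds the expected sample size: the actual number of documents kept varies from run to run, so a real pipeline would likely leave some headroom below the limit.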

The objective of this ticket is to experiment with approaches 1, 2, and 3 (and possibly others) to find the best way to implement sampling.
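For experimenting with the third approach, one possible shape for a sampler that tracks its own memory use is sketched below. It is not the server's design: the class name, the `sizeOf` callback, and the rate-halving strategy are all assumptions made for illustration. The sampler starts by keeping every value and, whenever the tracked byte count exceeds the budget, halves its sampling rate and drops each kept value with probability 1/2, so every value seen so far remains in the sample with the same probability.

```javascript
// Hypothetical memory-aware sampler; not an existing MongoDB stage or API.
class MemoryBoundedSampler {
  constructor(budgetBytes, sizeOf, random = Math.random) {
    this.budgetBytes = budgetBytes; // maximum bytes the sample may occupy
    this.sizeOf = sizeOf;           // caller-supplied size estimate per value
    this.random = random;           // source of uniform floats in [0, 1)
    this.rate = 1;                  // current inclusion probability
    this.usedBytes = 0;
    this.items = [];
  }

  // Offer one value; it is kept with probability `rate`.
  add(value) {
    if (this.random() >= this.rate) return;
    this.items.push(value);
    this.usedBytes += this.sizeOf(value);
    // Thin the sample until it fits in the budget again.
    while (this.usedBytes > this.budgetBytes && this.items.length > 0) {
      this.thin();
    }
  }

  // Halve the rate and keep each stored value with probability 1/2.
  thin() {
    this.rate /= 2;
    const kept = [];
    let bytes = 0;
    for (const v of this.items) {
      if (this.random() < 0.5) {
        kept.push(v);
        bytes += this.sizeOf(v);
      }
    }
    this.items = kept;
    this.usedBytes = bytes;
  }
}

// Usage: sample 10,000 numbers under an 800-byte budget, assuming 8 bytes each.
const sampler = new MemoryBoundedSampler(800, () => 8);
for (let i = 0; i < 10000; i++) sampler.add(i);
console.log(sampler.items.length, sampler.usedBytes, sampler.rate);
```

The trade-off in this sketch is that the final sample size is not known in advance, but the memory bound holds after every insertion, which is the property the out-of-memory concern calls for.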

is depended on by

SERVER-72614 Implement sampling for historgram computation

Closed

is related to

SERVER-99631 histogramCE: sampleRate > 1 does not use $sample, so no perf improvement

Needs Scheduling

Assignee: Ben Shteinfeld
Reporter: Misha Tyulenev (Inactive)
Participants: Ben Shteinfeld, Misha Tyulenev
Votes: 0
Watchers: 5

Created: Dec 07 2022 09:58:50 PM UTC
Updated: Jan 21 2025 02:30:26 PM UTC
Resolved: Jan 09 2023 09:13:05 PM UTC
Confidence Status Last Update: 12/Dec/22 3:32 PM
