[SERVER-78764] Revisit sampling CE scan-from-start in performance variants Created: 07/Jul/23 Updated: 19/Jan/24 Resolved: 19/Jan/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Alya Berciu | Assignee: | Backlog - Query Optimization |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Optimization |
| Description |
|
To reduce noise in the sampling CE variants, we opted to scan from the start of the collection (rather than taking a true random sample) in our performance tests. We should revisit this approach. |
| Comments |
| Comment by David Percy [ 19/Jan/24 ] |
|
It sounds like when we set up these tests, we decided to use sequential sampling because:
We considered random sampling with a fixed seed, but that relies on a WiredTiger random cursor, which doesn't support repeatability with a fixed seed: WT-11815. (And repeatability requires more than a fixed seed, because the result also depends on, for example, the tree shape.) |
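
To illustrate the point about tree shape, here is a minimal Python sketch. It is a toy model, not WiredTiger or server code: `build_pages`, `scan_from_start`, and `random_walk_sample` are hypothetical names, the "tree" is just a list of leaf pages, and the random walk samples with replacement. Under those assumptions, scanning from the start returns the same documents however the data is paged, while a seeded random walk is only repeatable for one particular page layout and may return different documents once the layout changes.

```python
import random
from typing import List

Page = List[int]


def build_pages(docs: List[int], page_size: int) -> List[Page]:
    """Split documents into leaf pages; page_size stands in for tree shape."""
    return [docs[i:i + page_size] for i in range(0, len(docs), page_size)]


def scan_from_start(pages: List[Page], k: int) -> List[int]:
    """Sequential sampling: take the first k documents in storage order."""
    out: List[int] = []
    for page in pages:
        for doc in page:
            if len(out) == k:
                return out
            out.append(doc)
    return out


def random_walk_sample(pages: List[Page], k: int, seed: int) -> List[int]:
    """Toy random cursor: pick a random page, then a random entry in it.

    The seed fixes the sequence of random choices, but which document each
    choice lands on still depends on how the documents are paged.
    """
    rng = random.Random(seed)
    out: List[int] = []
    for _ in range(k):
        page = rng.choice(pages)
        out.append(rng.choice(page))
    return out


if __name__ == "__main__":
    docs = list(range(100))
    small_pages = build_pages(docs, 10)
    large_pages = build_pages(docs, 25)

    # Same result regardless of page layout.
    print(scan_from_start(small_pages, 5), scan_from_start(large_pages, 5))

    # Same seed, different layouts: the samples are not guaranteed to match.
    print(random_walk_sample(small_pages, 5, seed=42))
    print(random_walk_sample(large_pages, 5, seed=42))
```

In this model a fixed seed gives repeatable results only while the page layout is also fixed, which is the extra condition the comment above describes.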