- Type: Task
- Resolution: Won't Do
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Optimization
The BigCollection benchmark in mongo-perf runs multiple tests in which the total data size of the collection stays the same while the number of documents increases and the document size decreases by the same factor. Both Scan and Filter queries show a throughput cliff for the collection with the largest number of documents (1638400). This holds for both the classic and SBE engines.
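The "same total size, same factor" design can be checked with a short sketch. The document counts and sizes are the ones reported in this ticket; treating the sizes as bytes and the variable names are assumptions made for illustration, not something taken from the mongo-perf source.

```python
# (document count, document size) for the five collection shapes in this ticket.
# Assumption: sizes are in bytes.
SHAPES = [
    (25, 16_777_216),
    (400, 1_048_576),
    (6_400, 65_536),
    (102_400, 4_096),
    (1_638_400, 256),
]

# Total data size per shape: count * size. A set collapses equal totals.
totals = {count * size for count, size in SHAPES}

# Step-to-step growth in document count and shrinkage in document size.
count_factors = [b[0] // a[0] for a, b in zip(SHAPES, SHAPES[1:])]
size_factors = [a[1] // b[1] for a, b in zip(SHAPES, SHAPES[1:])]

print(totals)         # a single value means every shape holds the same amount of data
print(count_factors)  # growth factor of the document count at each step
print(size_factors)   # shrink factor of the document size at each step
```

Under the byte assumption, every shape works out to the same total, and both the count and the size change by a factor of 16 at each step.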
From the investigation in SERVER-80583, run on a VM: single-thread throughput in ops/sec.
Document number | Document size | Batch size | Classic | SBE |
---|---|---|---|---|
25 | 16777216 | 0 | 2.286 | 2.152 |
400 | 1048576 | 0 | 2.781 | 2.505 |
6400 | 65536 | 0 | 2.788 | 2.583 |
102400 | 4096 | 0 | 2.248 | 2.145 |
1638400 | 256 | 0 | 0.702 | 0.799 |
400 | 1048576 | 1 | 2.937 | 2.823 |
6400 | 65536 | 16 | 2.846 | 2.825 |
102400 | 4096 | 256 | 2.358 | 2.432 |
1638400 | 256 | 4096 | 0.745 | 0.901 |
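To put a number on the cliff, the following sketch computes how much throughput drops between consecutive collection shapes. It uses only the batch-size-0 rows from the table above; the helper name `step_slowdowns` is hypothetical.

```python
# (document count, classic ops/sec, SBE ops/sec) for the batch-size-0 rows of the table.
ROWS = [
    (25, 2.286, 2.152),
    (400, 2.781, 2.505),
    (6_400, 2.788, 2.583),
    (102_400, 2.248, 2.145),
    (1_638_400, 0.702, 0.799),
]

def step_slowdowns(rows, column):
    """Return previous/current throughput for each consecutive pair of rows."""
    return [prev[column] / curr[column] for prev, curr in zip(rows, rows[1:])]

classic = step_slowdowns(ROWS, 1)
sbe = step_slowdowns(ROWS, 2)

for (count, _, _), c, s in zip(ROWS[1:], classic, sbe):
    # Values above 1.0 mean throughput fell relative to the previous shape.
    print(f"{count:>9} docs: classic x{c:.2f}, SBE x{s:.2f}")
```

The last step (102400 to 1638400 documents) gives a slowdown of roughly 3.2x for classic and 2.7x for SBE, while every earlier step stays close to 1x.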
This seems to be partly due to WiredTiger and partly due to predicate computation (higher total computational cost for a larger number of documents). Excerpt from the attached flame graphs for the SBE engine:
PlanExecutorSBE::getNext : 1.89% vs. 44.73%
FilterStage::getNext : 1.67% vs. 40.77%
WiredTigerRecordStoreCursorBase::next : 0.93% vs. 23.84%
sbe::vm::ByteCode::runPredicate : 0.24% vs. 7.46%