[SERVER-60715] Access of array fields is slow in SBE queries Created: 14/Oct/21 Updated: 27/Oct/23 Resolved: 26/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Irina Yatsenko (Inactive) | Assignee: | Mihai Andrei |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | pm2697-m3, sbe | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL | ||
| Sprint: | QE 2022-10-03 | ||
| Participants: | | ||
| Story Points: | 0 | ||
| Description |
|
Summary: accessing an array field is ~15-30% slower in SBE than in the classic engine.
Create a collection with namespace "sbe-perf.LS" that contains 10^6 documents of the form { scalar: Random.randInt(10), array: [Random.randInt(10), Random.randInt(10), Random.randInt(10)] }. From the mongo shell, run two benchRun benchmarks against the collection: a point query on the scalar field and a point query on the array field.
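A minimal shell sketch of that setup, for reference. The batch size and benchRun options (parallel, seconds) are assumptions, and the query value 17 (outside [0, 10), as in the perf-counter experiment in the comments) is chosen here so that neither query matches any document:

```javascript
// Sketch only: builds the collection described above and runs the two benchmarks.
const coll = db.getSiblingDB("sbe-perf").LS;
coll.drop();

let batch = [];
for (let i = 0; i < 1000 * 1000; i++) {
    batch.push({
        scalar: Random.randInt(10),
        array: [Random.randInt(10), Random.randInt(10), Random.randInt(10)],
    });
    if (batch.length === 10000) {
        coll.insertMany(batch);
        batch = [];
    }
}

function runBench(query) {
    return benchRun({
        host: db.getMongo().host,
        parallel: 1,
        seconds: 10,
        ops: [{ns: "sbe-perf.LS", op: "find", query: query}],
    });
}

// Benchmark 1: point query on the scalar field.
const scalarRes = runBench({scalar: 17});
// Benchmark 2: point query on the array field.
const arrayRes = runBench({array: 17});

// The description compares the "queryLatencyAverageMicros" field of the results.
printjson(scalarRes);
printjson(arrayRes);
```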
Results: the second benchRun (the array-field query) is roughly twice as slow as the first. While we do expect array access to be slower, the drop in SBE is much bigger than in the classic engine, where the corresponding numbers are "queryLatencyAverageMicros" : 419583.25 and "queryLatencyAverageMicros" : 657760.
The top 10 consumers of CPU in SBE Scalar Flame graphs are attached.
|
| Comments |
| Comment by David Storch [ 26/Sep/22 ] |
|
mihai.andrei@mongodb.com do we have a benchmark like this running in Evergreen anywhere? We have Queries.WildcardIndex.PointQueryOnSingleArrayField* which tests array traversal, but I'm not sure that we have a simple test with a sizeable data set that intends to emphasize array traversal. Are you aware of any pre-existing test like this? If not, I'm not sure which line item in the design document would cover the addition of this new benchmark. |
| Comment by Kyle Suarez [ 26/Sep/22 ] |
|
Nice work martin.neupauer@mongodb.com; I'm gonna mark this as being fixed by |
| Comment by Kyle Suarez [ 21/Sep/22 ] |
|
Assigning to mihai.andrei@mongodb.com to confirm we've made progress here. If there's still a regression – at the risk of building up the "hype train" – could you try applying Martin's patch? |
| Comment by Irina Yatsenko (Inactive) [ 07/Jan/22 ] |
|
The plan for a single-field point query (whether the field is an array or not), as reported by .explain().queryPlanner.winningPlan.slotBasedPlan.stages, contains a traverse stage of the form:

[1] traverse s8 s7 s6 [s4, s5] {s8 || s7} {s8} from |
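For reference, a quick way to pull that plan fragment up from the shell; the collection name and predicate value are just examples, the explain() path is the one quoted above:

```javascript
// Inspect the SBE winning plan for a single-field point query.
const coll = db.getSiblingDB("sbe-perf").LS;
const explain = coll.find({array: 17}).explain();
print(explain.queryPlanner.winningPlan.slotBasedPlan.stages);
```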
| Comment by Irina Yatsenko (Inactive) [ 19/Nov/21 ] |
|
Collected hardware perf counters for accessing array fields in a dataset with 10^6 records, where each record has the following schema: {a0: int, a1: [int], a4: [int, int, int, int], a8: [8 ints], a16: [16 ints]}. The integers are randomly generated in the range [0, 10) and each query is of the form find({x: 17}), where "x" is one of the aN fields. By the choice of values the query returns no results and has to traverse all arrays in the aN field fully. No indexes are used.

Task-clock measurements for the five queries:

611.75 msec task-clock # 0.186 CPUs utilized
708.39 msec task-clock # 0.230 CPUs utilized
1,234.99 msec task-clock # 0.342 CPUs utilized
1,894.53 msec task-clock # 0.409 CPUs utilized
4,076.74 msec task-clock # 0.586 CPUs utilized

The stats show a linear dependency of the instruction count on the array size, at ~1450 instructions per element. Measuring the same queries in the classic engine yields similar IPC values for all array sizes but much lower overall instruction counts (for the 16-element array, ~9,296M instructions overall compared to 26,444M in SBE). The dependency of the instruction count on the array size in the classic engine is also linear, but at ~320 instructions per element. Given the flame graphs, I believe most of the overhead in the SBE engine is due to the VM. |
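A sketch of the dataset and queries described above; the collection name, batch size, and the use of itcount() to drive the scan are assumptions:

```javascript
// Sketch only: builds a 10^6-record collection with the {a0, a1, a4, a8, a16}
// schema and runs one non-matching point query per field.
const coll = db.getSiblingDB("sbe-perf").arrays;
coll.drop();

function randArray(len) {
    return Array.from({length: len}, () => Random.randInt(10));
}

let batch = [];
for (let i = 0; i < 1000 * 1000; i++) {
    batch.push({
        a0: Random.randInt(10),
        a1: randArray(1),
        a4: randArray(4),
        a8: randArray(8),
        a16: randArray(16),
    });
    if (batch.length === 10000) {
        coll.insertMany(batch);
        batch = [];
    }
}

// 17 lies outside [0, 10), so each query matches nothing and must fully
// traverse every array in the queried field (collection scan, no indexes).
for (const field of ["a0", "a1", "a4", "a8", "a16"]) {
    coll.find({[field]: 17}).itcount();
}
```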
| Comment by Kyle Suarez [ 22/Oct/21 ] |
|
ethan.zhang, eric.cox, and irina.yatsenko, we are sending these SBE performance issues to the $group epic. Let us know if you think they belong in a separate project. |