[SERVER-68819] Investigate why performance on BestBuy queries is substantially lower than with the original POC Created: 14/Aug/22 Updated: 14/Nov/22 Resolved: 14/Nov/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Pawel Terlecki | Assignee: | Ian Boros |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | QE 2022-09-19, QE 2022-10-03, QE 2022-10-17, QE 2022-10-31, QE 2022-11-14, QE 2022-11-28 |
| Participants: |
| Description |
|
Original results from Mathias's POC vs. current results |
| Comments |
| Comment by Ian Boros [ 14/Nov/22 ] | ||||||
|
Closing as "done" since we have a good understanding about the difference between the poc and SBE. | ||||||
| Comment by Ian Boros [ 18/Aug/22 ] | ||||||
|
I've also looked at the explains and run the queries through a profiler, and found the following:
The first query does not use filter pushdown yet. It materializes a document, then runs the frankenMatcher on that document (which actually requires serializing it to BSON), and then runs the $group. The most obvious way to improve the performance of this query is to push the $exists filter down into the column scan. There is also a fair amount of time spent allocating and freeing values, which would be reduced if we could avoid the materialization. There is a lot of low-hanging fruit for improving the SBE implementation here.
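For illustration only (the actual BestBuy query is not reproduced in this ticket, so the field names below are placeholders), the first query has roughly this shape: a $match on $exists followed by a $group, where today the $exists predicate is evaluated by the frankenMatcher against a materialized document rather than inside the column scan.
{code:javascript}
// Hypothetical shape of the first query (field names are placeholders,
// not taken from the actual BestBuy workload).
db.bestbuy.aggregate([
  // Today this $exists predicate is evaluated by the frankenMatcher on a
  // fully materialized (and re-serialized) document; pushing it into the
  // column scan would avoid that materialization entirely.
  { $match: { "details.color": { $exists: true } } },
  // The $group then runs over the surviving documents.
  { $group: { _id: "$type", count: { $sum: 1 } } }
]);
{code}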
For the second query, the filter is pushed down to the scan layer, so there's not as much low-hanging fruit in the SBE implementation. From what I see in the profiles, a lot of the time is spent in the VM evaluating the predicate (which we can likely optimize), and a lot of time is spent in WiredTiger. I'm also seeing some time spent copying and freeing data just to evaluate the predicate, which we should be able to avoid. The difference between the POC and SBE is also much narrower here. | ||||||
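For contrast, a sketch of the second case (again with placeholder field and value names): here the $match predicate is pushed down into the column scan, so the per-row cost is dominated by the SBE VM evaluating the predicate and by WiredTiger reads rather than by document materialization.
{code:javascript}
// Hypothetical shape of the second query (placeholder field/value names).
db.bestbuy.aggregate([
  // This predicate is eligible for pushdown into the column scan, so it is
  // evaluated per column cell by the SBE VM instead of per materialized doc.
  { $match: { type: "Movie" } },
  { $group: { _id: null, n: { $sum: 1 } } }
]);
{code}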
| Comment by Ian Boros [ 16/Aug/22 ] | ||||||
|
The results labeled "current" actually don't use the zigzag scan at all, since that was only merged on Friday, so I'll have to re-run them. Also, Mathias's POC did an optimization similar to our SBE $group pushdown: he added a special DocumentSourceColumnStore which reads directly from storage, skipping the find() layer. A good amount of the perf improvement for the warm case likely came from that. For this reason, I want to compare the latencies of the POC and master directly, instead of comparing the collscan-vs-columns performance difference between the POC and master. | ||||||
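A minimal sketch of what that direct comparison could look like in mongosh (the harness below is an assumption, not something attached to this ticket): run the same pipeline against a POC build and a master build and compare wall-clock latencies.
{code:javascript}
// Hypothetical timing helper: drain the cursor fully so the whole pipeline
// executes, and return per-run latencies in milliseconds.
function timePipeline(coll, pipeline, runs = 5) {
  const latencies = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    coll.aggregate(pipeline).toArray();
    latencies.push(Date.now() - start);
  }
  return latencies;
}

// Run the same call once against the POC build and once against master, e.g.:
// timePipeline(db.bestbuy, [{ $group: { _id: "$type", n: { $sum: 1 } } }]);
{code}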
| Comment by Pawel Terlecki [ 16/Aug/22 ] | ||||||
|
I see that for $group on a single column with cold data the perf is similar, about 25x in both. So maybe the other results are worse because zigzag filtering needs some work. The hot-data perf is worth understanding too, though it is potentially less interesting for the EAP. | ||||||
| Comment by Pawel Terlecki [ 15/Aug/22 ] | ||||||
|
I am thinking we look at just the I/O for cold data first. Say we retrieve a single scalar field, e.g. project: {a: 1}. I would expect this to have similar perf in both implementations. | ||||||
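Concretely, that single-scalar-field case would look something like the sketch below (collection and field names are placeholders).
{code:javascript}
// Retrieve a single scalar field from cold data; reading only one column
// should make the I/O cost comparable between the POC and the SBE
// implementation.
db.bestbuy.find({}, { a: 1, _id: 0 });

// Equivalent aggregation form:
db.bestbuy.aggregate([{ $project: { a: 1, _id: 0 } }]);
{code}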
| Comment by Ian Boros [ 15/Aug/22 ] | ||||||
|
A few notes for myself: | ||||||
| Comment by Pawel Terlecki [ 14/Aug/22 ] | ||||||
|
cc: ian.boros@mongodb.com charlie.swanson@mongodb.com colby.ing@mongodb.com |