[SERVER-58181] changestream fullDocument lookup projections introduce overheads in degenerate cases Created: 30/Jun/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Oren Ovadia | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Execution |
| Participants: |
| Description |
|
We are interested in projecting a subset of fields from the post-image document in a changestream. Testing showed (see also CLOUDP-91896) that performance only improves when the projection is smaller than some fraction of the total document size (as a function of network delay, etc.). When most fields in the document are projected, there can be a ~30% slowdown. Is it possible to optimize changestream projection so that the overhead in these degenerate cases is minimized? Otherwise it is hard to determine whether Search can use this optimization for a given collection. Note that in this case the projection only applies to fields nested under `fullDocument.*`, so it should be possible to push the projection down to the query system (either via a query optimization or by exposing new APIs). |
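To make the scenario concrete, here is a minimal sketch (in Python, not server code) of the pipeline shape described above and of the work that currently happens after the post-image lookup: an inclusion projection on `fullDocument.*` applied to an already-fetched change event. The `project_full_document` helper and all field names (`name`, `status`, `blob`) are hypothetical, chosen only for illustration.

```python
def project_full_document(event, fields):
    """Keep only the requested top-level fields under fullDocument.

    This mimics what a $project on fullDocument.* does today: the full
    post-image has already been looked up, and fields are dropped
    afterwards. The whole document still crossed the storage/lookup
    boundary, which is why the ticket asks to push this step down.
    """
    doc = event.get("fullDocument", {})
    projected = dict(event)  # shallow copy; leave the input untouched
    projected["fullDocument"] = {k: doc[k] for k in fields if k in doc}
    return projected


# The pipeline shape under discussion: a projection nested entirely
# under fullDocument.* (plus stream metadata), which in principle could
# be applied during the post-image lookup itself.
pipeline = [
    {"$project": {"fullDocument.name": 1, "fullDocument.status": 1,
                  "_id": 1, "operationType": 1}},
]

event = {
    "operationType": "update",
    "fullDocument": {"name": "a", "status": "ok", "blob": "x" * 1024},
}
slim = project_full_document(event, ["name", "status"])
```

The degenerate case in the ticket is the opposite shape: when `fields` covers most of the document, the helper above does extra copying for little savings, which matches the observed ~30% slowdown.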
| Comments |
| Comment by Bernard Gorman [ 16/Jul/21 ] |
|
I'm going to put this on the backlog for now. Even if we pushed the projection down so that it is performed during the lookup rather than immediately afterwards, the performance benefit would at best be negligible: we would be doing the same amount of work, just at a different point in the pipeline. If we were to add this functionality to our optimizer, we would also want it to be a more general-purpose optimization, which would require the ability to break apart a $project based on dependency analysis, something we currently cannot do. |
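The "break apart a $project" step mentioned above can be sketched as a simple partition of an inclusion-projection spec: the paths rooted at `fullDocument` are candidates for pushing into the post-image lookup, and the rest must stay in the outer pipeline. This is an illustrative toy, assuming a flat inclusion spec; `split_projection` is a hypothetical helper, not a server function, and it ignores the exclusion-projection and computed-field cases a real optimizer would have to handle.

```python
def split_projection(project_spec):
    """Partition an inclusion $project spec by top-level path.

    Returns (pushable, remainder): paths under fullDocument.* that could
    in principle be evaluated during the lookup, and everything else,
    which must remain a post-lookup projection.
    """
    pushable, remainder = {}, {}
    for path, include in project_spec.items():
        if path == "fullDocument" or path.startswith("fullDocument."):
            pushable[path] = include
        else:
            remainder[path] = include
    return pushable, remainder


spec = {"fullDocument.name": 1, "fullDocument.status": 1,
        "operationType": 1, "_id": 1}
pushable, rest = split_projection(spec)
```

Even with this split available, the comment's point stands: the projected fields are computed either way, so relocating the work does not by itself reduce it.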