[SERVER-82566] Try to avoid call to cluster_aggregation_planner::getCollationAndUUID Created: 30/Oct/23 Updated: 06/Nov/23 Resolved: 06/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Will Buerger | Assignee: | Backlog - Query Integration |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Integration |
| Description |
|
In a sharded environment, the throughput of an aggregation with {$match: {_id: x}} is about 25x lower than that of the equivalent find command. I've attached flamegraphs comparing the two workloads. In the aggregation case, 91% of the time is spent in cluster_aggregation_planner::getCollationAndUUID, because it has to make a remote call to the primary shard to retrieve that metadata. We believe that, at least when the pipeline is parsed only for query stats, the UUID is optional and the collation can default to an empty object. We should try to avoid that call so that the cluster_aggregate IDHack path isn't bottlenecked there. |
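A minimal sketch of the idea above, not the actual server code: when the pipeline is parsed only for query stats, return a default empty collation and no UUID instead of contacting the primary shard. Every name here (`CollectionMetadata`, `PrimaryShardStub`, `ParsePurpose`, `resolveMetadata`) is hypothetical and stands in for whatever the real planner uses. The stub only counts simulated remote calls.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the metadata the planner needs.
struct CollectionMetadata {
    std::string collation;            // "{}" models the default empty collation object
    std::optional<std::string> uuid;  // empty when the caller does not need it
};

// Hypothetical stub for the primary shard; counts simulated round trips.
struct PrimaryShardStub {
    int remoteCalls = 0;

    CollectionMetadata fetch() {
        ++remoteCalls;  // models the expensive remote call seen in the flamegraph
        return CollectionMetadata{std::string("{locale: \"en\"}"),
                                  std::string("example-uuid")};
    }
};

// Hypothetical flag describing why the pipeline is being parsed.
enum class ParsePurpose { kExecute, kQueryStatsOnly };

// Skips the remote call when the metadata is only needed for query stats.
CollectionMetadata resolveMetadata(ParsePurpose purpose, PrimaryShardStub& shard) {
    if (purpose == ParsePurpose::kQueryStatsOnly) {
        // Assumption from the ticket: UUID is optional, collation may be empty.
        return CollectionMetadata{std::string("{}"), std::nullopt};
    }
    return shard.fetch();
}
```

With this shape, a query-stats-only parse leaves `remoteCalls` at zero, while a parse for execution still fetches the real metadata once.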
| Comments |
| Comment by Will Buerger [ 06/Nov/23 ] |
|
This was resolved by |