-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Labels:None
-
Cluster Scalability
At the time of writing, when targeting a $out, if the collection does not exist, we will create the (new) output collection on the primary shard. This is based on the idea that because the primary shard still coordinates DDL operations, doing this will make it so that all DDL operations are shard local (that is, they don't have to be issued over the network). A potential improvement would be to create the new collection on the same shard as the source collection (assuming that the collection is unsplittable or even untracked), as this would make all writes shard local. The idea here is that if our $out has to issue many inserts, we will save on the number of remote writes that we have to perform (at the expense of having to do DDL operations remotely, however, this would in theory be amortized by enough writes).
This ticket tracks the work to:
- Investigate in which cases we wish to apply this heuristic (i.e. for single collection aggregations this is pretty straightforward, but if we have a multi-collection aggregation, it's an open question as to which shard we should nominate, if any)
- Understand the performance characteristics of this changeĀ (i.e. in general, is it better than always outputting to the primary shard?)
- Implement this change if the performance is faster than the current behavior.
- is related to
-
SERVER-87526 $out consider placing destination collection on same shard as source.
- Backlog