-
Type: Improvement
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Aggregation Framework, Sharding
-
Labels: None
-
Backwards Compatibility: Fully Compatible
-
Sprint: Query 2018-11-19, Query 2018-12-03
Two stages currently need to determine the document key fields for a collection (the shard key plus _id, or just _id if the collection is not sharded): $out and $changeStream. On a shard server, there are two ways to gather that information: through the CollectionShardingState or through the CatalogCache. The important distinction between these two classes is that CollectionShardingState is meant for a collection hosted on the shard itself, whereas CatalogCache is meant for a process acting as a router; CatalogCache was originally developed and used for mongos. For change streams, we want to use the CollectionShardingState because we know the collection is hosted on this shard (we are transforming an oplog entry for a write to that collection). For $out, we want to use the CatalogCache because we may be performing remote writes and thus acting as a router. The code does not currently make this distinction, and in fact uses both sources in a confusing way.
We should separate the current MongoProcessInterface::collectDocumentKeyFields into two methods, one for each purpose.
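A minimal sketch of what the proposed split could look like. It is not the actual server code: ToyCollectionShardingState, ToyCatalogCache, ToyProcessInterface and both method names are hypothetical stand-ins, and each metadata source is modelled as a plain map from namespace to shard key fields.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using FieldList = std::vector<std::string>;
using ShardKeyMap = std::map<std::string, FieldList>;

// Hypothetical stand-in for shard-local metadata (the role of CollectionShardingState).
struct ToyCollectionShardingState {
    ShardKeyMap hostedCollections;
};

// Hypothetical stand-in for cached routing metadata (the role of CatalogCache).
struct ToyCatalogCache {
    ShardKeyMap routingInfo;
};

// Document key = shard key fields plus _id, without repeating _id if the
// shard key already contains it.
inline FieldList withId(FieldList fields) {
    for (const auto& field : fields) {
        if (field == "_id") {
            return fields;
        }
    }
    fields.push_back("_id");
    return fields;
}

// A namespace missing from the map is treated as unsharded, so its document
// key is just _id.
inline FieldList lookupDocumentKey(const ShardKeyMap& source, const std::string& ns) {
    assert(!ns.empty());
    auto it = source.find(ns);
    return withId(it == source.end() ? FieldList{} : it->second);
}

// Hypothetical interface with one method per purpose instead of a single
// collectDocumentKeyFields that mixes both metadata sources.
class ToyProcessInterface {
public:
    ToyProcessInterface(ToyCollectionShardingState css, ToyCatalogCache cache)
        : _css(std::move(css)), _cache(std::move(cache)) {}

    // For $changeStream: the collection is known to live on this shard.
    FieldList collectDocumentKeyFieldsForHostedCollection(const std::string& ns) const {
        return lookupDocumentKey(_css.hostedCollections, ns);
    }

    // For $out: the process may perform remote writes, so it acts as a router.
    FieldList collectDocumentKeyFieldsActingAsRouter(const std::string& ns) const {
        return lookupDocumentKey(_cache.routingInfo, ns);
    }

private:
    ToyCollectionShardingState _css;
    ToyCatalogCache _cache;
};
```

With this shape, each caller states which role it is playing, and neither method can silently fall back to the other metadata source.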