Core Server / SERVER-42290

Target change streams to the primary shard when possible

    • Query Optimization
    • Query 2021-01-25, Query Optimization 2021-05-03

      When creating a change stream on an unsharded collection, we currently open cursors on all shards rather than just the primary shard. We do so because, at the point where we decide upon a routing strategy, we have no insight into whether the operation is creating a new stream or resuming from a point in the past; to cover all possible scenarios, we must assume the latter. This in turn means that we may be resuming from a time when the namespace existed in an earlier incarnation as a sharded collection which has since been dropped. Because we do not have access to a historical view of the routing table, we must again assume that this is the case. This defensive series of assumptions lets us open streams which work across all potential use cases, at the cost of significant inefficiency and some unfortunate edge cases (e.g. SERVER-42232) for unsharded collections.

      By far the more likely scenario, however, is that we are starting or resuming a stream on an unsharded collection which currently exists, in which case opening cursors on every shard is wasteful and invites complications such as SERVER-42232. We already resolve the UUID of unsharded collections as part of the aggregation processing code. If the routing logic is given a means to examine the stream's resume information, it should therefore be possible to target only the primary shard when we are opening a new stream, or resuming a stream, on an existing unsharded collection.
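
The proposed routing decision can be sketched roughly as follows. This is an illustrative model only, not the server's actual routing code: `CollectionInfo`, `choose_target_shards`, and the shard names are hypothetical. It also assumes that the resume information exposes the UUID of the collection the stream was opened on, and that a namespace which does not currently exist should keep the conservative all-shards behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CollectionInfo:
    """Hypothetical view of the namespace as the router currently sees it."""
    uuid: Optional[UUID]  # None if the collection does not currently exist
    is_sharded: bool
    primary_shard: str


def choose_target_shards(
    collection: CollectionInfo,
    resume_uuid: Optional[UUID],
    all_shards: Sequence[str],
) -> List[str]:
    """Return the shards on which change stream cursors should be opened.

    resume_uuid is None for a brand-new stream; otherwise it is the
    collection UUID recorded in the resume information (an assumption).
    """
    # Sharded or non-existent namespace: keep the defensive behaviour.
    if collection.is_sharded or collection.uuid is None:
        return list(all_shards)

    # New stream, or a resume point that belongs to the collection's
    # current incarnation: the primary shard alone is sufficient.
    if resume_uuid is None or resume_uuid == collection.uuid:
        return [collection.primary_shard]

    # The resume point refers to an earlier incarnation of the namespace,
    # which may have been sharded, so fall back to every shard.
    return list(all_shards)


shards = ["shardA", "shardB", "shardC"]
current = CollectionInfo(uuid=uuid4(), is_sharded=False, primary_shard="shardA")

print(choose_target_shards(current, None, shards))          # new stream
print(choose_target_shards(current, current.uuid, shards))  # resume, same UUID
print(choose_target_shards(current, uuid4(), shards))       # resume, older UUID
```

The UUID comparison is what stands in for the missing historical routing table: a mismatch indicates that the resume point predates the current collection, which is exactly the case where the all-shards fallback is still needed.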

            backlog-query-optimization Backlog - Query Optimization
            bernard.gorman@mongodb.com Bernard Gorman