|
One thing to consider here: If we use a non-zero batchSize for establishing the cursors on the shards, then there's a possibility it will interact poorly with the shard versioning protocol and end up wasting more work. There is a benefit to doing as little as possible before verifying that all shards are on the same page with respect to shard versioning.
For example, if you had 10 shards and some active migrations, 8 shards might complete and get the first batch but 2 shards may have recently completed a migration so respond with an error. This means the whole aggregation needs to be retried, throwing out any work that those other 8 shards have done. We also have no way of interrupting the work on those other shards right now that I know of. We don't have a cursor ID yet, and no op ID that we can guess. So in the worst case these 8 shards will compute an hours-long grouping query to return the first batch only to then realize no one is interested in that response anymore.
That all said, for simple streaming queries this is likely a pretty good idea, especially if there's a limit. My understanding is that this is something we do for normal queries (find commands), just not for aggregates.
|