Loading...

XML

Word

Printable

JSON

Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.6.0-rc0
Affects Version/s: None
Component/s: Querying, Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Sprint:
Repl 2017-07-31, Repl 2017-08-21, Repl 2017-09-11, Repl 2017-10-02
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When a change stream is initiated against a sharded cluster, merging the results can be tricky. A change stream will establish a cursor against each shard, and merge the results from those shards in order of 'clusterTime'. If each cursor returns at least one result, the _id will contain the cluster time, and can be used to merge the results and return the smallest one. However, if at least one shard returns an empty batch, we cannot yet return anything to the client, for fear that it will later return something at a smaller clusterTime. To avoid this issue, we should do the following:

To both the aggregation response, and the response of each getMore, add a new internal field called something like _internalLatestClusterTimeObserved to the cursor section of the response. This will report the highest cluster time that the cursor has observed, even if it didn't match the criteria and thus was not returned.
This value can be treated as a guarantee from the shard that it will never return data before that cluster time on that cursor.
The logic merging cursors on mongos can then return any results that have a cluster time smaller than the minimum reported by all cursors.
The merging logic can then safely run a getMore using a 'read after cluster time' with it's desired time to advance a cursors guaranteed-minimum-optime. However, it still needs to inspect the results of this response, since it's possible the cursor will still return a smaller optime if it exceeded its maxTimeMS.

In order to report this information back on each response, we will modify the capped insert notifier to keep track of the most recent optime observed. One proposal is to pass a callback to the notifier for it to use to decide when to wake the waiting thread. If we take this approach, the callback can save a reference to the OperationContext, and continually advance some sort of boost::optional<Timestamp> decoration on the OperationContext. The aggregate and getMore commands can then consult whether this is set to decide whether/what to include in this new response field.

Please forgive me if I've used poor terminology, I'm pretty new to this whole 'cluster time' thing.

is depended on by

SERVER-29141 Add sharding support for merging changeNotification responses from all shards

Closed

Assignee:: Matthew Russotto
Reporter:: Charlie Swanson
Participants:: Andy Schwerin, Charlie Swanson, Githook User, Matthew Russotto, Misha Tyulenev, Spencer Brody
Votes:: 0 Vote for this issue
Watchers:: 8 Start watching this issue

Created:: Jun 29 2017 09:37:21 PM UTC
Updated:: Oct 30 2023 11:15:34 PM UTC
Resolved:: Sep 19 2017 02:31:12 PM UTC
Confidence Status Last Update:: 10/Jul/17 6:38 PM

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates