[SERVER-47810] Resume token returned by mongoS can be earlier than user-specified resume point Created: 27/Apr/20 Updated: 29/Oct/23 Resolved: 22/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc8, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bernard Gorman | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | qexec-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: | v4.4 | ||||
| Sprint: | Query 2020-05-04, Query 2020-05-18, Query 2020-06-01 | ||||
| Participants: | |||||
| Description |
|
When a resume token or starting time is specified on opening a change stream, the postBatchResumeToken (PBRT) returned with the first batch must always be at least equal to the specified resume point, even if the batch itself is empty. However, if the user opens a change stream on mongoS with a startAtOperationTime in the future (which is perfectly legal), the stream returns high-water-mark PBRTs that reflect the current clusterTime, which is earlier than the requested start point, rather than waiting until the clusterTime exceeds startAtOperationTime. |
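The invariant described above can be illustrated with a small sketch. This is not the mongoS implementation: cluster times are modelled as (seconds, increment) pairs, and `ClusterTime` and `first_batch_high_water_mark` are hypothetical names introduced here. It assumes that reporting the later of the two times is one way to keep the first-batch token from falling behind the resume point.

```python
from typing import NamedTuple


class ClusterTime(NamedTuple):
    """Toy stand-in for a cluster timestamp (hypothetical, not a server type)."""
    seconds: int
    increment: int


def first_batch_high_water_mark(current: ClusterTime,
                                start_at: ClusterTime) -> ClusterTime:
    """Return the time a first-batch high-water-mark token should reflect.

    Assumption: taking the later of the two times is sufficient to ensure
    the token is never earlier than the user-specified resume point.
    """
    # NamedTuples compare field by field, so max() orders by seconds first
    # and then by increment.
    return max(current, start_at)


# startAtOperationTime lies in the future relative to the current clusterTime.
now = ClusterTime(seconds=100, increment=1)
future_start = ClusterTime(seconds=200, increment=0)
token_time = first_batch_high_water_mark(now, future_start)
```

In this toy model the behaviour reported in this ticket corresponds to returning `now` unconditionally, which yields a token earlier than `future_start`.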
| Comments |
| Comment by Githook User [ 29/May/20 ] |
|
Author: Bernard Gorman <bernard.gorman@gmail.com> (username: gormanb)
Message: (cherry picked from commit 35756d5b0fe1bc810de1d740950b2fa41e449bdd) |
| Comment by Githook User [ 22/May/20 ] |
|
Author: Bernard Gorman <bernard.gorman@gmail.com> (username: gormanb)
Message: |
| Comment by Bernard Gorman [ 13/May/20 ] |
|
charlie.swanson: yep, this looks accurate to me. |
| Comment by Charlie Swanson [ 13/May/20 ] |
|
As part of working on this, we discovered another special case for the change stream's cursor on the config server. That cursor may occasionally return "addShard" events which the change stream uses to keep the stream open on all shards. The event is swallowed internally and not returned to the user. Such an event should be prevented from becoming the high water mark. We decided this because:
1) The resume token for this event is a "real" token, meaning it's not a manufactured high water mark token and we can expect to find it in an oplog. Our logic for resuming the stream will expect to see the event again to make sure we can resume.
2) Given the current order of checking for resume and checking for addShard, the resume token check would never see the event and would fail.
3) If we instead flipped that order, then in order to successfully resume you need to be sure to read that event from the config servers. This would cause problems because:
(a) The window of history on the config server may be small and ideally shouldn't be a factor in whether you can successfully resume a stream.
(b) The cursor we open on the config servers is usually opened at a "recent" clusterTime, ignoring the resume token. The stream is only there to detect new shards, so otherwise doesn't need to go back and read old history. Determining the correct time to open the cursor on the config servers is already difficult to get right; we don't want to complicate it further.
bernard.gorman does the above accurately reflect our conversation? |
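The decision in the comment above, that a swallowed "addShard" event should not become the high water mark, can be sketched in a toy model. Everything here is hypothetical and introduced only for illustration: the `Event` type, the `kind` labels, and `consume_batch` are not server code, and cluster times are plain (seconds, increment) tuples.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Time = Tuple[int, int]  # (seconds, increment), a toy cluster time


@dataclass(frozen=True)
class Event:
    cluster_time: Time
    kind: str  # hypothetical labels, e.g. "insert" or "addShard"


def consume_batch(events: Iterable[Event],
                  high_water_mark: Optional[Time]
                  ) -> Tuple[List[Event], Optional[Time]]:
    """Return (events visible to the user, updated high water mark)."""
    visible: List[Event] = []
    for event in events:
        if event.kind == "addShard":
            # Swallowed internally: not returned to the user and, per the
            # comment above, not allowed to advance the high water mark.
            continue
        visible.append(event)
        if high_water_mark is None or event.cluster_time > high_water_mark:
            high_water_mark = event.cluster_time
    return visible, high_water_mark


batch = [Event((10, 1), "insert"), Event((11, 1), "addShard")]
visible, mark = consume_batch(batch, None)
```

After this call the mark stays at the insert's time, so a later resume attempt in this model never depends on re-reading the "addShard" event from the config server's history.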