[SERVER-32349] Resuming a sharded change stream when there are multiple changes with the same timestamp may be impossible
Created: 14/Dec/17  Updated: 30/Oct/23  Resolved: 22/Jan/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.0
Fix Version/s: 3.6.3, 3.7.2

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Martin Neupauer
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.6
Sprint: Query 2018-01-15, Query 2018-01-29
Participants:

 Description   

When resuming a change stream, we first need to make sure that the oplog has enough history to allow a resume. To do this, we make sure that the first thing in the stream is the resume token we expect. This works correctly in an unsharded environment, because we start the oplog query with the filter ts: {$gte: <resume ts>} and each change's timestamp is unique, so the first thing that comes back should be the change we're looking to resume after (if it's still possible to resume).

Things are a bit different in a sharded scenario. First, we don't know which shard is going to have the resume token, so we need to perform the check against the first document after merging the results from each shard. Second, the timestamp of each change is not guaranteed to be unique in a sharded cluster: two changes can happen 'simultaneously' (with the same timestamp) on two different shards. In situations like this, it's possible and legal that the first thing in the change stream will not be the resume token we're looking for, but rather a change that sorts before the change we're resuming after and happens to have the same timestamp.
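
A minimal sketch of the situation, for illustration only. It assumes each change can be modeled as a (ts, doc_id) tuple and that the merged stream is ordered by that tuple; the shard contents, the field names, and the use of heapq.merge are all hypothetical stand-ins, not the server's actual merge code.

```python
import heapq

# Hypothetical results of the per-shard oplog query with ts >= 5.
# Each change is modeled as (ts, doc_id); both fields are illustrative.
shard_a = [(5, "a"), (7, "c")]
shard_b = [(5, "b"), (6, "d")]

# The change we want to resume after lives on shard_b.
resume_token = (5, "b")

# Merge the per-shard streams (each is already sorted) into one ordered stream.
merged = list(heapq.merge(shard_a, shard_b))

# The first merged change shares the resume timestamp but is a different change.
first = merged[0]
same_timestamp = first[0] == resume_token[0]
is_resume_token = first == resume_token
```

Here the first merged change has the resume timestamp but is not the resume token, so a check that only inspects the first document would wrongly conclude that the resume is impossible.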

To account for this, the logic that checks if we are able to resume (inside DocumentSourceEnsureResumeTokenPresent::getNext()) should allow arbitrarily many changes to occur before the resume token iff they have the same timestamp as the resume token. As soon as we see the resume token itself, we know it's possible to resume; if we instead see a change with a higher timestamp first, we know it's not possible to resume.
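
The proposed check can be sketched roughly as follows. This is not the code in DocumentSourceEnsureResumeTokenPresent::getNext(); the function name can_resume, the (ts, doc_id) tuple model of a change, and the handling of an exhausted stream are assumptions made for the example.

```python
from typing import Iterable, Tuple

# Illustrative model of a change: (timestamp, document id).
Change = Tuple[int, str]


def can_resume(merged: Iterable[Change], resume_token: Change) -> bool:
    """Scan the merged stream for the resume token (hypothetical helper)."""
    resume_ts = resume_token[0]
    for change in merged:
        if change == resume_token:
            # Found the exact change we are resuming after.
            return True
        if change[0] != resume_ts:
            # A different timestamp (in practice a higher one, since the
            # query filters with $gte) means the token is no longer present.
            return False
        # Same timestamp but a different change: keep scanning.
    # Simplification: an exhausted stream is treated as "token not found".
    return False


# Token preceded by a change with the same timestamp: resume is possible.
ok = can_resume([(5, "a"), (5, "b"), (6, "d")], (5, "b"))
# A higher timestamp appears before the token: resume is not possible.
missing = can_resume([(5, "a"), (6, "d")], (5, "b"))
```

The loop skips any number of changes that share the resume timestamp and stops at the first decisive change, which mirrors the rule described above.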



 Comments   
Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'martin.neupauer@10gen.com', 'name': 'Martin Neupauer', 'username': 'MartinNeupauer'}

Message: SERVER-32349 Change streams over sharded collections may produce merged op log entries
with the same timestamps if the operations are coming from multiple shards. When we
resume the change stream we have to position to the right place - the position is determined
both by the timestamp and the document id. Previously we checked the timestamp only,
now we loop over the equal timestamps and find the right document.

(cherry picked from commit 194ec4857fa0db8085da88e22eaae96687902d66)
Branch: v3.6
https://github.com/mongodb/mongo/commit/e74bf273ef83cfbee332fe33079c0f640c71f7bb

Comment by Githook User [ 19/Jan/18 ]

Author:

{'name': 'Martin Neupauer', 'email': 'martin.neupauer@10gen.com', 'username': 'MartinNeupauer'}

Message: SERVER-32349 Change streams over sharded collections may produce merged op log entries
with the same timestamps if the operations are coming from multiple shards. When we
resume the change stream we have to position to the right place - the position is determined
both by the timestamp and the document id. Previously we checked the timestamp only,
now we loop over the equal timestamps and find the right document.
Branch: master
https://github.com/mongodb/mongo/commit/194ec4857fa0db8085da88e22eaae96687902d66

Comment by Charlie Swanson [ 05/Jan/18 ]

As far as I know, the fix described in the description should work - most of the work would probably be in creating a test for this, though even that shouldn't be crazy difficult. I would estimate a couple days, maybe 2?

Comment by David Storch [ 05/Jan/18 ]

charlie.swanson, can you provide a back-of-the-envelope estimate of how many days of developer time you think this will require?
