[SERVER-35740] Report high-water-mark resume token with each (possibly empty) change stream batch Created: 22/Jun/18  Updated: 29/Oct/23  Resolved: 01/Mar/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.0.7, 4.1.9

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Bernard Gorman
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-38414 Upgrade/Downgrade testing for change ... Closed
is depended on by DRIVERS-595 Support postBatchResumeToken in chang... Closed
Duplicate
is duplicated by SERVER-32895 Provide method for getting 'current r... Closed
is duplicated by SERVER-39143 Add noop-event to Change stream Closed
Related
Backwards Compatibility: Minor Change
Sprint: Query 2019-01-14, Query 2019-02-11, Query 2019-02-25, Query 2019-03-11
Participants:
Case:

 Description   

Original title: Change streams with no results cannot be resumed once the oplog has rolled over

Suppose you want to open a change stream to listen for an event that happens relatively rarely, maybe once a week or so. The 'resumeAfter' protocol implemented with drivers isn't very helpful here because you will never get a resume token until that rare event happens. Thus, if there's a network error of some sort and your stream has to be resumed, you have no option but to start a new one. But starting a new one might start ahead of where the last one left off!

Starting in 4.0, the desired protocol for opening a change stream with a driver is to start watching at an operation time; any operation time from around when the stream is opened will work for the examples here. This means that if you want to resume your stream before you have seen any resume token, you can simply remember the time you started and restart from that time. This will scan some extra data, but at least you won't miss anything.
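As a small illustration of that fallback pattern (the helper and option names here are hypothetical, not a real driver API; both values are treated as opaque), a client can remember the operation time at which it opened the stream and prefer a concrete resume token whenever one has been observed:

```python
def build_watch_options(resume_token=None, start_operation_time=None):
    """Choose resume options for reopening a change stream.

    Prefer a concrete resume token if one has been observed; otherwise
    fall back to the operation time remembered when the stream was
    opened. Both arguments are opaque values for this sketch.
    """
    if resume_token is not None:
        return {"resumeAfter": resume_token}
    if start_operation_time is not None:
        return {"startAtOperationTime": start_operation_time}
    return {}


# No events seen yet: fall back to the remembered operation time.
print(build_watch_options(start_operation_time=(1542300000, 1)))
# A token has been observed: resume precisely from it.
print(build_watch_options(resume_token={"_data": "opaque-token"}))
```

The point of the ticket is that the first branch is unreachable until the rare event fires, and the second branch fails once the remembered time rolls off the oplog.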

However, there's still a problem with very infrequent events. If enough time passes between opening the stream and the interesting event, the operation time you started with will no longer be present in the oplog. Supposing this is the case and there is a network error, the driver will attempt to re-open the stream and the server will return an error, because we cannot start a change stream that far in the past (we would be missing events).

To fix this, we could do one of the following:

  • Augment the driver protocol to expose some of the information we use internally between mongos and mongod to show progress despite the lack of events. See SERVER-29929. Then the drivers could use this time to resume, simultaneously avoiding re-scanning a large amount of oplog on resume and avoiding concerns about rolling off the oplog.
  • Relax the assertions in the server to allow using an operation time that's no longer in the oplog.

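The first option is what the updated ticket title describes: each change stream batch, even an empty one, carries a post-batch resume token that marks the stream's high water mark. A minimal client-side sketch of consuming that protocol (a pure simulation, not driver code; batch shapes and `handle` are assumptions for illustration) might look like:

```python
def handle(event):
    """Application-specific processing; a no-op for this sketch."""
    pass


def track_resume_token(batches):
    """Simulate a client consuming change stream batches.

    Each batch is (events, post_batch_resume_token). The client advances
    its resume token per event, then to the post-batch token, so the
    token keeps moving forward even across empty batches.
    """
    token = None
    for events, post_batch_token in batches:
        for event in events:
            handle(event)
            token = event["_id"]  # per-event resume token
        token = post_batch_token  # high water mark for the whole batch
    return token


# Three empty batches still advance the resume token to "t3", so a
# resume after a network error cannot fall off the back of the oplog.
print(track_resume_token([([], "t1"), ([], "t2"), ([], "t3")]))
# → t3
```

This is why the infrequent-event scenario stops being a problem: the resumable position advances with the oplog rather than with observed events.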

 Comments   
Comment by Jonathan Green [ 15/Nov/18 ]

Hi Jack,

*I asked the team working directly on this for a timeline, here's their
response:*
Hello! We are actively working on this one, yes. The timeline is a bit hard
to predict because this one involves first doing the changes on the server
in the master branch, then seeing if/how we can backport some to the 4.0
release, then implementing new driver APIs on top of that, and finally
releasing a new version of 4.0 and of whichever driver the customer is
using. I would say absolute best case would be a month and a half or two
months, but more likely lagging behind that - we're still in the early
phases so it's hard to anticipate whether we'll encounter complexities.

*I followed up by asking if this is an issue that you may potentially see
in development and that you would not see in production. Here's their
response:*
Right. We expect this to only be an issue if the events are far apart

Because this issue only arises when there is a long oplog between triggers,
you may not see the issue in production. *Would you consider testing under
a production-like environment?*

Best,
Jonathan

Comment by DORDONNE Jacques-olivier [ 15/Nov/18 ]

Hi, I would like to know if you have an ETA for this feature.

Regards,

Comment by Eric Daniels (Inactive) [ 10/Oct/18 ]

I think from a sharded perspective we have to return the minimum of all the shards' last cluster times. That will ensure that you can resume without missing an event in the case that one shard is ahead. So for a replica set, it would simply be the latest time.
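To illustrate that point under simplified assumptions (cluster times modeled as comparable (seconds, increment) tuples rather than real BSON timestamps), mongos would report the minimum of the per-shard high water marks, since resuming any later could skip events on a lagging shard:

```python
def merge_high_water_marks(shard_times):
    """Return the resume point that is safe across all shards.

    shard_times: per-shard latest cluster times as (seconds, increment)
    tuples. The minimum is the only point guaranteed not to skip events
    on a lagging shard; tuples compare lexicographically, which matches
    timestamp ordering.
    """
    return min(shard_times)


# Shard B lags at (99, 7); resuming any later could miss its events.
print(merge_high_water_marks([(100, 5), (99, 7), (101, 2)]))
# → (99, 7)
```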

Comment by Kevin Albertson [ 10/Oct/18 ]

Of the proposed solutions I vote for the first (exposing the latest optime the change stream cursor has observed) in the getMore and aggregate commands.

If we instead relax the assertions, I assume the behavior would be to resume the change stream at the node's majority committed operation time. So we could still lose an event, right?

Comment by David Dana [ 08/Oct/18 ]

rahul.singhania FYI please track.

Comment by Charlie Swanson [ 14/Sep/18 ]

Flagging for scheduling - this has come up recently with stitch triggers and seems eligible as a quick win.

Generated at Thu Feb 08 04:40:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.