Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.0.7, 4.1.9
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
None

Backwards Compatibility:
Minor Change
Sprint:
Query 2019-01-14, Query 2019-02-11, Query 2019-02-25, Query 2019-03-11
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Original title: Change streams with no results cannot be resumed once the oplog has rolled over

Suppose you want to open a change stream to listen for an event that happens relatively rarely, maybe once a week or so. The 'resumeAfter' protocol implemented by drivers isn't very helpful here, because you will not get a resume token until that rare event happens. Thus, if there's a network error of some sort and your stream has to be resumed, you have no option but to start a new one. But a new stream might start ahead of where the last one left off!

Starting in 4.0, the desired protocol for opening a change stream with a driver is to start watching at an operation time. Any operation time from around the time the stream is opened will work for the examples here. This means that if you want to resume your stream before you have seen any resume token, you can just remember the time you started and start again from that time. This will scan some extra data, but at least you won't miss anything.
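A minimal sketch of that resume decision, assuming the keyword names `resume_after` and `start_at_operation_time` accepted by PyMongo's `watch()`. The helper `resume_options` is a hypothetical name introduced here for illustration, not part of any driver:

```python
from typing import Any, Dict, Optional


def resume_options(last_token: Optional[Dict[str, Any]], start_time: Any) -> Dict[str, Any]:
    """Choose keyword arguments for re-opening a change stream.

    last_token: resume token of the last event seen, or None if no
                event has arrived yet.
    start_time: the operation time remembered when the stream was
                first opened (assumed to be a BSON Timestamp in practice).
    """
    if last_token is not None:
        # An event was seen, so resume right after it.
        return {"resume_after": last_token}
    # No event yet: fall back to the remembered operation time.
    return {"start_at_operation_time": start_time}
```

With a real collection, the result would be passed along as `collection.watch(**resume_options(token, start_time))`.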

However, there's still a problem with very infrequent events. If enough time passes between opening the stream and the interesting event, the operation time you started with will no longer be present in the oplog. Supposing this is the case and there is a network error, the driver will attempt to re-open the stream and the server will return an error because we cannot start a change stream that far in the past (we would be missing events).
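The failure mode can be illustrated with a toy model. This is not the server's implementation: `ToyOplog` is a hypothetical class that treats the oplog as a fixed-size buffer of integer timestamps, and treats a resume as possible only while the oldest retained entry is no newer than the requested time.

```python
from collections import deque


class ToyOplog:
    """Toy capped log: keeps only the newest `capacity` timestamps."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest entry on overflow,
        # which stands in for the oplog rolling over.
        self.entries = deque(maxlen=capacity)

    def append(self, ts: int) -> None:
        self.entries.append(ts)

    def can_resume_from(self, ts: int) -> bool:
        # Resuming is safe only if nothing at or after `ts` has been dropped,
        # i.e. the oldest retained entry is not newer than `ts`.
        return bool(self.entries) and self.entries[0] <= ts


oplog = ToyOplog(capacity=3)
for ts in range(1, 6):  # writes at times 1..5; times 1 and 2 roll off
    oplog.append(ts)
```

Here a stream opened at time 1 can no longer be resumed from that time, while one opened at time 4 still can.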

To fix this, we could do one of the following:

Augment the driver protocol to expose some of the information we use internally between mongos and mongod to show progress despite the lack of events. See ~~SERVER-29929~~. Then the drivers could use this time to resume and simultaneously avoid re-scanning a ton on resume and avoid concerns about rolling off the oplog.

Relax the assertions in the server to allow using an operation time that's no longer in the oplog.
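As a rough sketch of how a driver might use the first option, assume each batch of results can carry a progress token even when it contains no events (the linked DRIVERS-595 calls this `postBatchResumeToken`). The class `ResumeTracker` and its method names are hypothetical, and real driver logic may differ:

```python
from typing import Any, Dict, List, Optional


class ResumeTracker:
    """Remembers the best known point from which to resume a stream."""

    def __init__(self, start_time: Any) -> None:
        self.start_time = start_time  # fallback if no token is ever seen
        self.token: Optional[Dict[str, Any]] = None

    def on_batch(self, events: List[Dict[str, Any]],
                 post_batch_resume_token: Optional[Dict[str, Any]] = None) -> None:
        # Assumes the whole batch has been consumed by the caller.
        if post_batch_resume_token is not None:
            # The server reported progress, possibly with zero events.
            self.token = post_batch_resume_token
        elif events:
            # Otherwise fall back to the last event's resume token (_id).
            self.token = events[-1]["_id"]


tracker = ResumeTracker(start_time=100)
tracker.on_batch([], post_batch_resume_token={"_data": "A"})
```

Even though no events arrived, the tracker now holds a token that advances with the stream, so a resume would neither re-scan from the original start time nor depend on that time still being in the oplog.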

depends on
SERVER-38414 Upgrade/Downgrade testing for change stream high water mark (Closed)

is depended on by
DRIVERS-595 Support postBatchResumeToken in change streams (Development Complete)

is duplicated by
SERVER-32895 Provide method for getting 'current resume token' for a collection (Closed)
SERVER-39143 Add noop-event to Change stream (Closed)

related to
SERVER-102900 change streams getMore responses should return updated PBRT (Needs Scheduling)

Assignee: Bernard Gorman
Reporter: Charlie Swanson
Participants: Bernard Gorman, Charlie Swanson, David Dana, DORDONNE Jacques-olivier, Eric Daniels, Jonathan Green, Kevin Albertson
Votes: 2
Watchers: 19

Created: Jun 22 2018 12:18:23 PM UTC
Updated: Mar 26 2025 02:04:10 PM UTC
Resolved: Mar 01 2019 03:06:03 PM UTC
Confidence Status Last Update: 28/Jan/19 11:17 AM
