SERVER-35740 (Core Server)

Report high-water-mark resume token with each (possibly empty) change stream batch

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.0.7, 4.1.9
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • Labels:
    • Backwards Compatibility: Minor Change
    • Sprint: Query 2019-01-14, Query 2019-02-11, Query 2019-02-25, Query 2019-03-11

      Original title: Change streams with no results cannot be resumed once the oplog has rolled over

      Suppose you want to open a change stream to listen for an event that happens relatively rarely, maybe once a week or so. The 'resumeAfter' protocol implemented by drivers isn't very helpful here because you will never get a resume token until that rare event happens. Thus, if there's a network error of some sort and your stream has to be resumed, you have no option but to start a new one. But the new stream might start ahead of where the last one left off, so any event in between would be missed!

      Starting in 4.0, the desired protocol for opening a change stream with a driver is to start watching at an operation time. Any operation time from around the time the stream is opened will work for the examples here. This means that if you want to resume your stream before you see any resume token, you can just remember the time you started and start again from that time. This is going to scan some extra data, but at least you won't miss anything.
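
As a rough illustration of the fallback described above, here is a minimal sketch of how a client could pick its resume options. The helper name `resume_options` is hypothetical; the dictionary keys mirror the `resume_after` and `start_at_operation_time` keyword arguments of PyMongo's `watch()`, but nothing here talks to a server.

```python
from typing import Any, Dict, Optional


def resume_options(last_token: Optional[Dict[str, Any]], start_time: Any) -> Dict[str, Any]:
    """Choose how to re-open a stream after a transient error (hypothetical helper)."""
    if last_token is not None:
        # At least one event was seen: resume right after the last one delivered.
        return {"resume_after": last_token}
    # No event seen yet: fall back to the operation time remembered at open.
    # This may re-scan some oplog, but nothing is skipped.
    return {"start_at_operation_time": start_time}
```

With no token yet, `resume_options(None, t0)` returns the remembered start time; once an event has arrived, its token takes precedence.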

      However, there's still a problem with very infrequent events. If enough time passes between opening the stream and the interesting event, the operation time you started with will no longer be present in the oplog. Supposing this is the case and there is a network error, the driver will attempt to re-open the stream and the server will return an error because we cannot start a change stream that far in the past (we would be missing events).
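
To make the failure mode concrete, here is a toy model of the rollover problem. `ToyOplog` is a hypothetical stand-in that uses integer timestamps and a fixed entry count; the real server check is more involved, so treat this only as a sketch of the idea.

```python
from collections import deque


class ToyOplog:
    """Capped log of integer timestamps; the oldest entries roll off when full."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)

    def append(self, ts):
        self.entries.append(ts)

    def can_start_at(self, ts):
        # History from ts onward is complete only if the oldest retained
        # entry is not newer than ts.
        return bool(self.entries) and self.entries[0] <= ts


oplog = ToyOplog(capacity=3)
for ts in (10, 11, 12):
    oplog.append(ts)
start_time = 10                      # remembered when the stream was opened
ok_before = oplog.can_start_at(start_time)

for ts in (13, 14):                  # unrelated writes push old entries out
    oplog.append(ts)
ok_after = oplog.can_start_at(start_time)
```

Here `ok_before` is true and `ok_after` is false: after enough unrelated writes, the remembered start time can no longer be used, even though the stream itself never returned an event.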

      To fix this, we could do one of the following:

      • Augment the driver protocol to expose some of the information we use internally between mongos and mongod to show progress despite the lack of events. See SERVER-29929. Then the drivers could use this time to resume and simultaneously avoid re-scanning a ton on resume and avoid concerns about rolling off the oplog.
      • Relax the assertions in the server to allow using an operation time that's no longer in the oplog.
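
The first option, which matches the current title of this ticket, could look roughly like this on the client side. This is a sketch under assumptions: `ResumeTracker`, `on_batch`, and the `high_water_mark` parameter are hypothetical names, and it assumes the token reported with a batch is never behind the last event in that batch.

```python
class ResumeTracker:
    """Tracks the best known resume point for one change stream (hypothetical)."""

    def __init__(self, start_time):
        self.start_time = start_time   # operation time remembered at open
        self.token = None              # latest resume token seen, if any

    def on_batch(self, events, high_water_mark=None):
        if high_water_mark is not None:
            # The server reported progress, even if the batch was empty.
            self.token = high_water_mark
        elif events:
            # No high-water mark: fall back to the last event's token.
            self.token = events[-1]["_id"]

    def resume_options(self):
        if self.token is not None:
            return {"resume_after": self.token}
        return {"start_at_operation_time": self.start_time}
```

Because the tracker advances on empty batches too, a stream that has seen no events still resumes from a recent point, which avoids both the long re-scan and the risk of the start time rolling off the oplog.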

            bernard.gorman@mongodb.com Bernard Gorman
            charlie.swanson@mongodb.com Charlie Swanson