|
While we're considering these changes, we should also examine how HWMs are generated on mongoD. Currently, if we hit EOF and the last-seen oplog timestamp has advanced since our last read, we set the HWM to that oplog timestamp and generate a PBRT from it. We do this because, in this scenario, we know that the last entry in the oplog does not match our filter. But because HWM tokens sort before events at the same clusterTime, we have effectively set the PBRT to just before the last event in the oplog, which is somewhat counter-intuitive. It also means that if the HWM were used to start a different pipeline whose filter matches that event, the new stream could return it. This would not be incorrect behaviour: by definition, the event was not observed on the original stream, so we would not be returning a repeated event. But the semantics are subtle and slightly awkward to remember.
An alternative would be to set the HWM to one tick beyond the most recent event in the oplog. It may seem unintuitive to set the HWM to a timestamp that doesn't yet exist in the oplog, but it's similar to what we do when opening a new stream. We would have to consider whether this change in behaviour would have ramifications elsewhere, e.g. when resuming a stream with a HWM of a different token version than the default.
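The ordering argument above can be made concrete with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: timestamps are modelled as `(seconds, increment)` pairs, a token is just a clusterTime plus a flag saying whether it is an event token or a HWM, and the names `Token`, `hwm`, `event_token`, `next_tick` and `resume_after` are all hypothetical. Real resume tokens carry more fields than this.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy timestamp: (seconds, increment). An assumption for illustration only.
Timestamp = Tuple[int, int]


@dataclass(frozen=True, order=True)
class Token:
    # Fields are compared in order: clusterTime first, then the event flag.
    cluster_time: Timestamp
    # False (HWM) sorts before True (event) at the same clusterTime.
    is_event: bool


def hwm(ts: Timestamp) -> Token:
    """Hypothetical high-water-mark token at the given clusterTime."""
    return Token(ts, False)


def event_token(ts: Timestamp) -> Token:
    """Hypothetical token for an event at the given clusterTime."""
    return Token(ts, True)


def next_tick(ts: Timestamp) -> Timestamp:
    """One tick beyond ts, modelled as bumping the increment."""
    return (ts[0], ts[1] + 1)


def resume_after(token: Token, oplog: List[Timestamp],
                 matches: Callable[[Timestamp], bool]) -> List[Timestamp]:
    """Return the oplog entries that sort strictly after `token` and match the filter."""
    return [ts for ts in oplog if event_token(ts) > token and matches(ts)]


# The original stream hit EOF; its filter did not match the last entry (100, 3).
oplog = [(100, 1), (100, 2), (100, 3)]
last_seen = oplog[-1]

# A different pipeline whose filter matches everything.
match_all = lambda ts: True

# Current behaviour: HWM at the last-seen timestamp sorts before that event.
current = resume_after(hwm(last_seen), oplog, match_all)

# Alternative: HWM one tick beyond the most recent oplog entry.
alternative = resume_after(hwm(next_tick(last_seen)), oplog, match_all)

print(current)      # the last oplog entry is returned
print(alternative)  # nothing is returned
```

In this model, the current scheme lets the second pipeline return the entry at `(100, 3)`, while the one-tick-ahead scheme places the HWM past it, so the resumed stream starts strictly after everything currently in the oplog.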
|