[SERVER-66870] Improvements to resume token format Created: 31/May/22  Updated: 25/Sep/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Bernard Gorman Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

There are a number of small improvements we could make to the resume token format, although these may require a bump in the version number. For instance:

  • Never explicitly encode any of the redundant info beyond tokenType for high-water-mark (HWM) tokens.
  • Encode txnOpIndex as explicitly null for non-txn events. This will generally ensure that txn events from the same shard are contiguous in the stream; currently, the first entry in a txn will interleave with any events from other shards at the same clusterTime.
  • Consider hashing the eventIdentifier as SHA256 or similar. This would limit the resume token to a fixed size, but we would lose some potentially valuable diagnostic information in the process.
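
The txnOpIndex point can be illustrated with a simplified ordering model. This is only a sketch: it assumes tokens sort on (clusterTime, txnOpIndex, eventIdentifier), that non-txn events currently encode the same index value as the first entry of a txn, and that null sorts before any number. The real token has more fields, and the event labels and identifiers below are made up.

```python
# Simplified model: tokens compare as (clusterTime, txnOpIndex, eventIdentifier).
# Assumption: non-txn events currently encode index 0, same as a txn's first op.

def sort_key(event, null_for_non_txn):
    idx = event["txnOpIndex"]
    if idx is None:
        # Proposed: explicit null, modelled as sorting before every number.
        # Current: encoded as 0, indistinguishable from a txn's first entry.
        idx_key = (0, 0) if null_for_non_txn else (1, 0)
    else:
        idx_key = (1, idx)
    return (event["clusterTime"], idx_key, event["eventIdentifier"])

def stream_order(events, null_for_non_txn):
    ordered = sorted(events, key=lambda e: sort_key(e, null_for_non_txn))
    return [e["label"] for e in ordered]

# Three txn ops from shard A and one non-txn event from shard B,
# all at the same (hypothetical) clusterTime.
events = [
    {"label": "A-txn-0", "clusterTime": 5, "txnOpIndex": 0, "eventIdentifier": "m"},
    {"label": "A-txn-1", "clusterTime": 5, "txnOpIndex": 1, "eventIdentifier": "n"},
    {"label": "A-txn-2", "clusterTime": 5, "txnOpIndex": 2, "eventIdentifier": "o"},
    {"label": "B-insert", "clusterTime": 5, "txnOpIndex": None, "eventIdentifier": "z"},
]

current = stream_order(events, null_for_non_txn=False)
proposed = stream_order(events, null_for_non_txn=True)
print(current)
print(proposed)
```

In this model the current encoding places B-insert between the first and second txn entries, while the null encoding places it ahead of the whole txn.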

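As a rough illustration of the fixed-size property mentioned in the hashing bullet, the sketch below hashes a stand-in eventIdentifier with SHA-256. The JSON canonicalisation and the name hash_event_identifier are assumptions made for this example; the server would presumably hash its own encoded bytes instead.

```python
import hashlib
import json

def hash_event_identifier(event_identifier: dict) -> bytes:
    """Return a 32-byte SHA-256 digest of a stand-in eventIdentifier."""
    # Assumption: canonical JSON stands in for the server's real byte encoding.
    canonical = json.dumps(
        event_identifier, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(canonical).digest()

small = hash_event_identifier({"_id": 1})
large = hash_event_identifier({"_id": "x" * 10_000, "shardKey": "y" * 10_000})

# Both digests have the same length regardless of input size.
print(len(small), len(large))
```

The digest cannot be reversed to recover the original identifier, which is the loss of diagnostic information noted above.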

 Comments   
Comment by Bernard Gorman [ 10/Jul/23 ]

While we're considering these changes, we should also examine how HWMs are generated on mongoD. Currently, if we hit EOF and the last-seen oplog timestamp has advanced since our last read, we set the HWM to that oplog timestamp and generate a postBatchResumeToken (PBRT) using it. We do this because in this scenario, we know that the last entry in the oplog does not match our filter. But because HWM tokens sort before events at the same clusterTime, this means that we have effectively set the PBRT to just before the last event in the oplog, which is somewhat counter-intuitive. It also means that if the HWM were used to start a different pipeline, it could end up returning this event. This would not be incorrect behaviour; by definition, the event was not observed on the original stream, so we would not be returning a repeated event. But the semantics are subtle and slightly awkward to remember.
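
The behaviour described above can be sketched with a toy model. It assumes a token is just (clusterTime, tokenType) with the HWM type sorting lower than the event type, and that resuming returns matching events whose tokens sort strictly after the resume token. The oplog entries, namespaces, and helper names are hypothetical.

```python
# Toy model: a token is (clusterTime, tokenType).
# Assumption: HWM tokens sort before event tokens at the same clusterTime.
HWM, EVENT = 0, 1

def hwm_token(cluster_time):
    return (cluster_time, HWM)

def event_token(cluster_time):
    return (cluster_time, EVENT)

def resume_after(token, oplog, matches):
    """Return matching entries whose tokens sort strictly after `token`."""
    return [e for e in oplog if matches(e) and event_token(e["ts"]) > token]

# Hypothetical oplog: the last entry does not match the original filter.
oplog = [
    {"ts": 1, "ns": "db.a"},
    {"ts": 2, "ns": "db.a"},
    {"ts": 3, "ns": "db.b"},
]

def watches_a(entry):
    return entry["ns"] == "db.a"

def watches_b(entry):
    return entry["ns"] == "db.b"

# The db.a stream hits EOF; the HWM is set to the last-seen oplog timestamp.
pbrt = hwm_token(oplog[-1]["ts"])

same_pipeline = resume_after(pbrt, oplog, watches_a)
other_pipeline = resume_after(pbrt, oplog, watches_b)
print(same_pipeline, other_pipeline)
```

Resuming the original pipeline from the PBRT yields nothing new, but a different pipeline started from the same token returns the entry at ts 3, because the HWM sorts just before it.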

An alternative would be to set the HWM to one tick beyond the most recent event in the oplog. It seems unintuitive to set the HWM to a timestamp that doesn't yet exist in the oplog, but it's similar to what we do when opening a new stream. We would have to consider whether this change in behaviour would have ramifications elsewhere, e.g. when resuming a stream with a HWM of a different token version than the default.
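
A minimal sketch of the alternative, under the same toy assumptions: a timestamp is modelled as a (seconds, increment) pair, a token as (timestamp, tokenType), and HWM tokens sort before event tokens at equal timestamps. The next_tick helper and its rollover handling are assumptions for illustration, not the server's implementation.

```python
from typing import NamedTuple

class Timestamp(NamedTuple):
    seconds: int
    increment: int

MAX_INCREMENT = 2**32 - 1  # assumption: increment is an unsigned 32-bit counter
HWM, EVENT = 0, 1          # assumption: HWM sorts before events at equal timestamps

def next_tick(ts: Timestamp) -> Timestamp:
    """Return the smallest timestamp strictly greater than `ts`."""
    if ts.increment == MAX_INCREMENT:
        # Assumed rollover behaviour for this sketch only.
        return Timestamp(ts.seconds + 1, 0)
    return Timestamp(ts.seconds, ts.increment + 1)

last_oplog_ts = Timestamp(100, 3)          # hypothetical last oplog entry
last_event = (last_oplog_ts, EVENT)

current_hwm = (last_oplog_ts, HWM)              # today's behaviour
proposed_hwm = (next_tick(last_oplog_ts), HWM)  # one tick beyond

print(current_hwm < last_event, proposed_hwm > last_event)
```

With the proposed token, a pipeline resumed from the HWM would start strictly after the last oplog entry rather than just before it.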

Generated at Thu Feb 08 06:06:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.