[SERVER-67677] Resuming a change stream with a resume token for an event that no longer matches returns an error Created: 29/Jun/22 Updated: 31/Oct/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Tyler Kaye | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Execution
|
||||||||
| Participants: | |||||||||
| Description |
|
If you open a change stream with a match expression and observe a resume token, then make the match expression more selective and open the stream again there is a possibility that the event no longer matches and you get an error for the resume token no longer being in the oplog.
This seems reasonable, but when we factor in that the FullDocument field can be variable, then this becomes a bit more problematic as the following can happen:
Bernard told me to file a SERVER ticket for this epic for this
|
| Comments |
| Comment by Eric Daniels [ 23/Nov/22 ] | ||||||||||||
|
Upon further inspection, I think this is different from what Tyler found. Will file separately! | ||||||||||||
| Comment by Eric Daniels [ 23/Nov/22 ] | ||||||||||||
|
So I just hit this when upgrading from 4.4.18 to 6.0.3. I believe that this is either a regression or an undefined behavior that was fixed but broke backwards compatibility. In my scenario, exactly as Tyler mentions, we are waiting for calls to "robots". Once we receive a call to a robot based on its name (an insert), we latch onto the call document and wait for future updates in a more select change stream. In order to prevent races, we use the resume token from the original change stream. What ends up happening is that the server thinks some resume token no longer exists. What appears to actually be happening is that in 6.0, the server actually increments the cluster time ordinal by one from the token passed in. Depending on the amount of writes to the oplog, there's a chance (especially in testing) that that token does not exist yet, resulting in an error. If a single write happens after the first insert that causes that ordinal incremented cluster time to exist, no error happens, but we miss the next update event which does not appear to make sense (and feels like a bug). Attached are two sets of reproduction scripts. Repro 1: The first script to execute, watcher.js, waits for a document to be inserted and then starts up a new change stream waiting for the next update to that inserted document using the resume token to not miss any updates. Then execute writer.js which will insert a document and then update it. You should see an error like:
If you inspect the tokens, you'll see that they are off by one and from past experience, this is likely a cluster time ordinal difference. Repro 2: Run watcher.js again but this time run writer2.js which will do an insert of another document before the update. In this case, it still errors. This also works fine on 5.0.14. Presently, this issue is preventing us from upgrading our Atlas cluster to 6.0.3. | ||||||||||||
| Comment by Bernard Gorman [ 08/Aug/22 ] | ||||||||||||
|
We can likely address this by throwing past the DSCSAddPostImage stage and any intervening match filters when we observe the token in the resumed stream, similar to what we currently do when resuming after an invalidate. This would also fix the problem of being unable to resumeAfter a document that exceeds the maximum BSON size limit, since we currently recreate the offending document while attempting to find the resume point. Note: implementing the above would lead to a situation where the ability of streams to resume from a different pipeline's token would be largely unpredictable from the user's perspective, as it would depend entirely on whether the new pipeline's user predicates can be rewritten into the oplog. To make the behaviour consistent, we would also need to add a branch to the oplog filter such that the resume event will always be returned. This approach would also address SERVER-71565. |