[SERVER-44610] Missing NonResumableChangeStreamError label from fatal error
Created: 13/Nov/19  Updated: 06/Dec/19  Resolved: 06/Dec/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Divjot Arora (Inactive)
Assignee: Bernard Gorman
Resolution: Duplicate
Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-44733 Explicitly return a stream-fatal erro... Closed
Operating System: ALL
Sprint: Query 2019-12-16
Participants:

 Description   

The "resume of change stream was not possible" error (location 40585, code name Location40585) is missing the "NonResumableChangeStreamError" error label, so drivers attempt to resume even though the error is fatal.
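
To illustrate why the missing label matters, here is a simplified Python sketch of a driver-side resumability check. It assumes the label is the only thing the driver inspects; real drivers also consider specific error codes. The function name and the dict-shaped error are hypothetical and not taken from any driver's API.

```python
# Hypothetical sketch: names are illustrative, not a real driver API.
NON_RESUMABLE_LABEL = "NonResumableChangeStreamError"


def is_resumable(error: dict) -> bool:
    """Treat a server error as resumable unless it carries the fatal label."""
    return NON_RESUMABLE_LABEL not in error.get("errorLabels", [])


# The error as described in this ticket: code 40585 with no label attached.
unlabeled = {"code": 40585, "codeName": "Location40585"}

# The same error carrying the label this ticket asks for.
labeled = {**unlabeled, "errorLabels": [NON_RESUMABLE_LABEL]}
```

Under this assumption, `is_resumable(unlabeled)` is true, so the driver would keep trying to resume, while `is_resumable(labeled)` is false and the error would be surfaced to the application.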



 Comments   
Comment by Bernard Gorman [ 06/Dec/19 ]

Fixed in SERVER-44733.

Comment by Divjot Arora (Inactive) [ 04/Dec/19 ]

bernard.gorman I believe this can cause a retry loop because drivers don't currently have the logic for "if the same error is seen twice in a row, stop resuming." If a user specifies a read preference that only matches one server or makes the change stream over a direct connection, the same server will be selected for all resume attempts. We do have SPEC-1505 open to potentially add logic to stop if the same error occurs twice, so that may solve this if this isn't actually a fatal error in all cases.
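
A minimal Python sketch of the "stop if the same error is seen twice in a row" idea discussed above. This only illustrates the concept; it is not the behavior proposed in SPEC-1505 or implemented by any driver. `ResumeError`, `resume_with_repeat_guard`, and the `max_attempts` parameter are hypothetical names.

```python
from typing import Callable, Optional


class ResumeError(Exception):
    """Hypothetical stand-in for a server error raised by a resume attempt."""

    def __init__(self, code: int):
        super().__init__(f"resume failed with code {code}")
        self.code = code


def resume_with_repeat_guard(
    attempt_resume: Callable[[], str], max_attempts: int = 10
) -> str:
    """Retry a resume, but give up once the same error code repeats back to back."""
    last_code: Optional[int] = None
    for _ in range(max_attempts):
        try:
            return attempt_resume()
        except ResumeError as exc:
            if exc.code == last_code:
                # Same failure twice in a row: stop instead of looping forever.
                raise
            last_code = exc.code
    raise RuntimeError("resume attempts exhausted")
```

With a guard like this, a direct connection that always fails with 40585 would stop after the second attempt rather than retrying indefinitely.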

Comment by Bernard Gorman [ 04/Dec/19 ]

This exception is thrown if the mongod's oplog history does not go back as far as the resume token. I'm inclined to leave this as-is, since I can think of at least one situation in which it is beneficial to have the drivers retry upon encountering this exception: if the change stream is issued with a read preference other than primary, then on the retry the drivers may select an alternative member of the replica set which does have sufficient oplog to resume the stream. The worst-case scenario here is that the drivers retry, the retry fails for the same reason, and the drivers give up.

[Edit]: apologies, this exception is actually thrown if we have already verified that the oplog has enough history, but the resume token is still not observed in the stream. ChangeStreamFatalError does indeed seem appropriate here.

Comment by David Storch [ 22/Nov/19 ]

We should change this from using an unnamed code to using ChangeStreamFatalError.
