[SERVER-46091] Add new failpoint to support driver testing of ResumableChangeStreamError label Created: 11/Feb/20  Updated: 29/Oct/23  Resolved: 20/Feb/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Improvement Priority: Major - P3
Reporter: Bernard Gorman Assignee: Bernard Gorman
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-45505 Add ResumableChangeStreamError error ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Query 2020-02-24
Participants:

 Description   

Driver tests designed to exercise the ResumableChangeStreamError label introduced in SERVER-45505 will need to use a failpoint to induce the desired exceptions. However, for getMore commands, the existing generic failCommand failpoint runs too early in the command path, long before the getMore has checked out its cursor and set CurOp::_originatingCommand. The error label code therefore cannot determine that this cursor is a change stream, and will not attach the error label.

We should add a new failGetMoreAfterCursorCheckout failpoint to support these tests.



 Comments   
Comment by Githook User [ 20/Feb/20 ]

Author:

{'username': 'gormanb', 'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com'}

Message: SERVER-46091 Add new failpoint to support driver testing of ResumableChangeStreamError label
Branch: master
https://github.com/mongodb/mongo/commit/82fa959d2b9686a8cd553babd0381b0e0d11574d

Comment by Shane Harvey [ 12/Feb/20 ]

Okay, thank you for considering it!

Comment by Bernard Gorman [ 12/Feb/20 ]

shane.harvey: I'm inclined to add the failGetMoreAfterCursorCheckout failpoint rather than doing SERVER-39346. Firstly, because SERVER-39346 proposes a more extensive change which would take more time to implement and test, and which we would therefore have to put more thought into scheduling. Secondly, even if we did SERVER-39346 I think we would still need the failGetMoreAfterCursorCheckout for the purposes of the change stream whitelist tests. If we were to do SERVER-39346 alone, then the whitelist tests would have to use the "full execution" mode in order to test the error labels generated by getMore. But this would no longer be testing just the error label generation logic; it would instead be testing the error label logic plus the full change stream execution code. The change proposed in SERVER-39346 is geared towards testing the actual behaviour of a command (or more specifically, a sharded transaction), but this is not a requirement for the whitelist tests.

Comment by Shane Harvey [ 11/Feb/20 ]

I don't want to cause any delay but if both this ticket and SERVER-39346 are the same amount of work I would prefer the latter. I think it's at least worth looking into before we commit to a new failGetMoreAfterCursorCheckout failpoint.

Comment by Divjot Arora (Inactive) [ 11/Feb/20 ]

shane.harvey pointed out that doing SERVER-39346 would solve this issue without creating a new fail point. That is a much larger change, though, and would require more work.

bernard.gorman shane.harvey Thoughts? Do we want to potentially push this out further or go with the new fail point for now?

Generated at Thu Feb 08 05:10:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.