[SERVER-41196] Mongos invariant failure crash with change streams startAfter Created: 16/May/19 Updated: 29/Oct/23 Resolved: 29/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.1.11 |
| Fix Version/s: | 4.2.0-rc3, 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: |
v4.2, v4.0
|
||||||||||||
| Sprint: | Query 2019-06-17, Query 2019-07-01, Query 2019-07-15 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Mongos crashes with this invariant failure:
Attached is mongos-python-1720-crash.log The reproduction is fairly simple:
In code:
I'm using version:
|
| Comments |
| Comment by Githook User [ 29/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: (cherry picked from commit 7e4423b458fcefd37a62ebecf168716166b7dc4c) | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 29/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bernard Gorman [ 13/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
justin.seyster: Approach (2) looks good! Approach (1) won't work, both because the ChangeStreamProxy does not exist on mongoS, and because mongoS does not have any concept of a postBatchResumeToken separate from what it receives from the shards. The AsyncResultsMerger on mongoS simply reads the PBRT returned from each shard and then returns the lowest of these to the client. For the initial batch, we therefore must return the exact, unmodified resume token as our first PBRT, otherwise we may hit the issue described in my previous comment. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Justin Seyster [ 11/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
bernard.gorman Thanks for that explanation! I have two new proposals 1: I can tweak the proposal from my previous comment so that it only strips the "fromInvalidate" from the resume token on the shards (not on the mongos). The patch would look the same, except the first if statement would be:
2: I worked out how to implement the solution you proposed:
I'm interested in your thoughts on both of these. I'll also spend some more time validating each approach. I already have a jstest that is ready for review once we're happy with a fix. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bernard Gorman [ 06/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
justin.seyster: I don't think that will work, because the PBRT used to seed ChangeStreamProxy is always returned to the client as the first PBRT in the resumed stream. We'd end up returning a PBRT which was earlier than the resume token the client used as the startAfter argument. So the following could happen:
| ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Justin Seyster [ 06/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
bernard.gorman Thanks for this extremely comprehensive explanation! I want to get your thoughts on a suggestion. What do you think about modifying the ChangeStreamProxyStage constructor so that when it gets initialized with a "fromInvalidate" resume token, we strip the "fromInvalidate" flag, effectively setting the shard's first PBRT to Rd instead of Ri (in step 4 above). I think that your step 8 approach makes the same amount of sense logically, but I'm concerned that the implementation will need to either parse the ResumeTokenData in each and every ChangeStreamProxyStage::getNextBson() call or maintain some additional state in the ChangeStreamProxyStage object to keep track of when we are initializing from a "fromInvalidate" resume token. Here's the main idea of what I'm proposing:
This fixed the problem in my jstest, but I haven't sat down to think through any possible wrinkles yet. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bernard Gorman [ 03/Jun/19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
justin.seyster, I think I know what's going on here. TLDR: if the drop event is the last entry in the oplog at the moment we startAfter the invalidate's resume token, then we will return a postBatchResumeToken to mongoS which precedes the client-supplied resume token, causing this invariant. In All fromInvalidate events are pseudo-events that are generated following a collection drop, rename, etc. We create them by returning the drop event and queueing up an invalidate to be returned immediately after. From the linked comment, note that the resume token of the invalidate event is almost identical to the drop which generated it:
So what's happening here is:
As mentioned above, we unfortunately cannot take the simple approach of adding a fromInvalidate check in compareAgainstClientResumeToken, because this omission is intentional. We never generate a new invalidate in a stream resumed from an invalidate using startAfter, because doing so would cause the stream to be re-invalidated immediately. The exact resume token supplied by the client therefore never actually appears in the resumed stream. Since we have a strict requirement that the resumed stream should never progress past the resume token without seeing it, a strict match on the fromInvalidate field would prevent us from being able to startAfter at all. We therefore allow an invalidate resume token to match the resume token of the drop which generated it. We also can't easily modify ChangeStreamProxy's initial resume token to remove the fromInvalidate if it is present, because then the first batch returned to the client would have a postBatchResumeToken that is earlier than the one it originally handed to mongoS. One solution would be to have ChangeStreamProxy retain the parsed ResumeTokenData from the initial PBRT if fromInvalidate is true. When we get the drop event, we can then confirm that it is identical to the initial PBRT other than the fromInvalidate field, and leave the PBRT unmodified (i.e. skip step 8 above). |