[CDRIVER-3121] Failpoint errors may not be considered a failure Created: 09/May/19 Updated: 28/Oct/23 Resolved: 13/Mar/20 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc |
| Affects Version/s: | None |
| Fix Version/s: | 1.17.0-beta, 1.17.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Patrick Freed | Assignee: | Kevin Albertson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Description |
|
Given the following code sample:
it prints
when encountering a write concern error. This means the function returned true, even though the docs state that a write concern error should be considered a failure. I believe this applies to the other crud functions as well (including bulk writes). |
| Comments |
| Comment by Kevin Albertson [ 13/Mar/20 ] | ||
|
The error codes should now accept double or int64 as well.
I couldn't reproduce this, and reading the code, it does appear that write commands would retry before parsing a reply into a write result. I wonder if this was a surfacing of the same issue. Perhaps the failpoint triggered on the retry as well, and though the second reply has a writeConcernError, the return still indicated success. I'm closing for now, but feel free to re-open if that isn't the case. | ||
| Comment by Githook User [ 13/Mar/20 ] | ||
|
Author: {'username': 'kevinAlbs', 'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com'}Message: Permit double and int64, since they may appear when | ||
| Comment by Patrick Freed [ 13/Aug/19 ] | ||
|
Regarding your second open question, it feels like the server's responsibility to standardize the types of their error codes, regardless of what the user sends them. I guess since failPoints are largely developer tools, they don't do a ton of sanitation right now, but I still find it very strange that it's even possible for the codes to be returned as multiple different types. I think the ok field might suffer from this same issue. | ||
| Comment by Patrick Freed [ 13/Aug/19 ] | ||
|
Blast from the past here–I glossed over the questions directed at me. Yep, I was using a full non-mocked mongod instance when I encountered this bug. For some reason, the server happens to include the exact error code as specified in the configureFailPoint command in its reply when the failPoint is triggered, preserving its type. The fail point does trigger correctly regardless of the that type (assuming the same numeric value), however. e.g. sending this command via the shell
when triggered, will return something like
By default, I believe the shell interprets numbers as Doubles, so just a vanilla untyped configureFailPoint shell command will have the code returned as a double. Specifying the code as NumberInt(..) is currently the only way to have libmongoc parse it correctly. | ||
| Comment by Kevin Albertson [ 24/Jun/19 ] | ||
|
Thanks for investigating this. You're correct in that the code in _parse_error_reply is limiting the error code check to an int32 reply. It seems we've been doing this as far back as 1.3: We don't catch this in retryable writes tests because it seems that using a failpoint to mock the error server reply always returns the error codes as int32 as well. So how did you encounter this bug? Did you see this on a live mongod returning the error code as a double? I have a couple other open questions that I should answer before completing this ticket. 1. If we do should check for any numeric error code type, does this warrant a spec change in retryable writes, and other specs that mention error codes, to mention this? | ||
| Comment by Patrick Freed [ 09/May/19 ] | ||
|
The cause of this is in mongo-rpc.c around line 1112, in _parse_error_reply. When parsing write concern errors, this function expects the error code to be an int32. When encountering non-int32 s, it ignores the code and reports no error. In the above example (where the error is caused by setting a failpoint in the shell), the code is sent back from the server as a double, so we see the bad behavior. Likewise, specifying the failpoint code using NumberLong(91) also causes the write concern error to not be reported. Configuring with NumberInt(91) produces the desired behavior, however. | ||
| Comment by Patrick Freed [ 09/May/19 ] | ||
|
It also appears that the write concern error will be present in the reply even if the write is retried and succeeds on the second attempt. The function will report success via the return value (not the reply) regardless of whether the write was retried successfully or not, though. |