[SERVER-55648] Mongos doesn't return top-level batch-write error in case of shutdown Created: 30/Mar/21 Updated: 29/Oct/23 Resolved: 29/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.2.12 |
| Fix Version/s: | 4.2.16, 4.0.28 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tommaso Tocci | Assignee: | Luis Osta (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0 |
| Steps To Reproduce: | To reproduce the error, apply the provided patch (r4.2.12 - 5593fd8e33b60c7580) and run: |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 85 |
| Description |
|
Batch write operations can return either a top-level error:
or a nested array of writeErrors:
Because our current retryable-writes spec is somewhat vague about how to handle a batch write response that contains writeErrors, drivers only implement retries for top-level errors in a batch write response and do not even look at the retryable errors in the writeErrors array. The problem is that if a mongos is shut down in the middle of executing a batch write, instead of returning a response with a top-level error it may return a nested writeErrors array, which drivers will not retry. I suspect that this is the same underlying issue of |
| Comments |
| Comment by Githook User [ 22/Sep/21 ] |
|
Author: Luis Osta (luis.osta@mongodb.com, username: LuisOsta) Message: |
| Comment by Githook User [ 20/Sep/21 ] |
|
Author: Max Hirschhorn (max.hirschhorn@mongodb.com, username: visemet) Message: Revert " This reverts commit 211007fa4a705c02e7c373dd6fc148aa4de3a038. |
| Comment by Githook User [ 20/Sep/21 ] |
|
Author: Luis Osta (luis.osta@mongodb.com, username: LuisOsta) Message: |
| Comment by Githook User [ 30/Jul/21 ] |
|
Author: jannaerin (golden.janna@gmail.com, username: jannaerin) Message: |
| Comment by Githook User [ 29/Jul/21 ] |
|
Author: Luis Osta (luis.osta@mongodb.com, username: LuisOsta) Message: |
| Comment by Oleg Pudeyev (Inactive) [ 22/Jul/21 ] |
|
luis.osta: I think my comment above is incorrect; Jeremy also pointed this out in the subsequent comment. When a driver receives an error from the server, several things may happen, including 1) retrying the operation and 2) marking the server unknown. https://github.com/mongodb/specifications/pull/911 says that when the server reports an error in writeErrors, the server MUST NOT be marked unknown. It says nothing about whether the driver would retry the operation. The operations should be retryable if they match the "determining retryable errors" requirements described in https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.rst#determining-retryable-errors. I attempted to write a test at https://github.com/p-mongo/tests/blob/master/driver-retry-write-errors/test.rb that sets a fail point on a shard mongod and then writes through a mongos, but this test doesn't produce any errors. When I write to the shard directly, the fail point is triggered as expected. Are fail points not triggered by mongos->mongod operations, or did I get the syntax wrong? |
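For reference, a fail point like the one described is usually set with the configureFailPoint admin command and the failCommand fail point. The Python sketch below only builds the command document and does not contact a server; build_fail_command is a hypothetical helper. The failInternalCommands field is an assumption to verify against the server version under test: as we understand the server's failCommand handling, commands arriving from internal clients such as mongos are skipped unless this field is true, which could explain why writing through mongos did not trigger the fail point.

```python
import json


def build_fail_command(error_code: int, commands: list, times: int = 1,
                       fail_internal: bool = False) -> dict:
    """Build a configureFailPoint document for the failCommand fail point.

    fail_internal maps to the failInternalCommands field (an assumption,
    see the lead-in); the other fields follow the usual failCommand layout.
    """
    data = {"failCommands": list(commands), "errorCode": error_code}
    if fail_internal:
        # Assumed to let the fail point also fire for mongos->mongod traffic.
        data["failInternalCommands"] = True
    return {
        "configureFailPoint": "failCommand",  # command name must be the first key
        "mode": {"times": times},             # fail only the next N matching commands
        "data": data,
    }


# Document to send to the shard's mongod (admin database); 91 is ShutdownInProgress.
cmd = build_fail_command(91, ["insert"], fail_internal=True)
print(json.dumps(cmd))
```

With PyMongo, a document like this would typically be sent with client.admin.command(cmd) against the shard mongod; because mode is {"times": 1}, only the first matching insert fails, so a driver retry would be expected to succeed.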
| Comment by Jack Mulrow [ 05/Apr/21 ] |
|
As part of this ticket, we should also investigate whether there are retryable codes other than shutdown errors that can be buried within writeErrors like this.
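One way to approach that investigation is to scan every writeErrors entry for codes that the retryable-writes specification treats as retryable. The Python sketch below assumes the code list from the "determining retryable errors" section of that spec as we recall it (check it against the current spec text before relying on it); find_buried_retryable_errors is a hypothetical helper, not part of any driver, and the sample reply is made up.

```python
# Retryable error codes from the retryable-writes spec, as recalled here;
# verify against the current spec text.
RETRYABLE_WRITE_CODES = {
    11600: "InterruptedAtShutdown",
    11602: "InterruptedDueToReplStateChange",
    10107: "NotWritablePrimary",  # formerly NotMaster
    13435: "NotPrimaryNoSecondaryOk",
    13436: "NotPrimaryOrSecondary",
    189: "PrimarySteppedDown",
    91: "ShutdownInProgress",
    7: "HostNotFound",
    6: "HostUnreachable",
    89: "NetworkTimeout",
    9001: "SocketException",
    262: "ExceededTimeLimit",
}


def find_buried_retryable_errors(reply: dict) -> list:
    """Return (index, codeName) pairs for writeErrors entries whose code is
    retryable, i.e. errors that a top-level-only check would miss."""
    buried = []
    for err in reply.get("writeErrors", []):
        code = err.get("code")
        if code in RETRYABLE_WRITE_CODES:
            buried.append((err.get("index"), RETRYABLE_WRITE_CODES[code]))
    return buried


# Hypothetical reply to an unordered three-document insert.
reply = {
    "ok": 1,
    "n": 1,
    "writeErrors": [
        {"index": 1, "code": 11000, "errmsg": "duplicate key"},  # not retryable
        {"index": 2, "code": 11602, "errmsg": "interrupted"},    # retryable
    ],
}
print(find_buried_retryable_errors(reply))  # [(2, 'InterruptedDueToReplStateChange')]
```

A check that only looks at the top level would see ok: 1 in this reply and never consider the InterruptedDueToReplStateChange entry.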
| Comment by Jeremy Mikola [ 02/Apr/21 ] |
|
oleg.pudeyev: My understanding of mongodb/specifications#911 for DRIVERS-1376 is that it only applies to error checking as it pertains to SDAM. Although the original description of DRIVERS-1376 did talk about retryable writes, it looks like that was ultimately removed from the scope. |
| Comment by Oleg Pudeyev (Inactive) [ 01/Apr/21 ] |
|
The driver behavior was clarified in https://github.com/mongodb/specifications/pull/911 to require drivers to NOT check writeErrors when looking for retryable errors. |