[SERVER-60839] Introduce a TemporarilyUnavailable error type Created: 20/Oct/21 Updated: 29/Oct/23 Resolved: 16/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.0-rc0, 5.0.15 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dmitry Agranat | Assignee: | Josef Ahmad |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | RDY |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Backport Requested: | v5.0 |
| Sprint: | Execution Team 2022-02-07, Execution Team 2022-02-21 |
| Participants: |
| Case: | (copied to CRM) |
| Linked BF Score: | 10 |
| Description |
|
The TemporarilyUnavailable error indicates that the operation has been aborted, likely due to excessive server load (e.g. a transaction rolled back for eviction). The server retries this error with an increasingly larger backoff. Internal operations are retried indefinitely; user operations are retried up to a fixed number of attempts before TemporarilyUnavailable is returned to the client. |
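The retry policy described above can be sketched as follows. This is a minimal illustration in Python, not the server's C++ implementation: the exception class, the function name `retry_with_backoff`, the exponential schedule, the delay cap, and the default attempt limit are all assumptions made for the example. The ticket only states that the backoff grows and that user operations have a fixed attempt limit.

```python
import time


class TemporarilyUnavailable(Exception):
    """Hypothetical stand-in for the server's TemporarilyUnavailable error."""


def retry_with_backoff(op, *, is_internal, max_user_attempts=10,
                       base_delay=0.001, max_delay=0.1, sleep=time.sleep):
    """Run op(), retrying on TemporarilyUnavailable with a growing backoff.

    Internal operations retry indefinitely. User operations give up after
    max_user_attempts attempts and let the error escape to the caller.
    All numeric defaults are illustrative assumptions.
    """
    attempt = 0
    while True:
        try:
            return op()
        except TemporarilyUnavailable:
            attempt += 1
            # Only user-originating operations may surface the error.
            if not is_internal and attempt >= max_user_attempts:
                raise
            # Assumed schedule: double the delay each time, up to a cap.
            delay = min(base_delay * (2 ** (attempt - 1)), max_delay)
            sleep(delay)
```

The `sleep` parameter is injectable only so the schedule can be observed without actually waiting.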
| Comments |
| Comment by Githook User [ 18/Jan/23 ] |
|
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad) Message: This is groundwork for further differentiating WT return codes. (cherry picked from commit f4aaa34d623e7385b2ac5b332ee07ece1f22c428) |
| Comment by Githook User [ 18/Jan/23 ] |
|
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad) Message: (cherry picked from commit 7cfa78a4e20eb59c4d592bb12b6493c451b8dd13) |
| Comment by Yujin Kang Park [ 17/Jan/23 ] |
|
gregory.noma@mongodb.com, thanks for the suggestion. I have created |
| Comment by Yujin Kang Park [ 17/Jan/23 ] |
|
Requesting a backport to 5.0, at least for the first commit in the ticket (regarding passing WT_SESSION to wtRCToStatus_slow): https://github.com/mongodb/mongo/commit/f4aaa34d623e7385b2ac5b332ee07ece1f22c428. louis.williams@mongodb.com, I am assuming we don't want to backport the TemporarilyUnavailable error. |
| Comment by Githook User [ 15/Feb/22 ] |
|
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad) Message: Introduce a TemporarilyUnavailable error and exception type for load Errors are retried with an increasingly larger backoff. Internal operations |
| Comment by Louis Williams [ 07/Feb/22 ] |
|
kevin.jernigan, there are two cases to consider. First, tests that use multi-document transactions handle WriteConflictExceptions as a TransientTransactionError and retry indefinitely. This is what we tell users to do, and in fact, newer drivers do this automatically for users. Second, for operations outside multi-document transactions, this error is currently retried indefinitely inside the server. The proposed behavior is to retry a finite number of times before eventually letting it escape. The problem is that our multi-document transaction tests were designed to handle this type of error, but the rest of our tests (i.e. most of them) are not. |
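For the multi-document transaction case, the client-side pattern of retrying the whole transaction while the error carries the TransientTransactionError label might look roughly like the sketch below. The `OperationError` class and `run_transaction_with_retry` are hypothetical names invented for this example; a real driver exposes its own error types and, in newer versions, performs this loop on the user's behalf.

```python
class OperationError(Exception):
    """Hypothetical driver error that carries server-assigned error labels."""

    def __init__(self, message, labels=()):
        super().__init__(message)
        self.labels = frozenset(labels)

    def has_error_label(self, label):
        return label in self.labels


def run_transaction_with_retry(txn_body):
    """Re-run the whole transaction body until it stops failing transiently.

    txn_body is assumed to start, run, and commit one transaction.
    Errors without the TransientTransactionError label are not retried.
    """
    while True:
        try:
            return txn_body()
        except OperationError as exc:
            if exc.has_error_label("TransientTransactionError"):
                continue  # transient: retry the entire transaction
            raise  # anything else escapes to the caller
```

A test that runs plain, non-transactional commands has no such loop around it, which is why a newly escaping error would fail it.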
| Comment by Kevin Jernigan (Inactive) [ 04/Feb/22 ] |
|
When this condition happens today, i.e. when a write operation hits the WiredTiger dirty threshold limit, we convert it to a WriteConflict. How do we handle this in our test infrastructure - don't we fail entire tests for commands that aren't retryable? If so, then what changes if we return a more specialized error for this condition - won't the same tests fail that would fail without the changes in this ticket? |
| Comment by Githook User [ 02/Feb/22 ] |
|
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad) Message: This is groundwork for further differentiating WT return codes. |
| Comment by Githook User [ 02/Feb/22 ] |
|
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad) Message: |
| Comment by Eric Milkie [ 14/Jan/22 ] |
|
Thanks for the clarifications; I modified the title of this ticket for better specificity. Should we close |
| Comment by Louis Williams [ 14/Jan/22 ] |
|
milkie, after discussing with keith.smith, he confirmed that there is only one scenario for a transaction being rolled back due to pinning cache space, and that is the "oldest pinned transaction ID rolled back for eviction". The "synchronous" case you described is just a generalization of the asynchronous case. When a very large transaction pins cache space and is unable to evict pages, WiredTiger will start to roll back transactions, starting from the oldest, until it gets to the large one. So the two cases that you described are not distinguishable from WiredTiger's perspective. |
| Comment by Eric Milkie [ 13/Jan/22 ] |
|
It sounds like this ticket is starting to overlap with |
| Comment by Louis Williams [ 13/Jan/22 ] |
|
We should consider retrying internally once or twice in the existing writeConflictRetry path before ultimately letting this error escape. Additionally, we are considering labeling this error code as retryable so that drivers can retry once on their end. We won't be able to let this error escape internal threads. We can only let the error escape for user-originating operations. |
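The driver-side half of this proposal, retrying exactly once when the server marks the error as retryable, could be sketched as below. The `ServerError` class, its `retryable` flag, and `call_with_single_retry` are hypothetical and exist only for illustration; the comment does not specify how the retryable marking would be exposed to drivers.

```python
class ServerError(Exception):
    """Hypothetical error returned to a driver, with a retryable marker."""

    def __init__(self, message, retryable=False):
        super().__init__(message)
        self.retryable = retryable


def call_with_single_retry(op):
    """Run op(); if it fails with a retryable error, try exactly once more."""
    try:
        return op()
    except ServerError as exc:
        if not exc.retryable:
            raise  # non-retryable errors surface immediately
    # Second and final attempt: any error raised here reaches the caller.
    return op()
```

Combined with a small number of internal retries on the server, this bounds the total number of attempts for a user operation instead of retrying forever.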
| Comment by Louis Williams [ 05/Jan/22 ] |
|
Using the work from |