[SERVER-38847] Suites using txn_override.js will not retry transient transaction errors without loading auto_retry_on_network_error.js
Created: 04/Jan/19 Updated: 06/Dec/22 Resolved: 04/Jan/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | ShardedTxn:Testing |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Participants: | |
| Description |
txn_override.js wraps test statements in multi-document transactions and contains logic to retry an entire transaction if it fails with a transient error. However, the override only catches transient errors returned by commitTransaction; for every other command it relies on auto_retry_on_network_error.js to catch them instead. Several suites have been added that use txn_override.js but don't expect network errors, so they don't also load auto_retry_on_network_error.js, which means they currently don't retry on transient errors. This is especially important for suites exercising cross-shard transactions, which are particularly prone to transient snapshot errors. A quick fix would be to include the auto_retry_on_network_error.js override in every suite that uses txn_override.js, but a more complete approach might be to catch and retry transient transaction errors in txn_override.js itself, so that it doesn't rely on a seemingly unrelated override.
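A minimal sketch of that more complete approach, written in JavaScript since the overrides are JavaScript files. This is not the actual txn_override.js code: `isTransientTransactionError`, `runTxnWithTransientRetry`, and `maxAttempts` are hypothetical names, and a simulated error object stands in for a real server response. The only server behavior it assumes is that retryable transaction failures carry the `TransientTransactionError` error label. The point it illustrates is that the whole transaction is re-run whichever command inside it failed, not only when commitTransaction fails.

```javascript
// Hypothetical helper: true if an error carries the TransientTransactionError
// label that the server attaches to retryable transaction failures.
function isTransientTransactionError(err) {
  return Array.isArray(err && err.errorLabels) &&
         err.errorLabels.includes("TransientTransactionError");
}

// Hypothetical wrapper: runs `runTxn` (which must start, execute, and commit
// one whole transaction) and re-runs it from the beginning on any transient
// transaction error, regardless of which command inside it failed.
function runTxnWithTransientRetry(runTxn, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      return runTxn(attempt);
    } catch (err) {
      // Non-transient errors, or running out of attempts, are surfaced as-is.
      if (!isTransientTransactionError(err) || attempt >= maxAttempts) {
        throw err;
      }
      // Otherwise loop: retry the entire transaction, not just one statement.
    }
  }
}

// Usage with a simulated transaction that hits one transient snapshot error.
let calls = 0;
const result = runTxnWithTransientRetry(() => {
  calls++;
  if (calls === 1) {
    const err = new Error("simulated snapshot error");
    err.errorLabels = ["TransientTransactionError"];
    throw err;
  }
  return "committed";
});
console.log(result, calls); // committed 2
```

Keying the retry on the error label rather than on specific commands means the override would no longer depend on auto_retry_on_network_error.js for statements other than commitTransaction.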
| Comments |
| Comment by Jack Mulrow [ 04/Jan/19 ] |
It turns out loading auto_retry_on_network_error.js would force suites without network errors to blacklist all tests that use non-retryable commands, because the override rejects them, so I'm closing this in favor of