[SERVER-40186] The logic in `auto_retry_transaction.js:withTxnAndAutoRetry` does not retry failed commits Created: 18/Mar/19 Updated: 29/Oct/23 Resolved: 25/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2019-03-25, Sharding 2019-04-08, Sharding 2019-04-22, Sharding 2019-05-06 |
| Participants: |
| Linked BF Score: | 5 |
| Description |
|
The multi_statement_transaction_kill_sessions_atomicity_isolation.js concurrency workload executes ordered updates in transactions using snapshot isolation and from time to time kills random sessions, finally validating that the transactions still committed in the correct order. Enabling this workload against a sharded cluster leads to failures which appear as if transactions committed out of order:
These failures are not caused by a server bug. Interrupting a session that is running two-phase commit on mongos may still result in the transaction committing. Because the test then retries the entire transaction (with exactly the same parameters), the transaction ends up committing twice.
Proposed fix
The fix would be to make withTxnAndAutoRetry retry just the commit if it fails, similar to what the drivers spec requires, namely:
|
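The retry-only-the-commit behavior proposed in the description can be illustrated with a minimal sketch. This is not the actual code in auto_retry_transaction.js: the function names `commitWithRetry` and `withTxnAndCommitRetry`, the attempt limits, and the bare `session` interface (`startTransaction`, `commitTransaction`, `abortTransaction`) are hypothetical. It also assumes an interrupted commit surfaces with the `UnknownTransactionCommitResult` error label that the drivers transactions specification defines, which may not hold for every interruption in the test.

```javascript
// Hypothetical sketch; none of these names come from auto_retry_transaction.js.
// The two error labels are the ones defined by the drivers transactions spec.
const TRANSIENT = "TransientTransactionError";
const UNKNOWN_COMMIT = "UnknownTransactionCommitResult";

function hasLabel(err, label) {
    return Array.isArray(err.errorLabels) && err.errorLabels.includes(label);
}

// Re-sends only commitTransaction while the commit outcome is unknown.
// The transaction body is never re-run here, so a commit that actually
// succeeded on the server cannot be applied a second time.
function commitWithRetry(session, maxCommitAttempts) {
    for (let attempt = 1; ; attempt++) {
        try {
            session.commitTransaction();
            return;
        } catch (err) {
            if (!hasLabel(err, UNKNOWN_COMMIT) || attempt >= maxCommitAttempts) {
                throw err;
            }
        }
    }
}

// Runs body(session) in a transaction. The whole transaction is retried only
// on a transient error; a failed commit is retried on its own.
function withTxnAndCommitRetry(session, body,
                               {maxTxnAttempts = 5, maxCommitAttempts = 5} = {}) {
    for (let attempt = 1; ; attempt++) {
        session.startTransaction();
        try {
            body(session);
        } catch (err) {
            session.abortTransaction();
            if (hasLabel(err, TRANSIENT) && attempt < maxTxnAttempts) {
                continue;  // Safe to re-run: nothing was committed.
            }
            throw err;
        }
        try {
            commitWithRetry(session, maxCommitAttempts);
            return;
        } catch (err) {
            if (hasLabel(err, TRANSIENT) && attempt < maxTxnAttempts) {
                continue;  // The server reported that the transaction aborted.
            }
            throw err;
        }
    }
}
```

With this shape, a commit that fails once with an unknown result leads to a second commitTransaction for the same transaction rather than a second execution of the transaction body, which is the double commit the workload observed.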
| Comments |
| Comment by Githook User [ 25/Apr/19 ] |
|
Author: {'email': 'jack.mulrow@mongodb.com', 'name': 'Jack Mulrow', 'username': 'jsmulrow'} Message: |
| Comment by Janna Golden [ 18/Mar/19 ] |
|
Yeah, I had run into this previously. I filed Regardless, the fix sounds correct to me. |
| Comment by Max Hirschhorn [ 18/Mar/19 ] |
|
I believe janna.golden had encountered an issue similar to this with the withTxnAndAutoRetry() function at one point, which is what gave me the thought to mention it being a possible cause to you last Friday. Retrying the commitTransaction sounds correct to me based on what the Driver's specification says. The commitTransaction() function in the user-facing version of the mongo shell doesn't have any retry logic. CC judah.schvimer who has been working on how the version of the mongo shell used for testing retries commands in the face of network and other retryable error codes. I don't believe there's an existing SERVER ticket that tracks how the mongo shell isn't compliant with the Driver's specification for transactions. |
| Comment by Kaloian Manassiev [ 18/Mar/19 ] |
|
max.hirschhorn, do you mind confirming whether the diagnosis and the proposed fix above sound right? |