[SERVER-39726] Recovering the state of an uncommitted transaction should not block Created: 21/Feb/19 Updated: 02/May/19 Resolved: 02/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.1.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | Randolph Tan |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | ALL |
| Sprint: | Sharding 2019-04-08, Sharding 2019-04-22, Sharding 2019-05-06 |
| Participants: | |
| Description |
|
And the server design also says:
However the implementation of
So I think the current implementation is incomplete. The abort optimization is important because it prevents applications from blocking for 60 seconds (or transactionLifetimeLimitSeconds) when the original commit attempt is lost. CC: renctan. |
| Comments |
| Comment by Randolph Tan [ 02/May/19 ] |
|
Changes in |
| Comment by Shane Harvey [ 25/Feb/19 ] |
Yes, this is the behavior I would like to see implemented by this ticket. I linked |
| Comment by Esha Maharishi (Inactive) [ 25/Feb/19 ] |
|
shane.harvey, I think the distinction is that commitTransaction against a recovery router will abort the uncommitted transaction on the recovery shard, which guarantees the transaction will never commit. This is done so that NoSuchTransaction can be safely returned to the client. However, it does not synchronously abort the transaction on all participant shards, since the recoveryToken does not include the participant list. |
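The distinction in the comment above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the server's implementation: `ToyShard`, `recover_commit`, the `TxnState` enum, and the `recoveryShardId` token field are hypothetical names chosen for illustration. It shows that a recovery commit can abort the transaction on the recovery shard and return NoSuchTransaction, while other participants remain in progress because the token does not carry the participant list.

```python
from dataclasses import dataclass, field
from enum import Enum


class TxnState(Enum):
    IN_PROGRESS = "in_progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


class NoSuchTransaction(Exception):
    """Toy stand-in for the NoSuchTransaction error returned to the client."""


@dataclass
class ToyShard:
    # Hypothetical model of a shard: maps a transaction id to its local state.
    name: str
    txns: dict = field(default_factory=dict)

    def abort_if_uncommitted(self, txn_id):
        # Abort locally unless the transaction has already committed here.
        if self.txns.get(txn_id) is not TxnState.COMMITTED:
            self.txns[txn_id] = TxnState.ABORTED
        return self.txns[txn_id]


def recover_commit(recovery_token, shards, txn_id):
    # Assumption: the token names only the recovery shard, so the router
    # cannot reach the other participants from the token alone.
    recovery_shard = shards[recovery_token["recoveryShardId"]]
    if recovery_shard.abort_if_uncommitted(txn_id) is TxnState.COMMITTED:
        return "committed"
    # The recovery shard has aborted, so the transaction can never commit.
    raise NoSuchTransaction(txn_id)


shards = {
    "A": ToyShard("A", {"t1": TxnState.IN_PROGRESS}),
    "B": ToyShard("B", {"t1": TxnState.IN_PROGRESS}),
}
try:
    recover_commit({"recoveryShardId": "A"}, shards, "t1")
except NoSuchTransaction:
    # Shard A is now ABORTED; shard B was never contacted.
    print(shards["A"].txns["t1"].value, shards["B"].txns["t1"].value)
```

In this sketch, shard B keeps the transaction open until something else aborts it (for example, a lifetime limit), which is the blocking behavior the ticket description is concerned with.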
| Comment by Shane Harvey [ 21/Feb/19 ] |
|
Linking |
| Comment by Shane Harvey [ 21/Feb/19 ] |
|
Interesting... can you explain why the recovery commitTransaction attempt cannot communicate with the coordinator to abort the transaction? What exactly is the race condition? |
| Comment by Randolph Tan [ 21/Feb/19 ] |
|
Note: the design doc is not up to date. I don't think the abort was meant to be an optimization, and it is also possible that it won't be used to get around this quirk. The issue is that making decisions without involving the transaction coordinator is racy.