[SERVER-61816] cancel_coordinate_txn_commit_with_tickets_exhausted.js can hang forever due to race condition between transaction reaper and transaction coordinator
Created: 30/Nov/21 | Updated: 29/Oct/23 | Resolved: 02/Dec/21
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0, 5.1.2, 5.0.6, 4.4.11, 4.2.19 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Luis Osta (Inactive) | Assignee: | Luis Osta (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam1 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.1, v5.0, v4.4, v4.2 |
| Steps To Reproduce: | |
| Sprint: | Sharding 2021-12-13 |
| Participants: | |
| Linked BF Score: | 163 |
| Story Points: | 2 |
| Description |
|
Context
This issue occurs when the TransactionCoordinator is also a participant. The local transaction reaper is triggered before the TransactionCoordinator sends the abortTransaction command (itself also the result of a timeout) to the local transaction. The coordinator sends the abort command to all of the participants. The underlying function which handles the request has special logic for the case where the coordinator is also a participant: instead of going through the network, it calls handleRequest directly to abort the local transaction. This is the origin of that stack frame above. Because the reaper has already aborted the local transaction, the abortTransaction command fails with a NoSuchTransaction error, and the call to handleRequest gets stuck when the ServiceEntryPoint then attempts a no-op write.

Proposal
To make the test work as expected, the assert.soon that waits on the transaction coordinator should accept the coordinator being in any step equal to or past writingDecision. The new assert.soon function checks the transaction coordinator's server status for this condition.
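
A minimal sketch of such a check, not the committed fix. It assumes the coordinator's progress is read from the twoPhaseCommitCoordinator.currentInSteps section of serverStatus and that the step names below match its keys; the helper name coordinatorAtOrPastWritingDecision and the variable txnCoordinator are hypothetical.

```javascript
// Steps in the order a two-phase commit coordinator passes through them.
// Assumption: these names match the keys of
// serverStatus().twoPhaseCommitCoordinator.currentInSteps.
const COORDINATOR_STEPS = [
    "writingParticipantList",
    "waitingForVotes",
    "writingDecision",
    "waitingForDecisionAcks",
    "deletingCoordinatorDoc",
];

// Hypothetical helper: returns true if at least one coordinator is currently
// in writingDecision or in any later step.
function coordinatorAtOrPastWritingDecision(currentInSteps) {
    const firstAccepted = COORDINATOR_STEPS.indexOf("writingDecision");
    return COORDINATOR_STEPS.slice(firstAccepted)
        .some((step) => (currentInSteps[step] || 0) > 0);
}

// Intended use inside the jstest (mongo shell only, so left commented out;
// `txnCoordinator` is assumed to be a connection to the coordinator node):
//
// assert.soon(() => {
//     const status = txnCoordinator.getDB("admin").serverStatus();
//     return coordinatorAtOrPastWritingDecision(
//         status.twoPhaseCommitCoordinator.currentInSteps);
// }, "expected the coordinator to reach writingDecision or a later step");
```

Accepting a range of steps rather than exactly writingDecision means the check no longer depends on when the reaper fires relative to the coordinator. Note that currentInSteps only counts coordinators still in flight, so a check like this would not observe a coordinator that has already finished every step.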
|
| Comments |
| Comment by Githook User [ 10/Jan/22 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}
Message: (cherry picked from commit 6d1b572c7ddbba652ffa49dc3783fbd27cec9714)
(cherry picked from commit 78ab98a46b53582a5e69424bbb92f25c483fec0a) |
| Comment by Githook User [ 13/Dec/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}
Message: (cherry picked from commit 6d1b572c7ddbba652ffa49dc3783fbd27cec9714)
(cherry picked from commit 78ab98a46b53582a5e69424bbb92f25c483fec0a) |
| Comment by Githook User [ 08/Dec/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}
Message: (cherry picked from commit 6d1b572c7ddbba652ffa49dc3783fbd27cec9714)
(cherry picked from commit 78ab98a46b53582a5e69424bbb92f25c483fec0a) |
| Comment by Githook User [ 07/Dec/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}
Message: (cherry picked from commit 6d1b572c7ddbba652ffa49dc3783fbd27cec9714)
(cherry picked from commit 78ab98a46b53582a5e69424bbb92f25c483fec0a) |
| Comment by Githook User [ 02/Dec/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}
Message: |
| Comment by Githook User [ 02/Dec/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}
Message: |