[SERVER-45009] Transaction coordinator tasks should be robust to shutdown failing to step down
Created: 06/Dec/19 Updated: 29/Oct/23 Resolved: 16/Mar/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc0, 4.2.7, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Lamont Nelson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-4.4-stabilization, sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Linked BF Score: | 18 |
| Description |
Because shutdown tries to join the Grid's ThreadPoolTaskExecutors before it interrupts all OperationContexts, a slow transaction coordinator task running on one of the Grid's ThreadPoolTaskExecutors can delay shutdown. In particular, this manifests as hangs in tests that inject hangs into coordinator tasks.
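The ordering problem can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not MongoDB code: `InterruptibleOpCtx` and `shutdownJoinSucceeds` are hypothetical names, a single `std::async` task stands in for a coordinator task on one of the Grid's executors, and the join is bounded by a timeout so the model itself cannot hang. A task that only finishes when interrupted cannot be joined until the interrupt happens, so joining first stalls shutdown.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>

// Hypothetical stand-in for an OperationContext that a coordinator task
// checks for interruption. Not the real MongoDB class.
class InterruptibleOpCtx {
public:
    // Blocks until markKilled() is called, like a task stuck on an
    // injected hang that only an interrupt can release.
    void waitUntilInterrupted() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _killed; });
    }

    void markKilled() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _killed = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _killed = false;
};

// Runs one hypothetical "slow coordinator task" on its own thread, then
// shuts it down. With interruptFirst == false, shutdown tries to join the
// task before interrupting it (the ordering described in this ticket).
// Returns true if the join finished within joinTimeout.
bool shutdownJoinSucceeds(bool interruptFirst, std::chrono::milliseconds joinTimeout) {
    InterruptibleOpCtx opCtx;
    auto task = std::async(std::launch::async, [&opCtx] { opCtx.waitUntilInterrupted(); });

    if (interruptFirst) {
        opCtx.markKilled();
    }
    const bool joined = task.wait_for(joinTimeout) == std::future_status::ready;

    // Always release the task before returning; the destructor of a
    // std::async future blocks until the task finishes.
    opCtx.markKilled();
    task.get();
    return joined;
}
```

In this model, `shutdownJoinSucceeds(false, ...)` always times out, while interrupting first lets the join complete promptly.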
| Comments |
| Comment by Githook User [ 29/Apr/20 ] |
Author: Lamont Nelson (lamontnelson) <lamont.nelson@mongodb.com>
Message: (cherry picked from commit e4736e7d9e327eafe19e1281bb3942978ca3c353)
| Comment by Esha Maharishi (Inactive) [ 14/Apr/20 ] |
steven.vannelli, thanks, that looks right to me.
| Comment by Githook User [ 23/Mar/20 ] |
Author: Lamont Nelson (lamontnelson) <lamont.nelson@mongodb.com>
Message:
| Comment by Githook User [ 12/Mar/20 ] |
Author: Lamont Nelson (lamontnelson) <lamont.nelson@mongodb.com>
Message:
| Comment by Lamont Nelson [ 11/Mar/20 ] |
Code review: https://mongodbcr.appspot.com/562820001/#ps574550007
| Comment by Esha Maharishi (Inactive) [ 26/Dec/19 ] |
One idea is to also call TransactionCoordinatorService::onStepDown, which is currently called on stepdown, from the shutdown code's catch blocks that handle a failed stepdown. TransactionCoordinatorService::onStepDown already appears to be robust to being called twice, because of this logic.
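The proposed fix can be sketched as follows. This is a hypothetical model, not the actual TransactionCoordinatorService or shutdown code: `CoordinatorServiceModel`, `attemptStepDown`, and `shutdownSketch` are invented names, and the counter merely stands in for interrupting coordinator tasks. It shows why idempotence matters: the catch block can call `onStepDown` unconditionally, whether or not the failed stepdown already ran its hook.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for TransactionCoordinatorService; it models only
// the "safe to call onStepDown twice" property.
class CoordinatorServiceModel {
public:
    void onStepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = true;
    }

    // Idempotent: a second call finds nothing active and returns early.
    void onStepDown() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_active) {
            return;
        }
        _active = false;
        ++_cancellations;  // stands in for interrupting coordinator tasks
    }

    bool active() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

    int cancellations() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _cancellations;
    }

private:
    mutable std::mutex _mutex;
    bool _active = false;
    int _cancellations = 0;
};

// Hypothetical stepdown attempt: optionally runs the stepdown hook, then
// optionally fails (for example, because stepdown could not complete).
void attemptStepDown(CoordinatorServiceModel& service, bool runHook, bool fail) {
    if (runHook) {
        service.onStepDown();
    }
    if (fail) {
        throw std::runtime_error("stepdown failed");
    }
}

// Hypothetical shutdown path following the idea in the comment above.
void shutdownSketch(CoordinatorServiceModel& service, bool runHook, bool fail) {
    try {
        attemptStepDown(service, runHook, fail);
    } catch (const std::runtime_error&) {
        // Stepdown failed: stop coordinator tasks anyway so they cannot
        // delay joining the executors. Safe even if the hook already ran.
        service.onStepDown();
    }
}
```

In every combination, the service ends up stepped down exactly once, so the coordinator tasks are interrupted before shutdown would try to join the executors.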