[SERVER-45009] Transaction coordinator tasks should be robust to shutdown failing to step down
Created: 06/Dec/19  Updated: 29/Oct/23  Resolved: 16/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc0, 4.2.7, 4.7.0

Type: Bug
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Lamont Nelson
Resolution: Fixed
Votes: 0
Labels: sharding-4.4-stabilization, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-46304 txn_two_phase_commit_coordinator_shut... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4, v4.2
Linked BF Score: 18

 Description   

Since shutdown tries to join the Grid's ThreadPoolTaskExecutors before interrupting all OperationContexts, a slow coordinator task running in one of the Grid's ThreadPoolTaskExecutors can delay shutdown.

In particular, this manifests as hangs in tests that inject hangs into coordinator tasks.
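
The ordering problem can be illustrated with a minimal, self-contained sketch. It does not use the server's ThreadPoolTaskExecutor or OperationContext; FakeExecutor, interrupt(), and shutdownInterruptFirst() are hypothetical stand-ins. The point is only that a task which exits solely on interruption blocks a join unless the interrupt is delivered first.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an executor running one slow coordinator task.
struct FakeExecutor {
    std::atomic<bool> interrupted{false};
    std::atomic<bool> taskFinished{false};
    std::thread worker;

    void start() {
        assert(!worker.joinable());  // start() is called at most once in this sketch
        worker = std::thread([this] {
            // The "coordinator task" only stops once it has been interrupted.
            while (!interrupted.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            taskFinished.store(true);
        });
    }

    void interrupt() { interrupted.store(true); }

    void join() {
        if (worker.joinable()) {
            worker.join();
        }
    }
};

// Interrupting before joining lets the join return promptly.
// Calling join() first would block forever here, because nothing
// else in this sketch ever interrupts the task.
inline bool shutdownInterruptFirst(FakeExecutor& exec) {
    exec.interrupt();
    exec.join();
    return exec.taskFinished.load();
}
```

In this sketch, swapping the two calls in shutdownInterruptFirst() reproduces the hang in miniature: the join waits on a task that is itself waiting for an interrupt that has not yet been sent.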



 Comments   
Comment by Githook User [ 29/Apr/20 ]

Author: Lamont Nelson <lamont.nelson@mongodb.com> (lamontnelson)

Message: SERVER-45009: Run onStepdown on the TransactionCoordinatorService to prevent tests from hanging at shutdown

(cherry picked from commit e4736e7d9e327eafe19e1281bb3942978ca3c353)
Branch: v4.2
https://github.com/mongodb/mongo/commit/7fd3c03c548d0febfa1e871e16d638513c417c79

Comment by Esha Maharishi (Inactive) [ 14/Apr/20 ]

steven.vannelli thanks, that looks right to me.

Comment by Githook User [ 23/Mar/20 ]

Author: Lamont Nelson <lamont.nelson@mongodb.com> (lamontnelson)

Message: SERVER-45009: Run onStepdown on the TransactionCoordinatorService to prevent tests from hanging at shutdown
Branch: v4.4
https://github.com/mongodb/mongo/commit/7285b580e94252d7c66782a086458dd2f6d095c6

Comment by Githook User [ 12/Mar/20 ]

Author: Lamont Nelson <lamont.nelson@mongodb.com> (lamontnelson)

Message: SERVER-45009: Run onStepdown on the TransactionCoordinatorService to prevent tests from hanging at shutdown
Branch: master
https://github.com/mongodb/mongo/commit/e4736e7d9e327eafe19e1281bb3942978ca3c353

Comment by Lamont Nelson [ 11/Mar/20 ]

Code Review: https://mongodbcr.appspot.com/562820001/#ps574550007

Comment by Esha Maharishi (Inactive) [ 26/Dec/19 ]

One idea is to also call TransactionCoordinatorService::onStepDown, which is currently called only on stepdown, from the catch blocks in the shutdown code that handle a failed stepdown.

It looks like TransactionCoordinatorService::onStepDown is already robust to being called twice because of this logic.
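
A minimal sketch of that idea, assuming an idempotent onStepDown guarded by an "active" flag. FakeCoordinatorService and shutdownSketch are hypothetical names for illustration only; they are not the server's classes, and the real shutdown code and the real guard logic may differ.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for TransactionCoordinatorService.
class FakeCoordinatorService {
public:
    void onStepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = true;
    }

    // Safe to call repeatedly: only the first call after a step-up does work.
    void onStepDown() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_active) {
            return;
        }
        _active = false;
        ++_cancellations;  // stands in for cancelling outstanding coordinator tasks
    }

    bool isActive() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

    int cancellations() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _cancellations;
    }

private:
    mutable std::mutex _mutex;
    bool _active = false;
    int _cancellations = 0;
};

// Hypothetical shutdown path: if stepdown throws, still notify the service.
inline void shutdownSketch(FakeCoordinatorService& svc, bool stepDownFails) {
    try {
        if (stepDownFails) {
            throw std::runtime_error("stepdown failed");
        }
        svc.onStepDown();  // normal stepdown path
    } catch (const std::exception&) {
        svc.onStepDown();  // fallback from the catch block
    }
    svc.onStepDown();           // a repeated call is a no-op
    assert(!svc.isActive());    // the service is inactive on every path
}
```

Because the guard makes repeated calls harmless in this sketch, the catch block can call onStepDown without first checking whether the normal stepdown path already ran.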

Generated at Thu Feb 08 05:07:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.