[SERVER-51317] Change TransactionCoordinator to use executor from AsynchWorkScheduler Created: 02/Oct/20  Updated: 29/Oct/23  Resolved: 07/Jun/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam3, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Issue split
split to SERVER-67077 TransactionCoordinator uses both the ... Open
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2021-07-12, Sharding 2021-07-26, Sharding 2021-08-09, Sharding 2021-08-23, Sharding 2022-03-07, Sharding NYC 2022-03-21, Sharding NYC 2022-04-04, Sharding NYC 2022-04-18, Sharding 2022-05-02, Sharding NYC 2022-05-16, Sharding NYC 2022-05-30, Service Arch 2022-06-13
Participants:
Story Points: 5

 Description   

I'm not sure if this actually leads to a bug or not, but since we rely on the AsyncWorkScheduler to shut down transaction coordinators, it might be a problem that we also do work on the Grid's fixed executor. We should investigate whether this is a problem and then fix it if it is.

This ticket changes TransactionCoordinator to use the executor from AsynchWorkScheduler. The above is tracked by SERVER-67077.



 Comments   
Comment by Israel Hsu [ 16/May/22 ]

The change I  made for simplification / readability does not address the potential resource leak in server shutdown. See my comment of May 10, 2022.

Comment by Githook User [ 12/May/22 ]

Author:

{'name': 'Israel Hsu', 'email': 'israel.hsu@mongodb.com', 'username': 'israelhsu'}

Message: SERVER-51317 Change TransactionCoordinator to use executor from AsynchWorkScheduler
Branch: master
https://github.com/mongodb/mongo/commit/f437493c9884751963441fdb98bb27b8f25d84ea

Comment by Israel Hsu [ 10/May/22 ]

I made a minor change so that the TransactionCoordinator() constructor uses the TaskScheduler that is part of AsyncWorkScheduler; but this does not solve the underlying problem of improper cleanup of scheduled tasks upon shutdown. 

Both AsyncWorkScheduler and the newer Futures with cancellation tokens implement the ability of having dependency chains of asynchronous work where a child tasks only run after the parent tasks complete, and where the cancellation of a parent task causes all of its children (and recursively, all of its descendents) to be cancelled. The real way to clean up this duplication of implementation is to refactor AsyncWorkScheduler to use cancellation tokens instead of "child executors". 

Since this doesn't have to do with internal transactions per se, I think this ticket needs to go to Service Arch for a proper solution.

Comment by Andrew Shuvalov (Inactive) [ 20/Aug/21 ]

I'm unassigned because it's not a trivial task and I can't find enough time for it for a while.

Comment by Andrew Shuvalov (Inactive) [ 28/May/21 ]

We spoke with matthew.saltz and decided that:
1. AsyncWorkScheduler::shutdown() is actually non blocking and will be safe to use
2. The only reason we do job on Grid.*getFixedExecutor() is because AsyncWorkScheduler is not OutOfLineExecutor and people did shortcut by using another but unrelated executor
3. Refactor AsyncWorkScheduler to be OutOfLineExecutor
4. Having cancelation tokens here is a nice cleanup but will never be scheduled for execution due to low priority

Generated at Thu Feb 08 05:25:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.