[SERVER-52783] [cleanup] Make tenant_migration_donor_util::checkIfCanReadOrBlock return a Future Created: 11/Nov/20  Updated: 29/Oct/23  Resolved: 12/Jan/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.9.0-alpha0

Type: Task
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed
Votes: 0
Labels: pm-1791_other_required
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-53761 Determine strategy to make the Servic... Blocked
Related
related to SERVER-53505 Refactor tenant_migration_donor_util:... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25
Participants:

 Description   

Because of the changes in https://jira.mongodb.org/browse/PM-1704, it no longer makes sense for tenant_migration_donor_util::checkIfCanReadOrBlock to be a blocking function, since it's called in a future chain.

It should be changed to instead return a Future, so that it doesn't block a thread in a thread pool.



 Comments   
Comment by Githook User [ 11/Jan/21 ]

Author:

{'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'}

Message: SERVER-52783: Make tenant_migration_donor_util::checkIfCanReadOrBlock return a Future, for now still synchronous
Branch: master
https://github.com/mongodb/mongo/commit/1fca5817faaa9067d26dcf5261d4ea85bf7507a6

Comment by Andrew Shuvalov (Inactive) [ 23/Dec/20 ]

The goal of this refactoring is to avoid blocking code on the client request processing path. The existing code is essentially a performance bug: it may lead to thread starvation as more operations block in their threads, and may result in a thundering herd outage if all blocked ops are unblocked at once. As the refactoring is pretty large, I plan to do it in 3 steps:

  1. Eliminate the condition variable and replace it with a Future. For now, keep the logic blocking and call future->wait() to block on the condition.
  2. Actually convert the code to asynchronous. This looks complicated by itself, and there is the additional requirement to properly transfer all thread-local state to the new thread.
  3. Identify boilerplate patterns and move them to some util code.
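
To illustrate the shape of step 1, here is a minimal sketch using only the C++ standard library rather than the server's own Future type. BlockerSketch, unblock() and waitUntilUnblocked() are hypothetical names invented for this example; only checkIfCanReadOrBlock comes from the ticket, and its real signature is not shown here. The function hands back a future instead of waiting on a condition variable, and the caller still blocks on it for now.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>

// Hypothetical stand-in for the donor-side blocker; not the server's class.
class BlockerSketch {
public:
    BlockerSketch() : _unblocked(_promise.get_future().share()) {}

    // Step 1 shape: return a future instead of waiting on a condition variable.
    std::shared_future<void> checkIfCanReadOrBlock() const {
        return _unblocked;
    }

    // Makes the shared future ready; repeated calls are no-ops.
    void unblock() {
        std::call_once(_once, [this] { _promise.set_value(); });
    }

private:
    std::promise<void> _promise;
    std::shared_future<void> _unblocked;
    std::once_flag _once;
};

// The caller is still synchronous in step 1: it blocks on the future,
// here with a deadline. Returns true if the blocker was released in time.
bool waitUntilUnblocked(const BlockerSketch& blocker,
                        std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    return blocker.checkIfCanReadOrBlock().wait_for(timeout) ==
        std::future_status::ready;
}
```

In step 2 the blocking wait_for call would go away and the caller would chain a continuation on the returned future instead, which std::future does not support and which is why this sketch stops at step 1.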

About timeouts: the condition variable utils supported the operation timeout, but I had to implement it for Futures from scratch. The idea is to use cancellation tokens and to schedule the cancellation on the executor.
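
A rough sketch of that timeout idea, again with the standard library only. TimedWaitSketch, WaitResult and scheduleCancellation() are hypothetical names, and a plain std::thread that sleeps stands in for a timer event scheduled on a task executor. The wait completes exactly once, either because the operation was unblocked or because the scheduled cancellation fired first.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <memory>
#include <thread>

enum class WaitResult { kUnblocked, kTimedOut };

// Hypothetical one-shot state shared by the waiter, the unblocker, and the timer.
class TimedWaitSketch {
public:
    // Call at most once; the returned future resolves when complete() first succeeds.
    std::future<WaitResult> future() {
        return _promise.get_future();
    }

    // The first caller wins and fulfils the promise; later calls return false.
    bool complete(WaitResult result) {
        if (_done.exchange(true)) {
            return false;
        }
        _promise.set_value(result);
        return true;
    }

private:
    std::atomic<bool> _done{false};
    std::promise<WaitResult> _promise;
};

// Stand-in for scheduling a timer on an executor: a thread sleeps for `after`
// and then cancels the wait. The caller must join the returned thread.
std::thread scheduleCancellation(std::shared_ptr<TimedWaitSketch> state,
                                 std::chrono::milliseconds after) {
    return std::thread([state, after] {
        std::this_thread::sleep_for(after);
        state->complete(WaitResult::kTimedOut);
    });
}
```

The atomic flag is what makes the race between the unblock path and the timer safe: whichever side calls complete() second sees false and does nothing.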

About the thread pool: at this point, an appropriate thread pool for this asynchronous processing does not exist. As discussed with amirsaman.memaripour, I had to create a new task executor for this effort. Unfortunately, a simpler thread pool does not work here because it has to support timer events, which only a task executor can do.

Second point about the executor: when we have a global executor (https://jira.mongodb.org/browse/PM-1809), it might not be enough because of the thundering herd problem when all blocked operations are unblocked at once. My proposal would be to support some sort of admission control, where those pending ops have a reservation capping how many threads of the global executor they can use. For now, the new task executor is limited to a maximum of 4 threads; I would consider it pretty dangerous to increase this number.
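
One possible reading of the admission control proposal, sketched as a simple counting limiter. AdmissionSketch, tryAdmit() and release() are hypothetical names; nothing like this is confirmed to exist in the server, and a real version would need to queue rejected work rather than just refuse it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical reservation: caps how many unblocked operations may occupy
// executor threads at the same time.
class AdmissionSketch {
public:
    explicit AdmissionSketch(std::size_t maxInFlight) : _max(maxInFlight) {}

    // Returns true and takes a slot if one is free; otherwise returns false
    // and the caller is expected to retry or queue the operation.
    bool tryAdmit() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_inFlight >= _max) {
            return false;
        }
        ++_inFlight;
        return true;
    }

    // Gives back a slot previously taken by a successful tryAdmit().
    void release() {
        std::lock_guard<std::mutex> lock(_mutex);
        assert(_inFlight > 0);
        --_inFlight;
    }

private:
    std::mutex _mutex;
    const std::size_t _max;
    std::size_t _inFlight = 0;
};
```

With a cap of 4, matching the current executor limit, a fifth operation unblocked at the same moment would be refused until one of the first four calls release().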
