  1. Core Server
  2. SERVER-62340

Tenant Migration can lead to leakage of "TenantMigrationBlockerAsync" threads.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.3.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Server Serverless 2022-01-24
    • 15

      While investigating the BF, we found that the tenant migration donor code can leak "TenantMigrationBlockerAsync" threads. Consider the following scenario:

      1) Donor starts migration for tenant foo.
      2) On receiving the state doc with the "kAbortingIndexBuilds" state, the donor secondary creates an access blocker for tenant foo and adds it to the blocker registry. The shared reference count on the donor access blocker is now 1.
      3) Donor secondary receives a find command for tenant foo.
      4) The find request calls checkIfCanReadOrBlock(), asynchronously waits for the canRead promise to be fulfilled, and captures the tenant's donor access blocker as a shared pointer. The shared reference count on the donor access blocker is now 2.
      5) Donor primary durably commits the migration for tenant "foo" and marks the state document as garbage collectible.
      6) On receiving the donor state doc with 'expireAt' set, the donor secondary fulfills the canRead promise and removes the donor access blocker for tenant foo from the blocker registry. The shared reference count on the donor access blocker drops to 1.
      7) On canRead promise fulfillment, the continuation chain runs on _asyncBlockingOperationsExecutor, which is backed by the thread pool TenantMigrationBlockerAsyncThreadPool. Once the work in the chain completes, it releases all captured resources. This decrements the shared reference count on the donor access blocker by 1, i.e., the number of shared owners is now 0.
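
      The reference counts in steps 2, 4, 6 and 7 can be reproduced with plain std::shared_ptr. This is only a minimal sketch: AccessBlocker, the map used as a registry, and traceSharedOwners() are hypothetical stand-ins, not the server's actual types or functions.

      ```cpp
      #include <cassert>
      #include <functional>
      #include <map>
      #include <memory>
      #include <string>
      #include <vector>

      // Hypothetical stand-in for the donor access blocker (not the server's class).
      struct AccessBlocker {
          explicit AccessBlocker(std::string id) : tenantId(std::move(id)) {}
          std::string tenantId;
      };

      // Replays steps 2, 4, 6 and 7 and records the number of shared owners
      // after each of them.
      std::vector<long> traceSharedOwners() {
          std::vector<long> counts;
          // Hypothetical stand-in for the blocker registry.
          std::map<std::string, std::shared_ptr<AccessBlocker>> registry;

          // Step 2: the registry holds the only reference.
          registry["foo"] = std::make_shared<AccessBlocker>("foo");
          // A weak_ptr observes the count without becoming an owner itself.
          std::weak_ptr<AccessBlocker> observer = registry["foo"];
          counts.push_back(observer.use_count());

          // Step 4: the pending read's continuation captures a second reference.
          std::function<void()> continuation = [blocker = registry["foo"]] {};
          counts.push_back(observer.use_count());

          // Step 6: the registry drops its reference.
          registry.erase("foo");
          counts.push_back(observer.use_count());

          // Step 7: the continuation runs, then releases what it captured.
          continuation();
          continuation = nullptr;
          counts.push_back(observer.use_count());

          return counts;
      }
      ```

      Calling traceSharedOwners() yields the sequence 1, 2, 1, 0. The point of the sketch is that the final decrement happens wherever the continuation is destroyed, which in the scenario above is a thread owned by the executor.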

      This invokes the TenantMigrationDonorAccessBlocker destructor, which in turn invokes the destructor of _asyncBlockingOperationsExecutor (this thread pool executor is shared by all donor access blockers and is destroyed when no access blockers exist). The executor's destructor shuts the executor down and waits for it to join. However, the executor's join() is blocked waiting for the current "TenantMigrationBlockerAsync-X" thread to finish, while that same thread is waiting for the executor's _join() to complete. The result is a self-deadlock and leaked "TenantMigrationBlockerAsync" threads.
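
      The following sketch illustrates why the destructor ends up on the executor's own thread. It assumes a single std::thread in place of the thread pool; Blocker and lastOwnerReleasedOnWorker() are hypothetical names, and the code deliberately does not attempt the join so that it terminates.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <thread>

      // Hypothetical stand-in for the access blocker. Its destructor records the
      // thread it ran on; in the ticket's scenario, this is the point at which the
      // executor's destructor would be reached.
      struct Blocker {
          explicit Blocker(std::thread::id* out) : destroyedOn(out) {}
          ~Blocker() { *destroyedOn = std::this_thread::get_id(); }
          std::thread::id* destroyedOn;
      };

      // Returns true if the blocker was destroyed on the worker thread, i.e. on
      // the very thread that a join issued from the destructor would wait for.
      bool lastOwnerReleasedOnWorker() {
          std::thread::id destroyedOn;
          std::thread::id workerId;

          auto blocker = std::make_shared<Blocker>(&destroyedOn);

          // The worker receives the last reference, as the continuation does in
          // step 7 once the registry has already dropped its own reference.
          std::thread worker([&workerId, captured = std::move(blocker)]() mutable {
              workerId = std::this_thread::get_id();
              captured.reset();  // last owner released here: ~Blocker runs on this thread
          });
          worker.join();  // safe: issued from the calling thread, not from the worker

          return destroyedOn == workerId;
      }
      ```

      lastOwnerReleasedOnWorker() returns true: the destructor runs on the worker. A destructor that waited at that point for the worker to finish would be waiting on itself, which is the self-deadlock described above.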

      Note: The same problem exists on the recipient side as well.

            Assignee:
            didier.nadeau@mongodb.com Didier Nadeau
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani