  1. Core Server
  2. SERVER-89893

Change executor used by _flushReshardingStateChange from arbitrary to fixed

    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v8.0, v7.3, v7.0, v6.0, v5.0
    • CAR Team 2024-04-29, CAR Team 2024-05-13

      The _flushReshardingStateChange command uses the arbitraryExecutor in order to flush the metadata by calling onCollectionVersionPlacementMismatch.

      This method may block by either:

      This is a problem because the arbitrary executor does not expect blocking calls to run on it. In sharding it is essentially always the same executor, one that piggybacks on the networking ASIO threads. It is therefore a fixed-size thread pool and is prone to deadlocks if blocking work is scheduled on it.

      As a result, a deadlock can occur in the following scenario:

      • Operation A is in a multi-document transaction against collection C which uses an AsyncResultsMerger.
      • Operation B starts a resharding operation on C and enqueues a MODE_S lock since it cannot acquire it.
      • Operation B triggers an eventual execution (Operation C) of the _flushReshardingStateChange command that blocks until the critical section is signalled by resharding.
      • Operation A now attempts to kill the AsyncResultsMerger, which waits until all of its callbacks have executed in order to avoid invalid memory accesses. These callbacks are executed on the arbitraryExecutor.

      At this point there is a circular wait: Operation A waits for Operation C to finish, Operation C waits on Operation B, and Operation B in turn waits on Operation A.
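
      The starvation at the heart of this scenario can be reproduced outside the server. The following is a minimal sketch in Python, not the server's C++ code: a one-worker ThreadPoolExecutor stands in for the fixed-size arbitrary executor, and a second pool stands in for a separate executor dedicated to blocking work. All names (flush, callback_runs_in_time, demo) are hypothetical, and a timeout is used so that the sketch reports the stall instead of actually hanging.

      ```python
      import threading
      from concurrent.futures import ThreadPoolExecutor
      from concurrent.futures import TimeoutError as FutureTimeout


      def callback_runs_in_time(blocking_pool, callback_pool, timeout=0.5):
          """Return True if a callback scheduled on callback_pool completes
          while a blocking task is parked on blocking_pool."""
          critical_section_done = threading.Event()  # hypothetical stand-in for the resharding signal
          flush_started = threading.Event()

          def flush():
              # Stand-in for the blocking flush: holds a worker until signalled.
              flush_started.set()
              critical_section_done.wait()

          flush_future = blocking_pool.submit(flush)
          flush_started.wait()  # make sure the flush is occupying a worker

          # Stand-in for a callback that another operation must wait for.
          callback_future = callback_pool.submit(lambda: "callback ran")
          try:
              callback_future.result(timeout=timeout)
              ran = True
          except FutureTimeout:
              ran = False
          finally:
              # Release the blocked task so the sketch always terminates.
              critical_section_done.set()
              flush_future.result()
          return ran


      def demo():
          shared = ThreadPoolExecutor(max_workers=1)     # plays the arbitrary executor
          dedicated = ThreadPoolExecutor(max_workers=1)  # plays a separate executor for blocking work
          try:
              # Blocking task and callback share one pool: the callback is starved.
              same_pool = callback_runs_in_time(shared, shared)
              # Blocking task moved to its own pool: the callback runs promptly.
              separate_pools = callback_runs_in_time(dedicated, shared)
          finally:
              shared.shutdown()
              dedicated.shutdown()
          return same_pool, separate_pools


      if __name__ == "__main__":
          print(demo())
      ```

      In the sketch the stall is broken by the timeout. In the scenario above nothing plays that role, because the signal the blocked task waits for depends on the very operation that is waiting for the callbacks.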

            Assignee:
            jordi.olivares-provencio@mongodb.com Jordi Olivares Provencio
            Reporter:
            jordi.olivares-provencio@mongodb.com Jordi Olivares Provencio