Core Server / SERVER-41217

Potential deadlock between ShardRegistry and LSC refresh

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.0.9
    • Component/s: Sharding
    • Sharding
    • ALL
    • Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-09-09

      On a config server, ShardRegistry::reload() waits for a majority read on the local shard. If the reload coincides with LogicalSessionsCache::refresh(), which performs batch writes, the refresh may deadlock: while refreshing collectionRoutingInfo it calls ShardRegistry::getShard(), which can join the in-progress reload().
      The related stack traces are in BF-12772.
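
The join described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyShardRegistry` and its members are hypothetical names, not the real ShardRegistry implementation, and the majority read is represented by a plain `std::shared_future<void>`. The bounded wait in `getShard()` exists only so the example terminates; the situation reported here is an unbounded wait.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>

// Toy model (hypothetical names): a reload that finishes only when a
// majority read finishes, and a getShard() that joins the in-flight reload.
class ToyShardRegistry {
public:
    // Starts a reload tied to `majorityRead` unless one is still in flight,
    // in which case the caller receives (joins) the existing one.
    std::shared_future<void> reload(std::shared_future<void> majorityRead) {
        assert(majorityRead.valid());
        std::lock_guard<std::mutex> lk(_mutex);
        const bool idle = !_inFlight.valid() ||
            _inFlight.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
        if (idle) {
            _inFlight = majorityRead;
        }
        return _inFlight;
    }

    // Joins the in-flight reload, if any. Returns true if the lookup could
    // proceed within `maxWait`, false if it was still blocked on the reload.
    bool getShard(std::chrono::milliseconds maxWait) {
        std::shared_future<void> pending;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            pending = _inFlight;
        }
        if (!pending.valid()) {
            return true;  // no reload has been started, nothing to join
        }
        return pending.wait_for(maxWait) == std::future_status::ready;
    }

private:
    std::mutex _mutex;
    std::shared_future<void> _inFlight;
};
```

In this model, a caller of `getShard()` makes no progress for as long as the majority read behind the reload stays outstanding, which is the dependency the description points at.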

      Suggested Fix

      I propose checking in ReplicationCoordinatorImpl::waitUntilOpTimeForRead whether the secondaries are up or down. When they are down, the wait should behave similarly to the case where the _isShutdown flag is set.
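
A minimal sketch of the proposed behavior, assuming a heavily simplified coordinator. Only the method name `waitUntilOpTimeForRead` and the `_isShutdown` flag come from the ticket; `ToyReplCoordinator`, `WaitResult`, `_secondariesDown`, the setters, and the integer optime are hypothetical stand-ins, and how the real server would detect that secondaries are down is not modeled.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical result codes; not the server's real status codes.
enum class WaitResult { kOk, kShutdownInProgress, kSecondariesUnavailable, kTimedOut };

class ToyReplCoordinator {
public:
    // Waits until the majority-committed optime reaches `target`. The wait
    // ends early when shutdown is flagged or, as proposed, when the
    // secondaries are known to be down and the optime cannot advance.
    WaitResult waitUntilOpTimeForRead(std::uint64_t target,
                                      std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lk(_mutex);
        const bool woke = _cv.wait_for(lk, timeout, [&] {
            return _isShutdown || _secondariesDown || _committedOpTime >= target;
        });
        if (!woke) return WaitResult::kTimedOut;
        if (_isShutdown) return WaitResult::kShutdownInProgress;
        if (_committedOpTime >= target) return WaitResult::kOk;
        return WaitResult::kSecondariesUnavailable;
    }

    void setCommittedOpTime(std::uint64_t opTime) {
        { std::lock_guard<std::mutex> lk(_mutex); _committedOpTime = opTime; }
        _cv.notify_all();
    }

    void setSecondariesDown(bool down) {
        { std::lock_guard<std::mutex> lk(_mutex); _secondariesDown = down; }
        _cv.notify_all();
    }

    void shutdown() {
        { std::lock_guard<std::mutex> lk(_mutex); _isShutdown = true; }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _committedOpTime = 0;
    bool _isShutdown = false;
    bool _secondariesDown = false;  // assumption: maintained by some liveness check
};
```

The point of the design is that a waiter is released with an error instead of blocking on a commit point that cannot move, mirroring what the shutdown flag already does in this sketch.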

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            misha.tyulenev@mongodb.com Misha Tyulenev (Inactive)
            Votes:
            0
            Watchers:
            6
